Anomaly Detection

We are looking into how we can tell if a point is behaving irregularly. Being able to look at a time range and select specific points to look at will help reduce the amount of data facilities needs to look at in order to troubleshoot in a given building. We decided that away to do this in the early stages would be to use kmeans clustering, which is an easy-to implement algorithm that we can use through sklearn. Some research that Kiya looked into can be seen here.

We chose this approach because:

  • We don’t know enough about our data to develop a model-based detection system
  • Clustering is unsupervised, meaning we don’t already need to know what points are anomalous
  • By clustering, we don’t expect every point to behave exactly the same way but we do expect some patterns between similar points.

Kmeans

Kmeans is a clustering analysis algorithm that, given data points and a number, n, of desired clusters will categorize m-dimensional data points into n categories.

Insert an example here with a couple pictures

https://docs.google.com/presentation/d/19NAHDsxQbjwuffGsPYSBg3DXJbbPDzCakr4zOrQdhdE/edit#slide=id.g31e789b1e2_0_1

Insert how to run it here

analysis/anomaly_detection/anomaly_detection.py

Follow formatting in plot_cluster to get the data that is shown in above presentation