Thursday, May 25, 2017

Applying machine learning to study and do capacity planning


ML algorithms and libraries have matured, and the range of problems they can be applied to has grown a lot. Here I try to apply one such technique to study performance metrics and then estimate the capacity required at a future load.

Let's take a simple data set of page views (pv) and heap usage.
Example:
pv, heapusage (MB)
105,637
110,638
115,640
120,642
125,644
130,646
135,648
140,650
145,652


Now, since the value to predict is a continuous number rather than a class label, this is a regression problem and not a classification one. There are a few regression models in one of the coolest ML libraries, scikit-learn (sklearn).

Let's take the LinearRegression model and try to fit it to the data set.

The logic to implement the model is:
- load the data set into arrays (the features are used as-is, with no transformations such as squaring them)
- split the data into training and validation sets, with 20% held out for validation
- choose LinearRegression as the model
- fit the model on the training set so that it learns the trend
- get the score of the fit to verify how well the model fits the data set
- get the coefficients and intercept of the fitted model
- predict on the validation set and compare the predictions with the actual values in the validation set
- if it all looks good, predict the heap usage for a future, increased load
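
The steps above can be sketched roughly as follows. This is a minimal sketch and not the exact code behind the wrappers: the variable names, the random_state value and the future load of 200 page views are assumptions made for illustration. Only the data points, the 20% validation split and the choice of LinearRegression come from the steps above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Sample data from the table above: page views and heap usage in MB.
pv = np.array([105, 110, 115, 120, 125, 130, 135, 140, 145])
heap_mb = np.array([637, 638, 640, 642, 644, 646, 648, 650, 652])

# scikit-learn expects a 2-D feature matrix: one row per sample, one column per feature.
X = pv.reshape(-1, 1)
y = heap_mb

# Hold out 20% of the rows for validation.
# random_state=42 is an assumption, used only to make the split repeatable.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Choose the model and fit it on the training set.
model = LinearRegression()
model.fit(X_train, y_train)

# score() returns R^2 (coefficient of determination) on the data passed in.
train_r2 = model.score(X_train, y_train)
slope = model.coef_[0]
intercept = model.intercept_
print(f"R^2 on training set: {train_r2:.4f}")
print(f"heap_mb ~= {slope:.4f} * pv + {intercept:.2f}")

# Predict the validation rows and compare with the actual values.
val_pred = model.predict(X_val)
for views, actual, predicted in zip(X_val[:, 0], y_val, val_pred):
    print(f"pv={views}: actual={actual} MB, predicted={predicted:.1f} MB")

# Estimate heap usage at a future load (hypothetical value, chosen for illustration).
future_pv = 200
future_heap = model.predict(np.array([[future_pv]]))[0]
print(f"Estimated heap at pv={future_pv}: {future_heap:.1f} MB")
```

Note that 200 page views lies outside the observed range of 105 to 145, so the estimate is only as good as the assumption that the linear trend continues at higher load.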

Once this was done, I built wrappers and a simple UI.

UI Page 1: Simply uploads the feature file, i.e. the metrics file.


UI Page 2: This shows the result: the correlation between the metrics, the validation set with how close the predicted values are to the actual ones, and finally the predicted output, i.e. the required heap memory for the given page views.
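
As an illustration of the correlation figure shown on the result page, the Pearson correlation between the two metrics can be computed by hand. This is a sketch under assumptions: the post does not say which correlation measure the UI uses, and the pearson helper below is a hypothetical name, not part of the actual wrappers.

```python
from math import sqrt

# Sample data from the table earlier in the post.
pv = [105, 110, 115, 120, 125, 130, 135, 140, 145]
heap_mb = [637, 638, 640, 642, 644, 646, 648, 650, 652]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(pv, heap_mb)
print(f"Correlation between pv and heap usage: {r:.4f}")
```

A value close to 1 means heap usage rises almost linearly with page views, which is what makes a linear model a reasonable first choice for this data.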



Although this is a simple demonstration, more features can be added.