Tackling memory leaks with machine learning
Last week's release included something you most likely did not even notice: the ability to improve Plumbr's algorithms. If this sounds like a mystery, bear with me and I will demonstrate the value of the change by letting you take a peek at Plumbr's internals.
As you might recall, Plumbr constantly monitors all object creation and destruction events and looks for anomalies. The anomaly detection magic is based on a dataset containing more than a million memory snapshots that we use to train our algorithms on.
In this post we will focus on the “magic” part. In simplified form, we can say that Plumbr contains an algorithm that observes the trends of dozens of different variables and decides whether particular symptoms indicate the presence of a memory leak or not.
Now, let us see how such an algorithm can be constructed. First, let's agree on the variables we want to monitor to determine whether instances of a particular class leak. Based on our previous experience with leak detection, we might start by monitoring each class X using the following metrics:
- A – number of Full GC runs the instances of X have survived
- B – the number of classes from which X is being referenced
- C – % of instances of X relative to all currently live instances
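To make the three metrics concrete, here is a minimal sketch of a value object holding them per class. The class and field names are my own, not Plumbr's actual internals:

```java
// Illustrative container for the three per-class leak metrics described above.
// Names are assumptions for this sketch, not Plumbr's real data model.
public class ClassMetrics {
    final String className;
    final int fullGcSurvivals;      // A: Full GC runs the instances have survived
    final int referencingClasses;   // B: number of classes referencing X
    final double liveInstanceShare; // C: % of all currently live instances that are X

    ClassMetrics(String className, int a, int b, double c) {
        this.className = className;
        this.fullGcSurvivals = a;
        this.referencingClasses = b;
        this.liveInstanceShare = c;
    }
}
```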
Now, if we are clever enough to gather this data from the JVM internals, we can start by plotting the harvested data to see what we can conclude from it. The following table shows a small test case containing just five classes (#1–#5), one of which is leaking:
| X | A | B | C | Is a leak? |
|----|---|----|---|------------|
| #1 | 3 | 9  | 1 | NO |
| #2 | 4 | 3  | 2 | NO |
| #3 | 5 | 10 | 6 | NO |
| #4 | 6 | 7  | 2 | YES |
| #5 | 7 | 29 | 4 | NO |
Having this representation in front of us, we can formulate an algorithm describing how certain combinations of values, or certain trends among those variables, correlate with X being the cause of a leak.
Such an algorithm could, for example, take the following form:

F(X) = 3*A^2 - 2*B - 24*C

In this sample, F(X) being larger than 0 indicates that the instances of X are memory leak suspects. To understand it better, I have applied the formula to the same sample dataset:
| X | A | B | C | Result |
|----|---|----|---|--------|
| #1 | 3 | 9  | 1 | -15 |
| #2 | 4 | 3  | 2 | -6 |
| #3 | 5 | 10 | 6 | -89 |
| #4 | 6 | 7  | 2 | 46 |
| #5 | 7 | 29 | 4 | -7 |
From the above we see that the algorithm indeed tells us that the instances of #4 are leaking. Now we could be tempted to apply this algorithm on a wider base; after all, it has worked perfectly on our small sample test case.
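The scoring step itself is easy to reproduce. Here is a minimal sketch (class and method names are mine) that evaluates F over the five sample classes and flags anything scoring above zero as a suspect:

```java
public class LeakScore {
    // F(X) = 3*A^2 - 2*B - 24*C, the example formula from the post
    static double f(double a, double b, double c) {
        return 3 * a * a - 2 * b - 24 * c;
    }

    public static void main(String[] args) {
        // The five sample classes as (A, B, C) triples
        double[][] samples = {
            {3, 9, 1}, {4, 3, 2}, {5, 10, 6}, {6, 7, 2}, {7, 29, 4}
        };
        for (int i = 0; i < samples.length; i++) {
            double score = f(samples[i][0], samples[i][1], samples[i][2]);
            // A score above zero marks the class as a memory leak suspect
            System.out.printf("#%d: F = %.0f -> %s%n", i + 1, score,
                    score > 0 ? "leak suspect" : "not a leak");
        }
    }
}
```

Running it reproduces the Result column above, with only #4 flagged.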
Not so fast, though: this first version of our algorithm does not actually perform well when applied in the real world. You would immediately discover that some of the reported leaks are not actually leaks and that some actual leaks remain unnoticed.
We would need far more samples, from hundreds or thousands of different applications, to come up with a genuinely useful algorithm. But what defines a good algorithm?
For our users, the quality of the algorithm is visible in two ways. Let us say we have a set of applications at hand containing 100 memory leaks in total. Now you would want to discover as many actual leaks as possible, but at the same time you do not want to be bothered by alerts about something which is actually not a leak.
We express this in two quality metrics:
- True positives – for example, Plumbr's algorithms are able to discover 88 out of every 100 actual memory leaks.
- False positives – out of every 100 leaks Plumbr reports, 13 turn out to be false alarms.
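These two rates are straightforward ratios. A small sketch, using the example numbers above (the class and method names are mine):

```java
public class LeakDetectionQuality {
    // Share of actual leaks the algorithm discovers (88 of 100 in the example)
    static double truePositiveRate(int leaksFound, int actualLeaks) {
        return (double) leaksFound / actualLeaks;
    }

    // Share of reported leaks that turn out to be false alarms (13 of 100)
    static double falseAlarmRate(int falseAlarms, int leaksReported) {
        return (double) falseAlarms / leaksReported;
    }

    public static void main(String[] args) {
        System.out.println(truePositiveRate(88, 100)); // 0.88
        System.out.println(falseAlarmRate(13, 100));   // 0.13
    }
}
```

Note that the two denominators differ: the first rate is measured against all real leaks, the second against all reported leaks.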
So, let us enter the Plumbr labs, where we use machine learning to optimize the algorithm with the two quality metrics explained above in mind. This is done using another algorithm:

G(a[], b[], c[], is_a_leak[]) -> F(X)
This new algorithm takes its input in the form of a sample dataset, along with the facts of whether each set of input parameters represented a memory leak or not. Its output is a function F that uses the input parameters (A, B, C) to calculate whether a particular combination of them represents a leak.
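To make the shape of G tangible, here is a toy stand-in: a perceptron-style learner that fits weights over the features (A², B, C, bias) so that a positive weighted sum predicts "leak". Plumbr's real training process is not public; this only illustrates the idea of learning F from labeled samples:

```java
import java.util.Arrays;

// Illustrative learner in the role of G: all names and the choice of a
// perceptron are my own assumptions, not Plumbr's actual algorithm.
public class LeakRuleLearner {

    // The learned F: w[0]*A^2 + w[1]*B + w[2]*C + w[3]; > 0 means "leak"
    static double score(double[] w, double a, double b, double c) {
        return w[0] * a * a + w[1] * b + w[2] * c + w[3];
    }

    // Classic perceptron updates: nudge the weights towards misclassified
    // leak samples and away from misclassified non-leak samples, repeating
    // until an entire pass over the data produces no mistakes.
    static double[] learn(double[][] samples, boolean[] isLeak) {
        double[] w = new double[4];
        for (int epoch = 0; epoch < 100_000; epoch++) {
            boolean mistakes = false;
            for (int i = 0; i < samples.length; i++) {
                double a = samples[i][0], b = samples[i][1], c = samples[i][2];
                boolean predicted = score(w, a, b, c) > 0;
                if (predicted != isLeak[i]) {
                    double sign = isLeak[i] ? 1 : -1;
                    w[0] += sign * a * a;
                    w[1] += sign * b;
                    w[2] += sign * c;
                    w[3] += sign;
                    mistakes = true;
                }
            }
            if (!mistakes) break; // converged: all samples classified correctly
        }
        return w;
    }

    public static void main(String[] args) {
        // The five sample classes from the tables above, with their labels
        double[][] samples = {{3, 9, 1}, {4, 3, 2}, {5, 10, 6}, {6, 7, 2}, {7, 29, 4}};
        boolean[] isLeak = {false, false, false, true, false};
        System.out.println(Arrays.toString(learn(samples, isLeak)));
    }
}
```

The learned weights will generally differ from the hand-written (3, -2, -24, 0), but they classify the training samples the same way; with only five data points many separating functions exist, which is exactly why far more samples are needed.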
It should be obvious that Plumbr will be getting better and better as we feed more data from the past cases into the machine learning process.
Until the last release, the opinions on whether a particular data point represents a leak were given by our internal experts. This has worked so far: after all, we have seen hundreds and hundreds of leaking applications and have gotten quite good at deciding whether certain symptoms indicate the presence of a leak. But our smart readers should already see two major flaws in this approach:
- Quality of the opinions. Without knowing the ins and outs of the application at hand, and equipped with just statistical data, our experts are still likely to be incorrect every once in a while.
- Quantity of the opinions. The approach just doesn’t scale. Analyzing the incoming data is a task which is slowly but surely becoming impossible – on a daily basis we are buried under hundreds and hundreds of new data points that we just cannot cope with.
And here we can finally link back to where this blog post started: we are now handing the task of giving expert opinions to the best possible experts in the world – you. When you get a Plumbr leak report, you will be offered the opportunity to tell us whether the case we reported really represents a leak or not.
By giving us feedback you contribute to making Plumbr smarter: we already see the crowdsourced expert opinions increasing the size of the dataset we use to train our algorithms. Already during the next weeks we should be able to present an algorithm built with your help.

If you made it this far with the post, I can only guess I managed to write something you enjoyed. Subscribe to our Twitter feed to be alerted about the next posts on performance optimization and troubleshooting topics.