In last week’s upgrades we added something you most likely have not even noticed. Namely – the possibility to improve Plumbr algorithms. If this sounds like a mystery, bear with me and I will demonstrate the value of the change by letting you take a peek at Plumbr internals.
As you might recall, Plumbr constantly monitors all object creation and destruction events and looks for anomalies. The anomaly detection magic is based on a dataset containing more than a million memory snapshots that we use to train our algorithms.
In this post we will focus on the “magic” part. In a simplified form, we can say that Plumbr contains an algorithm that observes the trends of dozens of different variables and decides whether particular symptoms indicate the presence of a memory leak or not.
Now, let us see how such an algorithm can be constructed. First, let’s agree on the variables that we want to monitor to determine whether instances of a particular class leak. Based on our previous experience with leak detection, we might want to start by monitoring all the classes X using the following metrics:
Now, if we are clever enough to gather this data from the JVM internals, we can start by plotting the harvested data to see what we could conclude from it. In the following table we see a small test case containing just five classes (#1 – #5) out of which one is leaking:
X | A | B | C | Is a leak? |
---|---|---|---|---|
#1 | 3 | 9 | 1 | NO |
#2 | 4 | 3 | 2 | NO |
#3 | 5 | 10 | 6 | NO |
#4 | 6 | 7 | 2 | YES |
#5 | 7 | 29 | 4 | NO |
Having this representation in front of us, we can create an algorithm that captures how certain combinations of values, or certain trends among those variables, correlate with X being the cause of a leak.
For example, this algorithm could take the following form:
F(X) = 3*A^2 - 2*B - 24*C
In this sample, F(X) being larger than 0 could indicate that the instances of X are suspects for a memory leak. To understand it better, I have applied the formula to the same sample dataset:
X | A | B | C | Result |
---|---|---|---|---|
#1 | 3 | 9 | 1 | -15 |
#2 | 4 | 3 | 2 | -6 |
#3 | 5 | 10 | 6 | -89 |
#4 | 6 | 7 | 2 | 46 |
#5 | 7 | 29 | 4 | -7 |
From the above, we see that the algorithm indeed tells us that the instances of #4 are leaking. We could now take this algorithm and apply it to a wider base – after all, it has worked perfectly on our small sample test case.
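To make the arithmetic easy to reproduce, here is a minimal Python sketch that applies the example formula to the five sample classes. The names `SAMPLES`, `f` and `is_suspect` are made up for this illustration and are not part of Plumbr.

```python
# Sample dataset from the table above: class id -> (A, B, C, is_leak)
SAMPLES = {
    "#1": (3, 9, 1, False),
    "#2": (4, 3, 2, False),
    "#3": (5, 10, 6, False),
    "#4": (6, 7, 2, True),
    "#5": (7, 29, 4, False),
}


def f(a, b, c):
    """The example scoring formula F(X) = 3*A^2 - 2*B - 24*C."""
    return 3 * a**2 - 2 * b - 24 * c


def is_suspect(a, b, c):
    """A class is a leak suspect when its score is larger than 0."""
    return f(a, b, c) > 0


# Score every class and collect the ones flagged as suspects.
scores = {name: f(a, b, c) for name, (a, b, c, _) in SAMPLES.items()}
suspects = [name for name, (a, b, c, _) in SAMPLES.items() if is_suspect(a, b, c)]

for name, score in scores.items():
    print(name, score)
print("suspects:", suspects)
```

Only #4 ends up with a positive score, which matches the label in the sample data.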
Not so fast though – the first version of our algorithm does not hold up well when applied in the real world. You will immediately discover that some of the reported leaks are not actually leaks and that some actual leaks remain unnoticed.
We would need way more samples from hundreds and thousands of different applications to come up with an actually useful algorithm. But what defines a good algorithm?
For our users, the quality of the algorithm is visible in two ways. Let us say we have a set of applications at hand containing 100 memory leaks in total. Now you would want to discover as many actual leaks as possible, but at the same time you do not want to be bothered by alerts about something which actually is not a leak.
We express this in two quality metrics:
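Assuming these two metrics correspond to the standard notions of recall (the share of real leaks that get detected) and precision (the share of alerts that are real leaks), which is an interpretation rather than a confirmed Plumbr definition, they could be computed roughly like this. The function name `quality` and the numbers in the example are hypothetical.

```python
def quality(predicted, actual):
    """Return (recall, precision) for two parallel lists of booleans.

    predicted[i] is True when the algorithm reported a leak,
    actual[i] is True when the case really was a leak.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    return recall, precision


# Hypothetical outcome: 100 real leaks, 80 of them detected,
# plus 20 alerts raised for things that were not leaks.
actual = [True] * 100 + [False] * 20
predicted = [True] * 80 + [False] * 20 + [True] * 20

recall, precision = quality(predicted, actual)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this made-up scenario 80 of the 100 leaks are found, and 80 of the 100 alerts are genuine, so both values come out at 0.8.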
So, let us enter Plumbr labs, where we use machine learning to optimize the algorithm, bearing in mind the two quality metrics explained above. This is done using another algorithm:
G(a[], b[], c[], is_a_leak[]) -> F(X)
The new algorithm takes its input in the form of a sample dataset along with the facts about whether a given set of input parameters represented a memory leak or not. The output of this algorithm is a function that uses the input parameters (a, b, c) to calculate whether a particular combination of input parameters represents a leak or not.
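As a rough illustration of the idea of a function that produces another function, here is a toy Python stand-in for G. It is not the machine learning process Plumbr uses: it simply brute-forces small integer coefficients for a formula of the shape p*A^2 + q*B + r*C and keeps the combination that agrees with the labels most often. The coefficient ranges and the fixed formula shape are assumptions chosen for the example.

```python
from itertools import product


def g(a, b, c, is_a_leak):
    """Toy stand-in for G: search integer coefficients (p, q, r) so that
    F(X) = p*A^2 + q*B + r*C > 0 matches the labels as often as possible."""
    best, best_hits = None, -1
    # Assumed search space; a real learner would not use a fixed grid.
    for p, q, r in product(range(0, 6), range(-5, 1), range(-30, 1)):
        hits = sum(
            ((p * ai**2 + q * bi + r * ci) > 0) == leak
            for ai, bi, ci, leak in zip(a, b, c, is_a_leak)
        )
        if hits > best_hits:
            best, best_hits = (p, q, r), hits
    p, q, r = best
    # The returned function plays the role of F(X).
    return lambda ai, bi, ci: p * ai**2 + q * bi + r * ci


# The five sample classes from the tables above, column by column.
a = [3, 4, 5, 6, 7]
b = [9, 3, 10, 7, 29]
c = [1, 2, 6, 2, 4]
is_a_leak = [False, False, False, True, False]

f = g(a, b, c, is_a_leak)
predictions = [f(ai, bi, ci) > 0 for ai, bi, ci in zip(a, b, c)]
print(predictions)
```

On the five-class sample the learned function separates #4 from the rest, although the coefficients it settles on need not be the ones from the hand-written formula. With so few samples many different formulas fit equally well, which is exactly why a much larger labeled dataset matters.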
It should be obvious that Plumbr will be getting better and better as we feed more data from the past cases into the machine learning process.
Until the last release, the opinions about whether a particular set of data represents a leak or not were given by our internal experts. This has worked so far – after all, we have seen hundreds and hundreds of leaking applications and have gotten quite good at deciding whether certain symptoms indicate the presence of a leak. But our smart readers should already see two major flaws in this approach:
And here we can finally link this back to where our blog post started – we will now be handing the task of giving expert opinions to the best possible experts in the world – to you. When you get a Plumbr leak report, you will now be offered an opportunity to tell us whether the case we reported really represents a leak or not.
By giving us feedback you actually contribute to making Plumbr smarter – we already see that the crowdsourced expert opinions are increasing the size of the dataset we use to train our algorithms. Within the next few weeks we should be able to present an algorithm built with your help. If you made it this far with the post, then I can only guess I managed to write something you enjoyed. So subscribe to our Twitter feed to be alerted about the next posts on performance optimization and troubleshooting topics.