Measuring memory overhead in Java

September 24, 2012 by Nikita Salnikov-Tarnovski

I have spent a lot of time recently measuring Plumbr’s overhead and figuring out ways of reducing it. When I started the measurements, I was frustrated to the extent that I had to let my feelings out in the form of a previous blog post. But now I am back in constructive mode and ready to share more insights on the subject.

How do you measure memory overhead of a Java agent? The answer, as seen below, seems trivial at first:

“You measure memory consumption of the application with that agent and without it, then compare”.

As it turns out, even the “measure” part is not that obvious, not to speak of the “comparing”. Let me explain. We are using our own performance monitor to judge how well Plumbr fares. It runs as a separate thread in the JVM and periodically measures the amount of memory consumed by the application. Here is an example output for one of our test applications:

Memory Graph

We are using java.lang.Runtime to get the information from the JVM. In the picture we have lined up different metrics (a simplified sketch of such a sampler follows the list):

  • JVM heap’s capacity is measured by Runtime.totalMemory()
  • Used memory is calculated as Runtime.totalMemory() minus Runtime.freeMemory()
  • Native memory usage is reported from the OS’s /proc/self/status.
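For illustration, here is a minimal sketch of such a sampler thread. It only covers heap capacity and used heap via java.lang.Runtime; the native size would have to be read separately from /proc/self/status. This is not Plumbr’s actual monitor, just a simplified illustration.

public class MemorySampler implements Runnable {

    @Override
    public void run() {
        Runtime rt = Runtime.getRuntime();
        while (!Thread.currentThread().isInterrupted()) {
            long capacity = rt.totalMemory();         // current heap capacity
            long used = capacity - rt.freeMemory();   // used = capacity - free
            System.out.printf("heap: capacity=%dK used=%dK%n",
                    capacity / 1024, used / 1024);
            try {
                Thread.sleep(1000);                   // sample once per second
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // stop sampling when interrupted
            }
        }
    }
}

Started as a daemon thread next to the application under test, this produces exactly the kind of jagged “used memory” curve you see in the graph above.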

As we can see, there are three different memory-related metrics. And the seemingly most important of them – the amount of used memory – looks more like a cardiologist’s nightmare. So it is hardly the right source of information to base your overhead calculations upon.

We needed a better idea. And as usual, it was truly simple – we turned to monitoring the size of live data instead. Live data denotes the objects that are still alive after a full garbage collection cycle. This effectively serves as the lower bound of the heap required by the application – you can’t get away with less. How much heap is occupied by the application’s live data can easily be read from the GC log: find a line corresponding to a Full GC and read it from there. Here is an example:

[Full GC [PSYoungGen: 19377K->0K(217024K)] [PSOldGen: 161109K->152083K(261568K)] 180486K->152083K(478592K) [PSPermGen: 125886K->125886K(258688K)], 1.6528730 secs] [Times: user=1.65 sys=0.01, real=1.66 secs]

If you are unfamiliar with GC logs, let’s see what the useful bits in it are:

  • PSYoungGen: 19377K->0K(217024K) indicates that the objects in the young generation were fully collected: its used size was reduced from 19377KB to 0KB, and the total available size of the young generation is 217024KB
  • PSOldGen and PSPermGen sections contain the same information, but for different areas in memory – correspondingly showing the changes in old and permanent generations
  • The total time the full GC ran was 1.66 seconds.
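As a side note, if your JVM does not produce such a log yet, detailed GC logging can usually be switched on with the standard HotSpot flags shown below. The exact output format depends on the JVM version and the collector in use; the line above comes from the parallel collector. Here application.jar is just a placeholder for the application under test.

java -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log -jar application.jar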

Now, the live data size is equal to the size of the old generation after the full GC. In our case it equals 152083KB of heap memory. During the application’s stable phase this number does not fluctuate much, so you can usually take either the lower boundary or the average of the old generation size over several full GC runs.
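To give a rough idea of how such averaging could be automated, here is a small sketch that scans a GC log for Full GC events and averages the post-collection old generation size. It assumes the parallel collector’s log format shown above (the “PSOldGen: …K->…K” section); other collectors name the sections differently, so the pattern would need adjusting. The class name LiveDataSize is made up for this example.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiveDataSize {

    // Matches the old generation section of a ParallelGC full collection,
    // capturing the size after the collection.
    private static final Pattern OLD_GEN = Pattern.compile("PSOldGen: \\d+K->(\\d+)K");

    public static void main(String[] args) throws IOException {
        long totalKb = 0;
        int fullGcCount = 0;
        for (String line : Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            if (!line.contains("Full GC")) {
                continue;                               // only full collections count
            }
            Matcher m = OLD_GEN.matcher(line);
            if (m.find()) {
                totalKb += Long.parseLong(m.group(1));  // old gen size after this full GC
                fullGcCount++;
            }
        }
        if (fullGcCount > 0) {
            System.out.printf("Average live data over %d full GCs: %dK%n",
                    fullGcCount, totalKb / fullGcCount);
        }
    }
}

Run it with the path to the GC log as the only argument; the reported average is the live data size you can base your overhead comparison on.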

What it meant for us was that we now had a foundation on which to build the test suites for Plumbr to do the measuring. And we are ready to publish the numbers soon – right after I have gotten rid of some obvious overheads we have already found during the measurements.

Enjoyed the post? We have a lot more under our belt. Subscribe to either our RSS feed or Twitter stream and enjoy.
