Let us continue our series of posts about solving the OutOfMemoryError in our hypothetical production system.
We have described different methods to tackle the problem, and today’s post concentrates on what you can learn from heap dumps. Spoiler alert: with a bit of luck, you can get very close to solving the OOM.
To recap, these are the methods we have already tried:
- We have added more physical memory to that machine. Nothing changed.
- We have attached a memory profiler to it. Our application became ten times slower, and lived ten times longer.
- We have wrestled with command-line tools and seen some magic numbers.
What is a “memory dump”?
A memory dump (or heap dump) is a snapshot of everything your JVM holds in its heap at a given moment: every live object, its fields and the references between objects, written out to a single file (usually in the HPROF binary format). With that file in hand, you can inspect the state of the application’s memory offline, on a different machine, without the application running.
How can you create the memory dump?
The memory dump can be created in two ways:
- By adding “-XX:+HeapDumpOnOutOfMemoryError” to your java start command, like this:
java -XX:+HeapDumpOnOutOfMemoryError -Xmx512m ...
When you start your JVM like this, a full memory dump is written to disk whenever an OutOfMemoryError (OOM) is thrown, just before the JVM crashes/exits/does whatever it does on an OOM.
Although waiting for the JVM to crash before starting to look for the causes of the problem may sound a little harsh, I strongly suggest that no JVM run in production without this parameter. Let me stress that again: go and add that parameter to your production server configuration now! The reason is very straightforward: if your production server ever suffers from an OutOfMemoryError, you will want to have that memory dump. It is the most useful data for postmortem analysis, and often the dump alone is sufficient to find the cause of the crash.
- By creating a memory dump of the running application. That can be achieved using VisualVM, Eclipse MAT or the jmap command line tool we described in the previous blog post. Examples of both approaches are shown right after this list.
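To make both variants concrete, here is roughly what the commands look like. The dump path and &lt;pid&gt; are placeholders you would substitute; -XX:HeapDumpPath is optional, and when omitted the dump lands in the JVM’s working directory:

java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/dumps -Xmx512m ...

jmap -dump:live,format=b,file=/var/dumps/heap.hprof &lt;pid&gt;

Note the live option in the jmap variant: it triggers a full GC before dumping, so only reachable objects end up in the file. That keeps the dump smaller, at the price of a slightly longer pause.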
But before we get there, let me note one very important thing: choose wisely the point in time when you make that dump. Your goal is to solve the OutOfMemoryError problem that pesters your application. As I explained in another post, in order to find the cause of a memory leak you must let it manifest itself. There is no point in looking for a memory leak in your application 2 seconds after it has started (unless it crashes with an OOM 5 seconds after that). Let your users use the application, let it consume enough memory, let the garbage collector try to free that memory. As a rule of thumb, I would not make the memory dump until at least 50-60% of your maximum heap remains in use after a couple of garbage collection cycles. A simple way to watch for that moment is sketched below.
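One way to watch for that moment, if you can add a few lines to the application, is to log heap occupancy via the standard MemoryMXBean. Here is a minimal sketch; the HeapWatch class name and the ten-second interval are my own choices for illustration:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Logs heap occupancy periodically so you can pick a sensible moment for the dump.
public class HeapWatch implements Runnable {

    @Override
    public void run() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        try {
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                // getMax() returns -1 when no limit is set; assumes -Xmx is configured
                long usedPercent = heap.getUsed() * 100 / heap.getMax();
                System.out.println("Heap in use: " + usedPercent + "%");
                Thread.sleep(10_000); // check every ten seconds
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly on shutdown
        }
    }

    // Call this once somewhere in your application's startup code.
    public static void start() {
        Thread watcher = new Thread(new HeapWatch(), "heap-watch");
        watcher.setDaemon(true); // do not keep the JVM alive because of us
        watcher.start();
    }
}

If you would rather not touch the application at all, the jstat command line tool reports the same occupancy percentages from outside the process.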
I should note here that making a memory dump usually means “freezing” your application’s memory, in a similar way to a full garbage collection. As a result, your application does not respond to users’ requests during that time. How long does it take? A little more than the time needed to write raw data in the amount of your application’s heap to your server’s hard disk: for example, dumping a 4GB heap to a disk writing around 100MB/s stalls the JVM for roughly 40 seconds. In some cases it may take up to a few minutes.
Analysis of the memory dump
Alright, now the memory dump is done and transferred from the production site to the developer’s machine. When you open it with MAT, you get the following picture (after a lengthy parsing of that multi-gigabyte file):
In the background you can see a visual representation of the heap dump, with the fattest objects highlighted; we will get to that shortly. In the foreground, MAT proposes some “Getting started” options. For the time being, let’s select the first one, “Leak Suspects Report”. After some analysis, MAT will show you this:
As you can see, MAT has found one leak suspect, which occupies 89% of the application’s memory and is held by instances of the class “org.springframework.samples.petclinic.web.LeakingInterceptor”. If you click on the “Details” link, you will see more information about where those instances reside and why they are so big.
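The report only names the class; its source is not shown here. But the classic shape of such a bug is a collection that only ever grows. A purely hypothetical sketch of the kind of code that produces a report like the one above:

import java.util.ArrayList;
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

// Hypothetical reconstruction for illustration -- the demo's real class may differ.
public class LeakingInterceptor extends HandlerInterceptorAdapter {

    // Spring MVC interceptors are singletons, so this list lives as long as
    // the application does and grows with every request it handles.
    private final List<byte[]> leak = new ArrayList<>();

    @Override
    public boolean preHandle(HttpServletRequest request,
                             HttpServletResponse response,
                             Object handler) {
        leak.add(new byte[1024 * 1024]); // retain ~1MB per request, forever
        return true; // let the request proceed as usual
    }
}

In a dump, a structure like this shows up just the way MAT describes it: a single suspect holding a suspiciously large retained set.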
Now let us come back to that pie chart MAT displays when you open the dump.
When you click on the largest slice, you can select “Paths to GC Roots” or “Merge Shortest Paths to GC Roots” to find out what is holding on to that large number of instances. Or click on “List objects with outgoing references” to see what has accumulated inside them. In my experience, the above information is all you need to go back to your source code and start thinking about fixing the bug.
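If you prefer typing to clicking, MAT also has an OQL console for the same kind of digging. For example, a query like this (using our suspect class) lists all of its instances, and the result view offers the same right-click actions, such as paths to GC roots:

SELECT * FROM org.springframework.samples.petclinic.web.LeakingInterceptor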
However, on a final note, I would like to point out that memory dumps are not silver bullets. Working with them has some disadvantages as well:
- Taking them from a live application makes it unresponsive to your clients for a period of time. Getting them from a crashed application is free, if we don’t account for the cost of the crashed application.
- Analyzing multi-gigabyte memory dumps requires at least as much memory on the developer machine. And 32GB+ servers are much more common nowadays than 32GB developer workstations. (A partial remedy is sketched after this list.)
- If the memory dump is taken at a wrong time, it contains a significant amount of noise. You can lose a lot of time searching for the needle in that haystack (and in some cases, might need to acquire an additional haystack).
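About that partial remedy for the second point: MAT is an Eclipse-based application, so you can raise its own heap limit by editing the -Xmx line under -vmargs in the MemoryAnalyzer.ini file next to its executable (the 8g below is an arbitrary example, size it to your dump):

-vmargs
-Xmx8g

The standalone MAT distribution also ships a ParseHeapDump script, which lets you run the heavy parsing step on the well-equipped server itself and copy only the resulting index files to your workstation for browsing.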