How much memory do I need (part 1) – What is retained heap?

August 14, 2012 by Nikita Salnikov-Tarnovski

How much memory will I need? This is a question you might have asked yourself (or others) when building a solution, creating a data structure or choosing an algorithm. Will this graph of mine fit in my 3 GB heap if it contains 1,000,000 edges and I use a HashMap to store it? Can I use the standard Collections API while building my custom caching solution, or is the overhead it imposes too much?

As it turns out, the answer to this seemingly simple question is a bit more complex. In this post we’ll take a first peek at it and see how deep the rabbit hole actually is.

The answer to the question in the headline comes in several parts. First, we need to understand whether you are interested in the shallow or the retained heap size.

The shallow heap is easy: it consists only of the heap occupied by the object itself. There are some nuances in how it is calculated, but they are beyond the scope of this article. Stay tuned for future posts on the same topic.

The retained heap is in many ways more interesting. Only rarely are you interested in the shallow heap; in most cases your actual question translates to “If I remove this object from memory, how much memory can the garbage collector free?”
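Stated a little more formally (this is our own notation, not taken from any particular tool), let R be the set of objects reachable from the GC roots, and R₋ₓ the set of objects that are still reachable once object X and every reference to it are removed. The retained size of X is then the total shallow size of everything that drops out of reachability:

```latex
\mathrm{retained}(X) \;=\; \sum_{o \,\in\, R \setminus R_{-X}} \mathrm{shallow}(o)
```

Because X itself is in R but not in R₋ₓ, the retained size of an object is always at least its shallow size.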

Now, as we all remember, all Java garbage collection (GC) algorithms follow this logic:

  1. There are some objects which are considered “important” by the GC. These are called GC roots and are (almost) never discarded. Examples include the local variables and input parameters of currently executing methods, application threads, references from native code and similar “global” objects.
  2. Any objects referenced from those GC roots are assumed to be in use and hence not discarded by the GC. One object can reference another in different ways in Java; in the most common case, an object A is stored in a field of an object B. In such a case we say “B references A”.
  3. The process is repeated until all objects that can be transitively reached from GC roots are visited and marked as “in use”.
  4. Everything else is unused and can be thrown away.
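The four steps above can be modelled in a few lines of Java. This is a minimal sketch under a simplifying assumption: the hypothetical `Obj` class stands in for a heap object and simply lists the objects it references. Real JVM collectors mark the actual heap with far more sophisticated machinery, so treat this only as an illustration of the reachability logic.

```java
import java.util.*;

public class MarkSketch {
    /** Hypothetical stand-in for a heap object: a name plus its outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Steps 1-3: start at the GC roots and visit every transitively reachable object. */
    static Set<Obj> mark(Collection<Obj> roots) {
        Set<Obj> marked = new HashSet<>();           // identity-based, since Obj does not override equals
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (marked.add(o)) {                     // first visit: mark the object as "in use"
                pending.addAll(o.refs);              // then follow its outgoing references
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj root = new Obj("root");
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        Obj g = new Obj("G");
        root.refs.add(b);   // the root references B
        b.refs.add(a);      // "B references A"
        g.refs.add(a);      // G references A, but nothing references G
        Set<Obj> live = mark(List.of(root));
        // Step 4: anything that was not marked is unreachable and can be reclaimed
        for (Obj o : List.of(root, a, b, g)) {
            System.out.println(o.name + (live.contains(o) ? " live" : " garbage"));
        }
    }
}
```

Running `main` prints `root live`, `A live`, `B live` and `G garbage`: G points at a live object, but because nothing reachable points at G, it is still garbage.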

Now to illustrate how to calculate the retained heap, let’s follow the aforementioned algorithm with the following example objects:

[Figure: Calculating Retained Heap Size. O1 references O2 and O3, O2 references O3, and O3 references O4.]

To simplify the example, let’s assume that each of the objects O1-O4 has a shallow heap of 1024B = 1kB. Let’s start calculating the retained sizes of those objects.

  • O4 has no references to other objects, so its retained size is equal to its shallow size of 1kB.
  • O3 has a reference to O4. Garbage collecting O3 would thus make O4 eligible for garbage collection as well, so we can say that O3’s retained heap is 2kB.
  • O2 has a reference to O3. But it is important to note that removing the pointer from O2 to O3 does not make O3 eligible for GC, as O1 still holds a reference to it. So O2’s retained heap is only 1kB.
  • O1, on the other hand, is the object keeping all the references in this small graph, so if we removed O1, everything in this graph would be garbage collected. So O1’s retained heap is 4kB.
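The same reasoning can be checked mechanically by applying the definition directly: compute what is reachable from the roots, compute it again with one object removed, and sum the shallow sizes of the difference. In this sketch the edges are reconstructed from the bullet points (O1 → O2, O1 → O3, O2 → O3, O3 → O4), and we assume that O1 is itself held by a GC root. The class and method names are hypothetical.

```java
import java.util.*;

public class RetainedHeapSketch {
    // Example object graph from the bullets above: object name -> names of objects it references
    static final Map<String, List<String>> REFS = Map.of(
        "O1", List.of("O2", "O3"),
        "O2", List.of("O3"),
        "O3", List.of("O4"),
        "O4", List.of());
    static final List<String> ROOTS = List.of("O1"); // assumption: O1 is referenced by a GC root
    static final long SHALLOW = 1024;                 // every object has a 1kB shallow size

    /** Objects reachable from the roots when {@code removed} is treated as gone (null = remove nothing). */
    static Set<String> reachable(String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        for (String r : ROOTS) {
            if (!r.equals(removed)) pending.push(r);
        }
        while (!pending.isEmpty()) {
            String o = pending.pop();
            if (!seen.add(o)) continue;               // already visited
            for (String next : REFS.get(o)) {
                if (!next.equals(removed)) pending.push(next); // never follow references into the removed object
            }
        }
        return seen;
    }

    /** Retained size: shallow sizes of everything that becomes unreachable once {@code obj} is removed. */
    static long retainedSize(String obj) {
        Set<String> freed = new HashSet<>(reachable(null));
        freed.removeAll(reachable(obj));
        return freed.size() * SHALLOW;
    }

    public static void main(String[] args) {
        for (String o : List.of("O1", "O2", "O3", "O4")) {
            System.out.println(o + ": " + retainedSize(o) / 1024 + "kB");
        }
    }
}
```

This prints 4kB, 1kB, 2kB and 1kB for O1-O4, matching the bullets. The brute-force approach repeats the reachability walk for every object, which is fine for four objects but far too slow for a real heap dump; heap analyzers rely on more efficient algorithms.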

What implications does this have in practice? Understanding the difference between shallow and retained heap sizes makes it possible to work with tools such as memory profilers and heap dump analyzers. Digging into Eclipse MAT, for example, might prove impossible if you can’t distinguish between these two types of heap size measurement.

Full disclosure: this post was inspired by Patrick Dubroy’s talk at Google I/O, which is available to watch in full.
