How much memory will I need? This is a question you might have asked yourself (or others) when building a solution, creating a data structure or choosing an algorithm. Will this graph of mine fit in my 3G heap if it contains 1,000,000 edges and I use a HashMap to store it? Can I use the standard Collections API while building my custom caching solution or is the overhead posed by them too much?
Apparently, the answer to the simple question is a bit more complex. In this post we’ll take a first peek at it and see how deep the rabbit hole actually is.
The answer to the question in the headline comes in several parts. At first we need to understand whether you are interested in shallow or retained heap sizes.
The shallow heap is easy – it consists of only the heap occupied by the object itself. There are some nuances to how to calculate it, but for the scope of this article we leave it as is. Stay tuned for future posts on the same topic.
The retained heap is in many ways more interesting. Only rarely are you interested in the shallow heap, in most cases your actual question can be translated to “If I remove this object from the memory, how much memory can now be freed by the garbage collector”.
Now, as we all remember, all Java garbage collection (GC) algorithms follow this logic:
- There are some objects which are considered “important” by the GC. These are called GC roots and are (almost) never discarded. They are, for example, currently executing method’s local variables and input parameters, application threads, references from native code and similar “global” objects.
- Any objects referenced from those GC roots are assumed to be in use and hence not discarded by the GC. One object can reference another in different ways in Java, in the most common case an object A is stored in a field of an object B. In such case we say “B references A”.
- The process is repeated until all objects that can be transitively reached from GC roots are visited and marked as “in use”.
- Everything else is unused and can be thrown away.
Now to illustrate how to calculate the retained heap, let’s follow the aforementioned algorithm with the following example objects:
To simplify the sample, let’s estimate that all the objects O1-O4 have the shallow heap of 1024B = 1kB. Lets start calculating the retained sizes of those objects.
- O4 has no references to other objects, so its retained size is equal to its shallow size of 1kB.
- O3 has a reference to O4. Garbage collecting O3 would thus mean O4 would also be eligible for garbage collection and so we can say that O3 retained heap is 2kB.
- O2 has a reference to O3. But it is now important to note that removing the pointer from O2 to O3 does not make O3 eligible for GC, as O1 still has got a pointer to it. So O2 retained heap is only 1kB.
- O1 on the other hand is the object keeping all the references in this small graph, so if we would remove O1, everything on this graph would be garbage collected. So O1 retained heap is 4kB.
Which implications does this have in practice? In fact, understanding the differences between shallow and retained heap sizes makes it possible to work with tools such as memory profilers and heap dump analyzers – for example digging into Eclipse MAT might prove to be impossible if you don’t know how to distinguish these two types of heap size measurements.