Calculating the size of Java Collections

August 1, 2013 by Nikita Salnikov-Tarnovski

This post is a simple case study demonstrating how the standard Java Collections are consuming memory. If you bear with me, I will guide you through the art of predicting and measuring the exact size of an ArrayList, HashMap or any other data structure found in the Collections library.

Our previous articles on the subject covered different aspects on how to calculate the amount of heap required by your data structures:

In this article we are looking for answers to a simple question. Namely – how much memory do my list and map consume after I have ran the following code. And how can you put this knowledge to practice estimating the collection size in your own code.

List list = new ArrayList();
Map<Integer, Object> map = new HashMap<Integer, Object>();
for (int i = 0; i < 1000; i++) {
   list.add(new Integer(i));
   map.put(new Integer(i), new Object());
}

As always, our first recommendation is to measure, not guess. For measurements we used Aleksey Shipilev’s utility. Running our code with the utility attached gave us the following answers on my 64bit Mac OS X:

  • The size of the list is 20,976 bytes
  • The size of the map is 68,168  bytes

So if you are the kind of a developer who is only interested in quick answers then you might not bother reading further and just toss in the aforementioned library to your codebase and start measuring. For the ones who are interested in how those numbers were calculated we need to dig further.

Lets try to verify those calculations are correct for the list. In order to do this we need to understand how the ArrayList is constructed. As you might already know, an ArrayList is built just as the name is hinting – it is nothing more than a list interface using an array in the implementation to store the content.

So if we would directly apply the knowledge about how to calculate the retained size of such an array of Integers, we could come up with a straightforward math like the following

for our array (remember, we are on a 64bit platform):

  • 1,000 references to an object in an array, 8 bytes each = 8,000 bytes
  • 1,000 int primitives wrapped to an Integer. 4 bytes per primitive wrapped in an object with 16 bytes consumed by headers in our 64bit platform and 4 bytes lost to alignment. So 24,000 bytes to the int primitives

So shouldn’t we end up with 32,000 bytes, not 20,976? As it appears, no.

First of all, even though we are running on 64bit machine, our object pointers are most likely compressed. At least when you are running on a JDK 7 or on a post JDK6u23 version. If you have not manually tweaked the -XX:-UseCompressedOops parameter nor using heaps larger than 32GB then the defaults on modern platforms are using compressed ordinary object pointers. This means that our original guesses were wrong:

  • 1,000 references to an object in an array, 4 bytes each = 4,000 bytes
  • 1,000 int primitives wrapped to an Integer. 4 bytes per primitive wrapped in an object with 8 bytes consumed by headers in our 64bit platform with CompressedOops and 4 bytes lost to alignment. So 16,000 bytes to the int primitives

Now, with the 4,000+16,000=20,000 we are already in the right neighbourhood with our calculations. But something is still not right. We still do not understand where did the 976 bytes appear. Might not seem much, but if you are anything like me than you might still wonder where did this extra 5% came from. Or just eager to prove that the author of the library has been wrong. Or the combination of the two.

In order to understand this let me introduce the difference between the capacity and the size of an ArrayList. Capacity is the length of the array in the ArrayList. Size on the other hand is the number of elements in this array. More often than not the capacity is larger than the size. For example when you initialize a new ArrayList via its default new ArrayList() constructor, you have created a list with capacity to accommodate 10 elements. The size of the newly initialized list sits comfortably at zero though.

When you now start to add elements to your ArrayList, the capacity of underlying array is dynamically increased. The formula for this is somewhat complex, so if you are interested take a look at the source code. But if you follow the “measure, don’t guess” principle like me you might wish to find out what is the capacity of your ArrayList during a runtime.

With the help of reflection you can achieve the result like this:

ArrayList list = new ArrayList();
for (int i = 0; i < 1000; i++) {
   list.add(new Integer(i));
}
Field f = ArrayList.class.getDeclaredField("elementData");
f.setAccessible(true);
int capacity = (Object[]) f.get(list)).length
System.out.format(“List size: %d, Capacity: %d", list.size(),capacity);

We can see that even though our list size is 1,000, the capacity of our list is larger, namely 1,234. Those 234 extra references, 4 bytes each in the array translate to 936 bytes. So we have understood that our data structures should consume 20,936 bytes in memory. Adding now the object header of the ArrayList, instance variables of the named list and paddings to the byte limits we have indeed verified that the original 20,976 bytes is correctly calculated.

Similar math can be applied to the HashMap size calculations. But this would extend the length of the post to the extent where I am afraid I am going to lose our last readers who are still with us at this point. We do not wish this to happen, instead it would be recommended that you subscribe to our twitter or RSS feed to be notified on the subsequent posts on other interesting topics.

Can't figure out what causes your OutOfMemoryError? Read more

ADD COMMENT

COMMENTS

Niiiice!

dru

Cool article, thanks for posting it.

Could you provide some more details on going from 20,936 bytes in memory to 20,976 bytes? Or point to some resources explaining this?

(“Adding now the object header of the ArrayList, instance variables of the named list and paddings to the byte limits we have indeed verified that the original 20,976 bytes is correctly calculated.)

cause

Object header for the array is 12 bytes, length field 4 bytes and the ArrayList consumes 24 bytes in total. So here we have found the missing 40 bytes.

Ivo Mägi

Can't figure out what causes your OutOfMemoryError? Read more

Latest
Recommended
You cannot predict the way you die
When debugging a situation where systems are failing due to the lack of resources, you can no longer count on anything. Seemingly unrelated changes can trigger completely different messages and control flows within the JVM. Read more
Tuning GC - it does not have to be that hard
Solving GC pauses is a complex task. If you do not believe our words, check out the recent LinkedIn experience in garbage collection optimization. It is a complex and tedious task, so we are glad to report we have a whole lot simpler solution in mind Read more
Building a nirvana
We have invested a lot into our continuous integration / delivery infrastructure. As of now we can say that the Jenkins-orchestrated gang consisting of Ansible, Vagrant, Gradle, LiveRebel and TestNG is something an engineer can call a nirvana. Read more
Creative way to handle OutOfMemoryError
Wish to spend a day troubleshooting? Or make enemies among sysops? Registering pkill java to OutOfMemoryError events is one darn good way to achieve those goals. Read more