To blog |

Calculating the size of Java Collections

August 1, 2013 by Nikita Salnikov-Tarnovski Filed under: Memory Leaks

This post is a simple case study demonstrating how the standard Java Collections are consuming memory. If you bear with me, I will guide you through the art of predicting and measuring the exact size of an ArrayList, HashMap or any other data structure found in the Collections library.

Our previous articles on the subject covered different aspects on how to calculate the amount of heap required by your data structures:

In this article we are looking for answers to a simple question. Namely – how much memory do my list and map consume after I have run the following code? And how can you put this knowledge to practice, estimating the collection size in your own code?

List list = new ArrayList();
Map<Integer, Object> map = new HashMap<Integer, Object>();
for (int i = 0; i < 1000; i++) {
   list.add(new Integer(i));
   map.put(new Integer(i), new Object());
}

As always, our first recommendation is to measure, instead of guessing. For measurements we used Aleksey Shipilev’s utility. Running our code with the utility attached gave us the following answers on my 64bit Mac OS X:

  • The size of the list is 20,976 bytes
  • The size of the map is 68,168  bytes

So if you are the kind of a developer who is only interested in quick answers then you might not bother reading further and just toss in the aforementioned library to your codebase and start measuring. For the ones who are interested in how those numbers were calculated we need to dig further.

Let’s try to verify that those calculations are correct for the list. In order to do this we need to understand how the ArrayList is constructed. As you might already know, an ArrayList is built just as the name is hinting – it is nothing more than a list interface using an array in the implementation to store the content.

So if we would directly apply the knowledge about how to calculate the retained size of such an array of Integers, we could come up with straightforward math like the following for our array (remember, we are on a 64bit platform):

  • 1,000 references to an object in an array, 8 bytes each = 8,000 bytes
  • 1,000 int primitives wrapped to an Integer. 4 bytes per primitive wrapped in an object with 16 bytes consumed by headers in our 64bit platform and 4 bytes lost to alignment. So 24,000 bytes for the int primitives

So shouldn’t we end up with 32,000 bytes, not 20,976? As it appears, no.

First of all, even though we are running on 64bit machine, our object pointers are most likely compressed. At least when you are running on a JDK 7 or on a post-JDK6u23 version. If you have not manually tweaked the -XX:-UseCompressedOops parameter or are not using heaps larger than 32GB then the defaults on modern platforms are using compressed ordinary object pointers. This means that our original guesses were wrong:

  • 1,000 references to an object in an array, 4 bytes each = 4,000 bytes
  • 1,000 int primitives wrapped to an Integer. 4 bytes per primitive wrapped in an object with 8 bytes consumed by headers in our 64bit platform with CompressedOops and 4 bytes lost to alignment. So 16,000 bytes for the int primitives

Now, with 4,000+16,000=20,000 we are already in the right neighborhood with our calculations. But something is still not right. We still do not understand from where the 976 bytes appeared. Might not seem much, but if you are anything like me then you might still wonder where this extra 5% came from. Or just eager to prove that the author of the library has been wrong. Or a combination of the two.

Did you know that 20% of Java applications have memory leaks? Don’t kill your application – instead find and fix leaks with Plumbr in minutes.

In order to understand this let me introduce the difference between the capacity and the size of an ArrayList. Capacity is the length of the array in the ArrayList. Size on the other hand is the number of elements in this array. More often than not the capacity is larger than the size. For example when you initialize a new ArrayList via its default new ArrayList() constructor, you have created a list with the capacity to accommodate 10 elements. The size of the newly initialized list sits comfortably at zero though.

When you now start to add elements to your ArrayList, the capacity of the underlying array is dynamically increased. The formula for this is somewhat complex so take a look at the source code if you are interested. But if you follow the “measure, don’t guess” principle like me you might wish to find out what the capacity of your ArrayList is during a runtime.

With the help of reflection you can achieve a result like this:

ArrayList list = new ArrayList();
for (int i = 0; i < 1000; i++) {
   list.add(new Integer(i));
}
Field f = ArrayList.class.getDeclaredField("elementData");
f.setAccessible(true);
int capacity = (Object[]) f.get(list)).length
System.out.format(“List size: %d, Capacity: %d", list.size(),capacity);

We can see that even though our list size is 1,000, the capacity of our list is larger, namely 1,234. Those 234 extra references, 4 bytes each in the array, translate to 936 bytes. So we have understood that our data structures should consume 20,936 bytes in memory. Adding now the object header of the ArrayList, instance variables of the named list and paddings to the byte limits we have indeed verified that the original 20,976 bytes is correctly calculated.

Similar math can be applied to the HashMap size calculations. But this would extend the length of the post to the length where I am afraid I am going to lose our last readers who are still with us at this point. We do not wish this to happen, instead it would be recommended that you subscribe to our twitter or RSS feed to be notified on the subsequent posts on other interesting topics.

Tweet about this on TwitterShare on Reddit

ADD COMMENT

Comments

Niiiice!

dru

Cool article, thanks for posting it.

Could you provide some more details on going from 20,936 bytes in memory to 20,976 bytes? Or point to some resources explaining this?

(“Adding now the object header of the ArrayList, instance variables of the named list and paddings to the byte limits we have indeed verified that the original 20,976 bytes is correctly calculated.)

cause

Object header for the array is 12 bytes, length field 4 bytes and the ArrayList consumes 24 bytes in total. So here we have found the missing 40 bytes.

Ivo Mägi