Would you dare to change HashMap implementation?

November 6, 2013 by Nikita Salnikov-Tarnovski

There are bold engineers working for Oracle nowadays. I came to this conclusion when trying to nail down a Heisenbug yesterday. Not too surprisingly, the bug seemed to disappear whenever I tried to pin it down. Several hours later, the “Heisen” part of the bug was removed, when the problem was traced down to minor differences between the JDK 7 updates.

But back to the bravery claim. In order to understand the case I am describing I extracted it into a really simple test snippet for you to try out:

class OOM {
	public static void main(String[] args) {
		java.util.Map m = new java.util.HashMap(10_000_000);
	}
}

Now when I launch the class on my 64-bit Mac OS X with JDK 7u40 or later:

my:tmp user$ /path-to/jdk1.7.0_40/bin/java -Xmx96m OOM
my:tmp user$

You see the command prompt returning and the JVM successfully completing its job. Now, launch the same class with JDK 7u25 or earlier:

my:tmp user$ /path-to/jdk1.7.0_25/bin/java -Xmx96m OOM
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.<init>(HashMap.java:283)
    at java.util.HashMap.<init>(HashMap.java:297)
    at OOM.main(OOM.java:3)

And you see a different result. Initializing a HashMap with an initial capacity of 10M entries fails to allocate enough memory in our 96 MB heap, and the JVM exits with an OutOfMemoryError being thrown.
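To get a feeling for the numbers, here is a back-of-the-envelope sketch. It relies on the fact that HashMap rounds the requested capacity up to the next power of two, and it assumes 4 bytes per bucket reference (compressed references, which is an assumption about this particular JVM setup). The class and method names are made up for illustration, and the array header is ignored.

```java
class TableSizeEstimate {
    // Smallest power of two >= requested, mirroring how HashMap sizes its bucket array.
    // Assumes 1 <= requested <= 2^30 (HashMap's maximum capacity).
    static int tableSizeFor(int requested) {
        int highest = Integer.highestOneBit(requested);
        return highest == requested ? requested : highest << 1;
    }

    // Approximate size in bytes of the bucket array alone.
    // bytesPerReference is an assumption: 4 with compressed references, 8 without.
    static long estimateBytes(int requested, int bytesPerReference) {
        return (long) tableSizeFor(requested) * bytesPerReference;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10_000_000));                                // 16777216
        System.out.println(estimateBytes(10_000_000, 4) / (1024 * 1024) + " MiB");   // 64 MiB
    }
}
```

A single contiguous array of roughly 64 MiB has to fit into one region of the heap, which is a plausible reason for the eager allocation to fail under -Xmx96m even though 64 is less than 96. The exact outcome depends on how the garbage collector partitions the heap.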

The source of HashMap is clearly the #1 suspect in this case. And indeed, when you compare the source code of JDK 7u25 to the next release (named u40, kudos for naming!), you see a significant difference. The HashMap(initialCapacity, loadFactor) constructor no longer honors your wish to construct a HashMap whose underlying array is sized according to initialCapacity right away. Instead, the underlying array is allocated lazily, only when put() is first called on the map.
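The idea can be illustrated with a heavily simplified sketch. This is not the JDK source: LazyTableMap, isTableAllocated() and the rest are hypothetical names invented for the example, and resizing is left out entirely. The constructor only remembers the requested capacity, and the bucket array is created on the first put().

```java
import java.util.Objects;

// Hypothetical illustration of lazy table allocation, not the JDK implementation.
class LazyTableMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final int initialCapacity;
    private Node<K, V>[] table; // stays null until the first put()
    private int size;

    LazyTableMap(int initialCapacity) {
        if (initialCapacity < 1 || initialCapacity > (1 << 30)) {
            throw new IllegalArgumentException("capacity out of range: " + initialCapacity);
        }
        this.initialCapacity = initialCapacity; // remembered, nothing allocated yet
    }

    boolean isTableAllocated() {
        return table != null;
    }

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        if (table == null) {
            // The allocation the constructor used to do eagerly happens here.
            table = (Node<K, V>[]) new Node[roundUpToPowerOfTwo(initialCapacity)];
        }
        int index = indexFor(key);
        for (Node<K, V> n = table[index]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        table[index] = new Node<>(key, value, table[index]);
        size++;
        return null;
    }

    V get(K key) {
        if (table == null) {
            return null; // an untouched map needs no table to answer lookups
        }
        for (Node<K, V> n = table[indexFor(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                return n.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    private int indexFor(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        h ^= (h >>> 16); // spread the high bits; table length is a power of two
        return h & (table.length - 1);
    }

    private static int roundUpToPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }
}
```

With this shape, new LazyTableMap<String, Integer>(10_000_000) costs next to nothing until the first put(), which then pays for the large array. That matches the behaviour observed with u40 in the test above, where the map is created but never written to.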

A seemingly very reasonable change – the JVM is lazy by nature in several respects, so why not postpone the allocation of large data structures until the need for such allocation becomes imminent? So in that sense a good call.

For a particular application that was performing tricks via reflection and directly accessing the internal structures of the Map implementations – maybe not. But again, one should not bypass the API and start being clever, so maybe the developer in question is now a bit more convinced that not every newly discovered trick is applicable everywhere.

Would you have made the change yourself if you were the API developer? I am not convinced I would have had the guts, knowing that there have to be a bazillion apps out there depending on all kinds of weird aspects of the implementation. But I do vote for reasonable changes within the JDK and definitely count this one among the good ones.

COMMENTS

One of the core principles of good object-oriented design is that encapsulation allows you to change your implementation without changing the functionality specified by the API. And this is exactly what the Oracle engineers were taking advantage of here.

If someone wrote code that depended on the internals of HashMap (and not just the public API), then they should have been aware that they were doing so at their own risk.

Any API designer (like Oracle/Sun) should be able to change the implementation details at any time, so long as the public API contract doesn’t change (which it didn’t in this case). That’s the main point of encapsulation.

Green Giant

If you allocate a HashMap with 10 million items and actually use them, you will get an OOM anyway. And if you allocate a map for 10 million items and use 10 of them, then your code is bad and not portable. I don’t see any problem here.

Larry Ellison

10M was chosen as an easy way to demonstrate the change. The main topic of the post is that JDK engineers have quite dramatically changed the inner implementation and, to some degree, the runtime characteristics of one of the most widely used classes. And although I rather agree with this change, I am somewhat troubled that it went so unnoticed.

iNikem

The reason this change was made was explained partially in the JavaOne session Java Memory Hogs. The presenter explained that we as Java devs are overusing empty HashMaps, so they wait to initialize the map in order to reduce the heap space wasted on empty HashMaps and other collections. See the slides here

Hardy

Thanks for the tip! Their numbers are interesting, but beg for critical evaluation.

iNikem

I dunno. I’d rather not have lazy initialization that makes the first insert slow. I’d much rather create a map and have the constructor set up the basic data structures. This way performance behaves as expected.

Ed

I think you have a valid concern here. I have no idea what the real-world implications of this change will be.

But in any case I am glad to see Oracle engineers hard at work. They have been quite impressive recently.

iNikem
