Hunting down memory leaks: a case study

March 5, 2013 by Nikita Salnikov-Tarnovski

A week ago I was asked to fix a problematic webapp suffering from memory leaks. How hard can it be, I thought – considering that I have both seen and fixed hundreds of leaks over the past two years or so.

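But this one proved to be a challenge. 12 hours later I had discovered no fewer than five leaks in the application and had managed to fix four of them. I figured it would be an experience worth sharing. For the impatient ones – all in all I found leaks from the MySQL driver’s background cleanup thread, the driver registration in java.sql.DriverManager, sun.awt.AppContext, a datasource bound to the JNDI tree, and a finalizer thread started by the connection pool.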

The application at hand was a simple Java web application with a few datasources connecting to the relational databases, Spring in the middle to glue stuff together and simple JSP pages rendered to the end user. No magic whatsoever. Or so I thought. Boy, was I wrong.

First stop: MySQL drivers. Apparently the most common MySQL driver launches a background thread that cleans up your unused and unclosed connections. So far so good. But the catch is that the context classloader of this newly created thread is your web application classloader. This means that while the thread is running, undeploying your webapp leaves its classloader dangling behind, along with all the classes loaded in it.
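
The inheritance described above can be reproduced with the JDK alone. The following is a minimal sketch, not the driver’s actual code: the class name, the method name and the empty URLClassLoader standing in for the web application classloader are all assumptions made for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class InheritedContextLoaderDemo {

    /**
     * Starts a thread while `loader` is the caller's context classloader
     * and returns the context classloader that the new thread observes.
     */
    static ClassLoader contextLoaderSeenByNewThread(ClassLoader loader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        ClassLoader[] seen = new ClassLoader[1];
        current.setContextClassLoader(loader);
        try {
            // A new thread copies its creator's context classloader when it is constructed.
            Thread child = new Thread(
                    () -> seen[0] = Thread.currentThread().getContextClassLoader(),
                    "cleanup-thread-stand-in");
            child.start();
            child.join();
        } catch (InterruptedException e) {
            current.interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Put the caller's own context classloader back.
            current.setContextClassLoader(previous);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the web application classloader.
        ClassLoader fakeWebAppLoader = new URLClassLoader(new URL[0]);
        boolean inherited = contextLoaderSeenByNewThread(fakeWebAppLoader) == fakeWebAppLoader;
        System.out.println("child inherited the loader: " + inherited);
    }
}
```

As long as such a thread stays alive, the reference it holds keeps the classloader reachable, which is why a background thread started by a library can pin a whole web application.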

Apparently it took from July 2012 to February 2013 to fix this after the bug was discovered. You can follow the discussion in the MySQL issue tracker. The solution finally implemented was a shutdown() method added to the API, which you as a developer should know to invoke before redeploys. Well, I didn’t. And I bet 99% of you out there didn’t, either.

There is a good place for such shutdown hooks in your typical Java web application, namely the contextDestroyed() method of ServletContextListener. This method gets called each and every time the servlet context is destroyed, which most often happens during redeploys. Chances are that quite a few developers are aware this place exists, but how many actually realise the need to clean up in this particular hook?

Back to the application, which was still far from being fixed. My second discovery was also related to context classloaders and datasources. When you are using com.mysql.jdbc.Driver, it registers itself as a driver in the java.sql.DriverManager class. Again, this is done with good intentions. After all, this is what your application uses to choose the right driver when connecting to a database URL. But as you might guess, there is a catch: DriverManager is loaded by the bootstrap classloader rather than your web application’s classloader, so it cannot be unloaded when redeploying your application, and it keeps holding on to the driver registered with it.

What makes things really peculiar is that there is no general way to unregister the driver by yourself. The reference to the class you are trying to unregister seems to be deliberately hidden from you. In this particular case I was lucky: the connection pool used in the application was able to unregister the driver, provided I remembered to ask. Looking back at similar cases in my past, this was the first time I saw such a feature implemented in a connection pool. Before that, I once had to enumerate through all the JDBC drivers registered with DriverManager to figure out which ones I should unregister. Not an experience I can recommend to anyone.
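
For readers who end up in the “enumerate everything” situation, here is a hedged sketch of that approach using only java.sql. The class and method names are made up for illustration, and the filtering rule (deregister only drivers whose class was loaded by the given classloader) is an assumption about what is safe in a shared container, not something taken from the application described here.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;

final class JdbcDriverCleanup {

    /** Deregisters every visible JDBC driver whose class was loaded by `loader`; returns how many were removed. */
    static int deregisterDriversLoadedBy(ClassLoader loader) {
        int removed = 0;
        // Copy the enumeration first so deregistering cannot disturb the iteration.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() != loader) {
                continue; // belongs to the container or to another application
            }
            try {
                DriverManager.deregisterDriver(driver);
                removed++;
            } catch (SQLException e) {
                // Little else can be done during shutdown; report and carry on.
                System.err.println("Could not deregister " + driver + ": " + e);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // In a webapp this would be called from contextDestroyed() with the webapp's own loader.
        int removed = deregisterDriversLoadedBy(JdbcDriverCleanup.class.getClassLoader());
        System.out.println("Deregistered " + removed + " driver(s)");
    }
}
```

Note that DriverManager.getDrivers() only returns drivers visible to the caller’s classloader, so the helper has to live in the web application itself for this to find the application’s drivers.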

This should be it, I thought. Two leaks in the same application is already more than one can tolerate. Wrong. The third issue staring right at me from the leak report was sun.awt.AppContext with its static field mainAppContext. What? I have no idea what this class is supposed to do, but I was pretty sure that the application at hand didn’t use AWT in any way. So I started a debugger to find out who loads this class (and why). Another surprise: it was com.sun.jmx.trace.Trace.out(). Can you think of a good reason why a com.sun.jmx class would call a sun.awt class? I certainly can’t. Nevertheless, that call stack originated from the connection pool, BoneCP. And there’s absolutely no way to skip the line of code that leads to this particular memory leak. Solution? The following magic incantation in my ServletContextListener.contextInitialized():

Thread.currentThread().setContextClassLoader(null);
// Force the AppContext singleton to be created and initialized without holding a reference to the WebAppClassLoader
sun.awt.AppContext.getAppContext();
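
The snippet above leaves the thread’s context classloader set to null afterwards. A possible variation, sketched here under the assumption that the rest of contextInitialized() still needs the original loader, swaps the loader only for the duration of one call and then restores it. ContextClassLoaders and callWith are hypothetical names, not part of any library.

```java
import java.util.function.Supplier;

final class ContextClassLoaders {

    /** Runs `action` with `temporary` as the context classloader, then restores the previous one. */
    static <T> T callWith(ClassLoader temporary, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        current.setContextClassLoader(temporary);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws.
            current.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        // Inside the action the context classloader is null, as in the incantation above.
        ClassLoader seen = callWith(null, () -> Thread.currentThread().getContextClassLoader());
        System.out.println("loader seen inside the action: " + seen);
    }
}
```

With this helper the incantation would become a single callWith(null, ...) wrapped around the getAppContext() call. Be aware that sun.awt is an internal package, so on newer modular JDKs referencing it directly may require extra compiler and runtime flags.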

But I still wasn’t done: something was still leaking. In this case I found out that our application was binding this datasource to the InitialContext() JNDI tree, a good, standardized way to bind your objects for future discovery. But again – when using this nice thing you have to clean up after yourself by unbinding the datasource from the JNDI tree in the very same contextDestroyed() method.
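
A minimal sketch of such an unbind step, using only javax.naming from the JDK, could look like this. JndiCleanup and unbindQuietly are hypothetical names, and swallowing the NamingException is an assumption about what is acceptable during shutdown.

```java
import javax.naming.Context;
import javax.naming.NamingException;

final class JndiCleanup {

    /** Removes `name` from `context`; returns false if the naming provider reported a failure. */
    static boolean unbindQuietly(Context context, String name) {
        try {
            context.unbind(name);
            return true;
        } catch (NamingException e) {
            // Shutdown should continue even if the entry could not be removed.
            System.err.println("Could not unbind " + name + ": " + e);
            return false;
        }
    }
}
```

In contextDestroyed() this would be called with a new InitialContext() (whose constructor can itself throw NamingException) and whatever name the datasource was bound under; the name depends entirely on the application.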

Well, so far we had pretty logical, albeit rare and somewhat obscure, problems that were quickly fixed with some reasoning and google-fu. My fifth and last problem was nothing like that. I still had the application crashing with OutOfMemoryError: PermGen. Both Plumbr and Eclipse MAT reported to me that the culprit, the one who had taken my classloader hostage, was a thread named com.google.common.base.internal.Finalizer.
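
Without a leak detection tool, a rough first check for this kind of culprit is to list the live threads whose context classloader is the one that refuses to go away. The sketch below uses only the JDK; ThreadLoaderAudit and threadsPinning are hypothetical names. It only looks at context classloaders, so a thread that pins the loader some other way would not show up.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreadLoaderAudit {

    /** Returns the names of live threads whose context classloader is `loader`. */
    static List<String> threadsPinning(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() is used here only for its key set: all live threads.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread.getContextClassLoader() == loader) {
                names.add(thread.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ClassLoader mine = Thread.currentThread().getContextClassLoader();
        System.out.println("threads sharing my context classloader: " + threadsPinning(mine));
    }
}
```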

“Who the hell is this guy?” – was my last thought before the darkness engulfed me.

A couple of hours and four coffees later I found myself staring at three lines:

emf.close();
emf = null;
ds = null;

It is hard to recollect exactly what happened during the intervening hours. I have vague memories of WeakReferences, ReferenceQueues, Finalizers, Reflection and my first time seeing a PhantomReference in the wild. Even today I still cannot fully explain why and for what purpose the connection pool used finalizers tied to Google’s implementation of a reference queue running in a separate thread.

Nor can I explain why closing javax.persistence.EntityManagerFactory (named emf in the code above and held in a static reference in one of the application’s own classes) was not enough, so that I had to manually null this reference. The same went for a similar static reference to the data source used by that factory. I was sure that Java’s GC could cope with circular references all day long, but it seems that this magical ring of classes, static references, objects, finalizers and reference queues was too hard even for it. And so, again for the first time in my long career, I had to null out a Java reference.
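
The general mechanism behind nulling a static field can at least be shown in isolation. This sketch does not reproduce the ring of finalizers and reference queues described above; it only shows that an object held by a static field stays strongly reachable until the field is cleared. StaticReferenceDemo and heldResource are hypothetical names standing in for the application’s emf and ds fields.

```java
import java.lang.ref.WeakReference;

final class StaticReferenceDemo {

    // Stand-in for the application's static emf / ds fields.
    static Object heldResource;

    /** Requests a GC and reports whether the object behind `ref` is still there. */
    static boolean survivesGc(WeakReference<?> ref) {
        System.gc(); // only a hint to the JVM
        return ref.get() != null;
    }

    public static void main(String[] args) {
        heldResource = new Object();
        WeakReference<Object> ref = new WeakReference<>(heldResource);

        // Strongly reachable through the static field, so it cannot be collected.
        System.out.println("while the static field is set: " + survivesGc(ref));

        heldResource = null;
        // Now only weakly reachable; the collector is allowed (not obliged) to clear it.
        System.out.println("after nulling the field: " + survivesGc(ref));
    }
}
```

A static field lives as long as its class, and the class lives as long as its classloader, so anything that keeps the classloader alive also keeps every object hanging off its static fields alive.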

I am a humble guy and thus cannot claim that I was the most efficient in finding the cure for all of the above in a mere 12 hours. But I have to admit I have been dealing with memory leaks almost exclusively for the past three years. And I even had my own creation, Plumbr, helping me (in fact, four out of five of those leaks were discovered by Plumbr in 30 minutes or so). But actually solving those leaks took me more than a full working day on top of that.

Overall – something is apparently broken in the Java EE and/or classloader world. It cannot be normal that a developer must remember all those hooks and configuration tricks, because it simply isn’t possible. After all, we like to use our heads for something productive. And, as seen from the workarounds bundled with two popular servlet containers (Tomcat and Jetty), the problem is severe. Solving it, however, will require more than alleviating some of the symptoms: the underlying design errors need curing.

If you followed the post this far, then you should consider following us on Twitter to be notified on our next posts

This is a cross-post from last week with jaxenter.

COMMENTS

I will look into it. Looks very interesting.

iNikem

Cool, this came just at the right time. We were about to put all our existing own code (subset of what you have) into a corporate library.

However, by quickly glancing over the code I found three issues, two of which are deal-breakers for us:
- empty catch blocks are a no-go
- no logging, I don’t consider sys out as logging, no-go
- I would have hoped to see each “fix” in its own class to avoid such a lengthy blown-up listener class, my personal clean code preference

I hope the Plumbr guys give this lib a thumbs-up.

Marcel Stör

Hi Marcel,

well, the code is open source so you can change the behavior as you like. I did it as well for my project. As for your comments, there are reasons why there are empty catch blocks and no logging. I will just quote the author on the logging issue.

> Primary design goal: Zero dependencies. The component should build and run using nothing but the JDK and the
> Servlet API. Specifically we should not depend on any logging framework, since they are part of the problem.

As for the empty catch blocks: since this is a one-fixes-all LeakPreventor, it tries to fix as many leaks as it can, and sometimes you’ll get an exception that is totally acceptable. If

mayBeJBoss = (contextClassLoader.getResource("org/jboss") != null);

throws an exception we just assume that we are not on JBoss. There is nothing that can be done anyways. It would not make sense to log it either since it is not really an error, so it is just ignored.

Damokles

Thanks for taking the time to respond.

“The component should build and run using nothing but the JDK”

That’s a noble goal and one I don’t oppose. I just hope the author is familiar with java.util.logging which has been part of the JDK since 1.4.

Marcel Stör

As for “I once had to enumerate through all the JDBC drivers…” I suppose something like the following would need a lot of tweaking.

Enumeration<Driver> drivers = DriverManager.getDrivers();
while (drivers.hasMoreElements()) {
    Driver driver = drivers.nextElement();
    // Only touch drivers loaded by this web application's own classloader
    if (driver.getClass().getClassLoader() == getClass().getClassLoader()) {
        try {
            DriverManager.deregisterDriver(driver);
            logger.info("Deregistered '{}' JDBC driver.", driver);
        } catch (SQLException e) {
            logger.warn("Failed to deregister '{}' JDBC driver.", driver);
        }
    }
}

Marcel Stör

This is exactly the code I have written once :)

iNikem

The reason as you said is a design error, the Java classloader is not flexible enough to support redeploys and at some point it will exhaust all the available memory. It’s way simpler to restart the application server, even if it’s just a workaround and not a solution.

Sebastiano Pilla

I hoped to demonstrate by this article, that one can solve leaks. And that memory exhaustion is not inevitable.

iNikem

In our experience it’s been necessary to restart tomcat with every deployment or all sorts of strange glitches occur.
It doesn’t help that Java has a whole slew of weird behaviour by default, like caching successful DNS lookups in perpetuity (ignoring TTLs) and so on.

In theory we should be able to just keep deploying without restart, and we’d love to do so, but the time and effort involved in resolving all the niggles is hardly worth it. We keep our apps fairly small so restarting an instance takes less than a minute.

Twirrim

My goodness this sounds awful. Instead of a redeploy, is it possible to instead take down the entire server itself? Perhaps isolate this application so that it runs by itself on the server so that when you need to redeploy, you take down the entire server process to ensure nothing like this can happen.

Jonathan Keam

One can mitigate the pain in this manner. An isolated restart may indeed be a better solution than a redeploy. But it is our opinion that one should fix the root cause, not the symptoms. Every leak can and has to be solved!

iNikem
