
Native memory leak example

March 14, 2016 by Gleb Smirnov Filed under: Memory Leaks

We have written quite a lot about memory leaks in Java. The telltale sign of such a leak is used heap memory that keeps growing after major GC events: each major GC frees less and less memory, exposing a clear upward trend.

There is, however, a different type of memory leak affecting Java deployments. This leak happens in native memory, and you will see no clear trend when monitoring the memory pools within the JVM. The symptoms are a perfectly healthy chart of heap and permgen consumption, as seen below, coupled with a continuous increase in the total memory used by the Java process at the operating system level:

Java memory leak from native code

Example

I recently stumbled upon a case where a native memory leak was the culprit, so I decided to share the details as an example of how such leaks can happen in the real world. I was able to reduce the case to a simple piece of code that just loads and transforms classes:

public static void main(String[] args) throws InterruptedException {
    final BottomlessClassLoader loader = new BottomlessClassLoader();

    while (true) {
        loader.loadAnotherClass();
        Thread.sleep(100);
    }
}

So that is all there is: an endless loop that keeps loading classes by calling the loadAnotherClass() method of BottomlessClassLoader.
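The source of BottomlessClassLoader is not shown in this post. As a rough idea of what such a loader could look like, here is a minimal sketch under my own assumptions: it writes the smallest possible class file by hand (a public class extending java.lang.Object, with no fields or methods), defines it under a fresh name on every call and keeps a reference to the result. The names Generated<N>, emptyClassBytes and loadedCount are hypothetical, picked for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the loader used for the measurements in this post.
public class BottomlessClassLoader extends ClassLoader {
    // Holding on to every class keeps the definitions from being unloaded.
    private final List<Class<?>> loaded = new ArrayList<>();
    private int counter = 0;

    public Class<?> loadAnotherClass() {
        String name = "Generated" + counter++;
        byte[] bytes = emptyClassBytes(name);
        // defineClass hands the bytes to the JVM; registered agents see them here.
        Class<?> clazz = defineClass(name, bytes, 0, bytes.length);
        loaded.add(clazz);
        return clazz;
    }

    public int loadedCount() {
        return loaded.size();
    }

    // Builds a minimal class file: "public class <name> extends Object {}"
    // without any fields, methods or attributes.
    static byte[] emptyClassBytes(String internalName) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(0xCAFEBABE);         // magic
            out.writeShort(0);                // minor version
            out.writeShort(52);               // major version (Java 8)
            out.writeShort(5);                // constant pool count = entries + 1
            out.writeByte(1);                 // #1 Utf8: this class name
            out.writeUTF(internalName);
            out.writeByte(7);                 // #2 Class -> #1
            out.writeShort(1);
            out.writeByte(1);                 // #3 Utf8: superclass name
            out.writeUTF("java/lang/Object");
            out.writeByte(7);                 // #4 Class -> #3
            out.writeShort(3);
            out.writeShort(0x0021);           // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                // this_class
            out.writeShort(4);                // super_class
            out.writeShort(0);                // interfaces count
            out.writeShort(0);                // fields count
            out.writeShort(0);                // methods count
            out.writeShort(0);                // attributes count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

The generated classes are tiny, so with this sketch the growth per iteration would be small. A loader producing larger classes would make the effect visible sooner.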

Now let us launch this code in two different ways:

  • The first launch just generates classes and keeps references to them, essentially piling up class definitions in memory.
  • The second launch attaches a javaagent and is a tad more complex: it generates classes in the same way as the first launch, and additionally passes their bytecode through a transformer registered in the agent’s premain method:

import java.lang.instrument.Instrumentation;

public class BloatedAgent {
    public static void premain(String agentArgs, Instrumentation inst) {
        // Identity "transformation": hands the original bytes back unchanged.
        // The second argument (canRetransform) is set to true.
        inst.addTransformer((loader, name, clazz, pd, originalBytes) -> originalBytes, true);
    }
}

The transformation is special in that it does not actually transform anything: it returns the original bytes of the class unchanged.

Next, the memory usage of both launches was monitored from the OS. In both cases, the memory usage of the Java process was sampled at regular intervals using the top command (the flags shown are those of the macOS version):

$ top -R -l 0 -stats mem,time -pid <pid>

The results are shown in the following chart:

java native leak

Understanding the problem

The chart shows that the second launch consumes far more memory. This is surprising. If you recall, the transformation does not actually transform the class; it returns the original bytecode. So one might expect the memory consumption of the two launches to be identical.

The first part of the explanation concerns where class definitions are stored. After all, shouldn’t class definitions reside in permgen/metaspace, and wouldn’t monitoring permgen be sufficient to detect this particular issue?

Apparently not. Whenever we return a non-null value from the transform method, the JVM assumes that the class was modified in some way. Additionally, when we set the canRetransform parameter (the second argument to addTransformer, after the lambda) to true in the agent’s premain method, the JVM expects that at some point you will attempt to retransform the class, applying a different transformation. As a result, the original non-transformed bytecode is kept by the JVM “just in case”.

This approach, weird at first glance, starts to make sense when you think about class loaders for which loading is an expensive operation, say, a network class loader. You would not want to go to the trouble of fetching the very same bytes once again. Therefore, the JVM caches the original bytecode of the class. It does not store it in metaspace or permgen, but rather in its own native memory. As a result, you will not see any growth in either heap or permgen/metaspace, and will only notice the problem when monitoring native memory consumption.
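To see the JVM's own view of memory, you can list the pools it exposes through java.lang.management. The following is a minimal sketch (the class name JvmPoolSnapshot and the method printPools are made up for this example) that prints the used size of each pool. Calling it periodically in a leaking process and comparing the figures with the process size reported by the OS should show the gap described above, since the cached bytecode is not part of any of these pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration only.
public class JvmPoolSnapshot {

    // Prints every memory pool known to the JVM and returns the sum of used bytes.
    public static long printPools() {
        long totalUsed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.printf("%-35s %-16s %,d bytes used%n",
                    pool.getName(), pool.getType(), usage.getUsed());
            totalUsed += usage.getUsed();
        }
        return totalUsed;
    }

    public static void main(String[] args) {
        long total = printPools();
        System.out.printf("Total tracked by JVM pools: %,d bytes%n", total);
    }
}
```

The pool names depend on the JVM version and the garbage collector in use, so the output differs from one setup to another.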

The second part of the answer is hidden in the java.lang.instrument.ClassFileTransformer Javadoc, where it is clearly stated that when no transformation is actually applied, the transform() method should return null. In that case the JVM knows that the class was not transformed and that there is no need to store an additional copy of the bytecode in native memory.

So the fix was as easy as making the transformer return null instead of returning the bytecode itself. But was the issue easy to troubleshoot? No way. It took three days of my life which I will never get back. I can only hope that sharing this knowledge will save someone from going through the same mess in the future.
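For reference, the corrected agent could look roughly like the sketch below. The class names FixedAgent and NoOpTransformer are placeholders of mine, not the names from the original code base. The transformer is written as a named class rather than a lambda so that it can be exercised on its own.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical sketch of the fix; names are placeholders.
public class FixedAgent {

    // A transformer that looks at classes but never changes them.
    static final class NoOpTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader,
                                String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // null means "no transformation performed", as the Javadoc asks for.
            // Returning classfileBuffer here would reintroduce the leak.
            return null;
        }
    }

    public static void premain(String agentArgs, Instrumentation inst) {
        // canRetransform stays true, as in the original agent.
        inst.addTransformer(new NoOpTransformer(), true);
    }
}
```

A real transformer would return modified bytes for the classes it instruments and null for everything else.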


Comments

Thanks, Gleb, for the useful article. I wonder whether this issue significantly affects Java deployments these days.

Hamid

Thank you Gleb. I am wondering where in native memory the JVM caches the original bytecode of the class. And if the additional copy of the bytecode is stored where would it be.

Vitaly Grinberg