What’s your overhead?

September 13, 2012 by Nikita Salnikov-Tarnovski

“What’s your overhead?” We often hear this question when we talk about Plumbr. Wikipedia describes overhead as “any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal”. When you dig further on the Internet, you mostly see CPU overhead and memory overhead mentioned.

That is why we can probably translate the question posed in the title into “how much more memory and/or how many more CPU cycles does my application use when run with Plumbr attached?”. After spending four weeks trying to measure Plumbr’s overhead as precisely as possible, I have a bunch of numbers under my belt. Unfortunately, it seems that none of them makes much sense in real life. Let me elaborate to show the reasons behind this conclusion.

I have always had doubts when answering overhead-related questions. I think they are vague and misleading, and they reflect a desire to grab the lowest-hanging fruit first. Let me remind you that the job of any programmer is to solve business problems. Yes, on rare occasions we write software out of the pure joy of creation. And sometimes the business is weird – for example, sending people to jail or launching them into outer space. But most of the time we write software in order to support a particular business operation: to let people buy more things, to let them communicate with their loved ones, or to let them express themselves.

When you evaluate a tool and think about its overhead, shouldn’t you think about what impact this tool will have on the amount of business you can conduct? For example: “How will it affect the number of credit card transactions your application is able to process every minute?”

The most important metric would then be the number of operations your application can carry out per time unit. And by “operations” I mean business operations. If your customer spends five minutes filling out the purchase form, then in most cases it doesn’t matter if the application is able to process the submitted order in 500ms or in 600ms. Unless you are Amazon or eBay.
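To put the 500ms versus 600ms example into numbers, here is a small arithmetic sketch. The class and method names are made up for illustration, and it assumes the whole business operation consists of the five minutes the customer spends on the form plus the server-side processing time.

```java
public class OverheadInContext {

    /**
     * Relative increase, in percent, of the end-to-end time of one business
     * operation when server-side processing slows from baselineMs to withToolMs.
     * Hypothetical helper: assumes end-to-end time = user time + processing time.
     */
    static double endToEndIncreasePercent(double userTimeMs, double baselineMs, double withToolMs) {
        double before = userTimeMs + baselineMs;
        double after = userTimeMs + withToolMs;
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        double formFillingMs = 5 * 60 * 1000; // five minutes spent on the purchase form

        // Looking at the server alone, 500ms -> 600ms is a 20% slowdown.
        double serverOnly = (600.0 - 500.0) / 500.0 * 100.0;

        // Looking at the whole purchase, the same change is about 0.03%.
        double endToEnd = endToEndIncreasePercent(formFillingMs, 500, 600);

        System.out.printf("Server-side slowdown: %.0f%%%n", serverOnly);
        System.out.printf("End-to-end slowdown:  %.3f%%%n", endToEnd);
    }
}
```

The same 100ms looks like a 20% slowdown in isolation and like roughly 0.03% from the customer’s point of view.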

In order to measure overhead or, in a more general case, your application’s performance, the only adequate technique is to test your application by mimicking its users. This is much harder than running “vmstat” and measuring CPU utilization, sure. But only this realistic load can give you some meaningful results.
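As a rough illustration of counting business operations per time unit, here is a minimal sketch. Everything in it is an assumption made for the example: the `ThroughputProbe` class, the simulated users, and the `fakeOrder` stand-in are hypothetical, and a real test would drive the application’s front end with a proper load-testing tool instead of an in-process loop.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ThroughputProbe {

    /** Runs the operation from `users` threads for roughly durationMs and counts completions. */
    static long countOperations(Runnable operation, int users, long durationMs) {
        LongAdder completed = new LongAdder();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMs);
        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int i = 0; i < users; i++) {
            pool.submit(() -> {
                // Each simulated user repeats the business operation until time is up.
                while (System.nanoTime() - deadline < 0) {
                    operation.run();
                    completed.increment();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(durationMs + 10_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.sum();
    }

    /** Overhead expressed as the percentage of business throughput lost. */
    static double overheadPercent(double baselineOps, double withToolOps) {
        return (baselineOps - withToolOps) / baselineOps * 100.0;
    }

    public static void main(String[] args) {
        // Stand-in for a real business operation such as submitting an order.
        Runnable fakeOrder = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long ops = countOperations(fakeOrder, 4, 1000);
        System.out.println("Operations completed in about one second: " + ops);
    }
}
```

Running the same scenario once without and once with the tool attached, then passing both counts to `overheadPercent`, gives an overhead figure in business terms. For instance, a drop from 1200 to 1140 orders per minute would be a 5% overhead.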

Don’t get me wrong. CPU utilization and memory consumption (or, more importantly, GC workload in the case of the JVM) are very important numbers when you are trying to figure out how to achieve your performance goals, but not as the goals themselves. Does it matter whether your CPU utilization is 60% or 90%? The most cost-effective CPU utilization is 100%! Then every single CPU cycle is working hard, helping you to solve your end users’ problems or needs.

In conclusion: evaluate the performance of your application in terms of business operations. Measure it using a realistic load at your application’s front end. Troubleshoot it using all those tricks and tips about JVM innards and OS quirks you have learned at Kirk Pepperdine’s workshop. But don’t evaluate application performance by the number of JIT compilations per second.
