Automating a multi-platform build

May 8, 2013 by Nikita Salnikov-Tarnovski

This is the second post in the series describing our development infrastructure. We started by describing the external goals related to multi-platform support. In the first post we also reasoned why we need to test on different platforms separately. We concluded with the observation that instead of “supporting as many configurations as possible”, a far more feasible goal would be to “support as many users as possible”. From this customer-facing goal we have now derived the internally used development goals:

  • Automate builds. A Plumbr release has to be built without any manual intervention.
  • Automate tests. A Plumbr release has to be verified automatically using different testing techniques, ranging from unit to acceptance tests.
  • Automate infrastructure. The build infrastructure has to be able to launch and destroy the server instances used to build and test Plumbr automatically.
  • Provide transparency. Whether it is a piece of functionality implemented or a bug fixed – we need to know in which versions the change is present. If an exception is thrown, we need to be able to map the obfuscated stack trace back to the actual source code using the correct version of the obfuscation map.

Fulfilling those goals builds the foundation for a solid and verified release. So let's dig in and see what we have done in order to achieve all this.

To give you some background – each Plumbr release consists of close to 20 different deliverables. Those comprise 10 different native binaries for the various platforms alongside the Java artifacts, such as the javaagent itself, the internally used dashboards and the demo application. As you can imagine, building such a release is neither a simple nor a quick task. Due to the number of dependencies and the complexity of the process, we are required to automate a lot. And we have applied several build automation techniques to reach the desired one-click build nirvana.

Let's start by introducing the platforms used. If you recall, we had to support five different OS families with different processor architectures underneath:

  • We have a completely normal Amazon EC2 instance running a recent Ubuntu distribution. All the Linux builds are happily running in this single box. So far so good. The rest of the gang is not so mainstream and boring, though.
  • We have a Mac Mini sitting in the office. This Mac brings a lot of bang for the 599 bucks we spent on it: besides running the Mac OS X builds natively, it hosts the virtual machines for our Windows and Solaris x86 build environments.
  • A museum-grade SPARC machine runs the Solaris builds for the SPARC architecture. As the noise during a build resembles a jet fighter, we had to move this one to a separate room in the office to be able to bear the sound.

This gang of two physical and five virtual machines is orchestrated by a Jenkins node responsible for starting and stopping the builds. Let's look into what a build consists of.

[Figure: the build farm – Mac, Windows, Linux and Solaris build machines orchestrated by Jenkins]

The first step of the build is acquiring the source code from version control. We use a Bitbucket repository for source code management. The model in the repository is truly simple – all the development and stabilization for new releases is done in the default branch. This has been possible due to the small team size we have had so far. Looking from the build perspective, getting updates is as easy as monitoring a single branch and pulling the updates upon discovery.
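
Since all work lands in the default branch, you can guess the repository is a Mercurial one. What the Jenkins polling step boils down to is roughly the following sketch – the repository URL is made up:

    # Check for new changesets on the default branch (URL is hypothetical)
    hg incoming -b default https://bitbucket.org/plumbr/plumbr

    # If anything new has arrived, pull it and update the working copy
    hg pull https://bitbucket.org/plumbr/plumbr
    hg update default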

The next step is to build the native agent. We need this part of the platform to hook into the low-level JVM internals. As this is impossible to achieve in Java, we had to write this part of the code in C. The native part of Plumbr is built with the help of a good old makefile, containing several conditional branches that supply suitable flags to the different compilers and linkers. For example – we use gcc on Linux and Mac. On Microsoft platforms we use ‘cl’, which takes 29, I kid you not, command line parameters to compile a DLL. Then there is the bitness issue, meaning we have to build both 32-bit and 64-bit versions of our native code. So each of the five build environments is responsible for building two native libraries.
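
To illustrate, here is a minimal sketch of such a conditional makefile. The flags, file names and directory layout are simplified assumptions, not our actual build file (and remember that make insists on tabs in recipe lines):

    # Pick the toolchain based on the platform we are building on
    UNAME := $(shell uname -s)

    ifeq ($(UNAME),Linux)
        CC      = gcc
        LDFLAGS = -shared
        EXT     = so
    endif
    ifeq ($(UNAME),Darwin)
        CC      = gcc
        LDFLAGS = -dynamiclib
        EXT     = dylib
    endif
    # (the Windows branch invokes cl with its own, much longer, parameter list)

    # Build both bitness variants of the native agent
    all:
        $(CC) -m32 -fPIC $(LDFLAGS) -o build/32/libplumbr.$(EXT) agent.c
        $(CC) -m64 -fPIC $(LDFLAGS) -o build/64/libplumbr.$(EXT) agent.c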

Now the build is ready for the Java modules to be assembled. The Java section of Plumbr consists of several modules, such as the agent itself, the graphical user interface and the demo application shipped along with the distribution. All of those are built from the same repository with the help of a multi-project Gradle script. We used Ant in our early days, but the complexity of evolving that XML mess forced us to switch to the richer alternative. A pity we did not do this a year earlier.
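
In Gradle terms, a build like this boils down to a settings file enumerating the modules plus a root script carrying the shared configuration. A minimal sketch – the module names are our guesses for illustration:

    // settings.gradle – declares the modules of the multi-project build
    include 'agent', 'gui', 'demo'

    // build.gradle – configuration shared by all Java modules
    subprojects {
        apply plugin: 'java'
        repositories { mavenCentral() }
        test {
            useTestNG()   // the test suites described below run on TestNG
        }
    }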

After all this, we have finally compiled everything we need to run Plumbr. In this phase both the unit and integration tests are run. Those tests are written using TestNG and verify the correctness of the build.
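
For those who have not met TestNG before, a test case looks much like its JUnit counterpart. A hypothetical example – the class under test is made up for illustration:

    import org.testng.annotations.Test;
    import static org.testng.Assert.assertEquals;

    public class VersionParserTest {

        // VersionParser is a hypothetical helper, not an actual Plumbr class
        @Test
        public void parsesMajorVersion() {
            assertEquals(VersionParser.majorOf("1.2.3"), 1);
        }
    }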

Now it is time to obfuscate the generated code to prevent reverse engineering of our super-algorithms. This is done using ProGuard. Obfuscation is in itself a simple process, becoming complex only when you care about preserving the original stack traces. Throw in the need to support multiple versions, each with its own obfuscation map, and you have yet another dimension in your build process to worry about. And you start feeling sympathetic towards the guy nurturing the build.
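
The stack trace preservation mentioned above boils down to a handful of ProGuard options. A minimal sketch – the file names are assumptions:

    # Keep the attributes needed to map obfuscated stack traces back to source
    -keepattributes SourceFile,LineNumberTable
    -renamesourcefileattribute SourceFile

    # Write out the obfuscation map; one map must be archived per released version
    -printmapping plumbr-1.2.3.map

    # A stack trace from the field is later de-obfuscated with the matching map:
    #   retrace.sh plumbr-1.2.3.map obfuscated-stacktrace.txt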

All the build results are finally packaged into a ZIP file published to the Artifactory repository. The final ZIP consists of the Java agent we will later ship to our clients, along with the two platform-specific binaries for each OS (a 32-bit and a 64-bit version).
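
In Gradle, assembling such a distribution is a matter of a Zip task. A sketch with hypothetical paths and artifact names:

    // Packages the Java agent together with the native binaries for each OS
    task distZip(type: Zip) {
        destinationDir = file('build/dist')
        archiveName = 'plumbr.zip'
        from('agent/build/libs') { include 'plumbr.jar' }
        from('native') { include '**/*.so', '**/*.dll', '**/*.dylib' }
    }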

Now it is time for Jenkins to start orchestrating the acceptance tests. Acceptance tests, in short, are a set of applications either deployed to an application server or run in standalone mode. Plumbr is then attached to each application and users are emulated to verify that we are indeed able to find all the known leaks in those applications.

With the acceptance tests present, we need to run them on different environments – more than 200 of them. The process involves launching a specific virtual machine, starting a pre-configured application server in it, deploying a test application to the server and launching the simulation. Thrown into a single sentence, all this sounds simple. In reality we have sunk endless hours into both extracting the test cases into separate test applications and configuring the machines with the different JDKs and application servers. And we are still a long way from the goal of supporting all 200 required configurations – in the current form we cover just the 50 most popular combinations.
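
Scripted by hand, a single such run would look roughly like the sketch below. The VM, host and application names are made up, and VirtualBox merely stands in for whatever virtualization happens to be used:

    # Launch the pre-configured virtual machine without a GUI
    VBoxManage startvm "ubuntu-jdk7-tomcat7" --type headless

    # Start the application server in it and deploy the test application
    ssh build@ubuntu-jdk7-tomcat7 '/opt/tomcat/bin/startup.sh'
    scp leak-test.war build@ubuntu-jdk7-tomcat7:/opt/tomcat/webapps/

    # Emulate users against the deployed application, with Plumbr attached
    ./simulate-users.sh http://ubuntu-jdk7-tomcat7:8080/leak-test

    # Tear the machine down once the verdict is in
    VBoxManage controlvm "ubuntu-jdk7-tomcat7" poweroff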

When all the tests have succeeded, the distribution is made public in the form of the latest nightly build on the Plumbr download page.

Creating an official release goes through the exact same process. The only extra steps are tagging the release in Bitbucket and publishing the built artifacts into the Artifactory production repository.
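
Assuming the Mercurial setup sketched earlier, the tagging step itself is a one-liner – the version number is, of course, hypothetical:

    # Tag the release so the exact sources behind any shipped version can be recovered
    hg tag v1.2.3
    hg push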

As simple as that. It has taken only about a man-year to create the aforementioned process. The moral of the story? Java applications are definitely a huge leap ahead in terms of cross-platform compatibility. But instead of the Write Once, Run Anywhere concept, you are better off with a Write Once, Test Everywhere approach. Or you will end up shipping code that makes your end users' lives a misery.

The post might seem familiar to those of you who participated in JavaOne Moscow this year. Indeed, you had a chance to hear me on stage giving the same presentation. But for the rest of the ~8,310,700 who did not have a chance to be present, I hope I was able to introduce some interesting concepts. If so, subscribe to our Twitter feed to be notified about future posts.
