In this blog post, Eugene Yokota describes how he looked into the way others, such as Jason Zaugg and Johannes Rudolph, profile JVM applications, and shares his findings.
Hi everyone. One area the Tooling team has been focusing on lately is improving the process of contributing to sbt. Another thing we’ve been thinking about is sbt’s performance. Combining these two themes, I’ve been looking into how people like Jason Zaugg and Johannes Rudolph profile JVM applications, and this post presents my findings.
The techniques described here should apply to both Java and Scala, and are mostly independent of the build tool you use.
Flame graphs (async-profiler)
There are several ways to profile JVM apps, but the new hotness in profiling is Flame graphs, invented by Brendan Gregg, Senior Performance Architect at Netflix. You first collect stack trace samples, and then process them into an interactive SVG graph. For an introduction to Flame graphs, see:
- Using FlameGraphs To Illuminate The JVM by Nitsan Wakart
- USENIX ATC ’17: Visualizing Performance with Flame Graphs
To set up async-profiler:
- Download the installer from async-profiler 1.2.
- Make symbolic links to build/ and profiler.sh in $HOME/bin, assuming $HOME/bin is on your PATH:
Next, close all Java applications and anything else that may affect the profiling, like Slack, and run your app in a terminal. In my case, I am trying to profile sbt’s initial loading:
In another terminal, run:
This tells you the process ID of the app. In this case, it’s 92746. While it’s running, run:
This should print a bunch of useful stack traces. To visualize them as a flame graph, run:
This should produce /tmp/flamegraph.svg at the end.
See flamegraph.svg to try the output yourself.
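To make the "collect stack samples, then process them" step more concrete, here is a minimal Java sketch of the idea. It is not how async-profiler works internally. The hypothetical ToySampler class samples with Thread.getAllStackTraces(), which is subject to safepoint bias, and avoiding that bias is what async-profiler is designed for. It then folds the samples into the "collapsed" format that Brendan Gregg's flamegraph.pl reads. In that format, each line holds one distinct stack, written root-first with frames separated by semicolons, followed by a space and a sample count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy sampler illustrating "sample stacks, then fold them".
// Unlike async-profiler, Thread.getAllStackTraces() is subject to safepoint bias.
class ToySampler {

    // Take `count` snapshots of every live thread's stack, `intervalMs` apart.
    static List<StackTraceElement[]> sample(int count, long intervalMs) throws InterruptedException {
        List<StackTraceElement[]> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                if (stack.length > 0) {
                    samples.add(stack);
                }
            }
            Thread.sleep(intervalMs);
        }
        return samples;
    }

    // Fold samples into "root;...;leaf" -> count, the collapsed format flamegraph.pl reads.
    // A StackTraceElement[] is leaf-first, so walk it backwards to put the root first.
    static Map<String, Integer> fold(List<StackTraceElement[]> samples) {
        Map<String, Integer> folded = new TreeMap<>();
        for (StackTraceElement[] stack : samples) {
            StringBuilder key = new StringBuilder();
            for (int i = stack.length - 1; i >= 0; i--) {
                if (key.length() > 0) {
                    key.append(';');
                }
                key.append(stack[i].getClassName()).append('.').append(stack[i].getMethodName());
            }
            folded.merge(key.toString(), 1, Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) throws InterruptedException {
        // One "stack count" line per distinct stack, ready to save and pass to flamegraph.pl.
        fold(sample(20, 10)).forEach((stack, n) -> System.out.println(stack + " " + n));
    }
}
```

The width of each box in the resulting flame graph is proportional to these counts. That is why a plateau shows you where the samples, and therefore the CPU time, went.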
Flame graphs (perf-map-agent)
Even though async-profiler is easier to get started with, the fun part of Flame graphs is mixing the JVM stack traces with native code’s stack traces, letting you see what your program is actually spending its CPU on. It turns out that Lightbend’s Johannes Rudolph wrote a tool for this called perf-map-agent. It uses dtrace on macOS and perf on Linux. This is particularly useful if you are trying to find out whether the bottleneck is in native code.
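Why a separate tool is needed: on Linux, perf cannot name JIT-compiled Java frames by itself. Instead it looks for a /tmp/perf-<pid>.map file. Each line of that file has the form `START SIZE symbolname`, where START and SIZE are hex numbers. perf-map-agent attaches to the JVM and writes this file. The hypothetical PerfMapLookup class below sketches the lookup a profiler then performs: given a sampled instruction address, find the entry whose range [START, START+SIZE) contains it. It only illustrates the map format and is not code taken from perf or perf-map-agent.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: resolve sampled addresses against perf map lines
// ("START SIZE symbolname", START and SIZE in hex), as written by perf-map-agent.
class PerfMapLookup {
    // Start address -> (size, symbol); a TreeMap lets floorEntry do the range lookup.
    private final TreeMap<Long, Map.Entry<Long, String>> entries = new TreeMap<>();

    void addLine(String line) {
        // Split into at most 3 fields, because the symbol itself may contain spaces.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("malformed perf map line: " + line);
        }
        long start = Long.parseLong(parts[0], 16);
        long size = Long.parseLong(parts[1], 16);
        entries.put(start, Map.entry(size, parts[2]));
    }

    // Returns the symbol whose [start, start + size) range covers `address`, or null.
    String resolve(long address) {
        Map.Entry<Long, Map.Entry<Long, String>> e = entries.floorEntry(address);
        if (e == null) {
            return null;
        }
        long offset = address - e.getKey();
        return offset < e.getValue().getKey() ? e.getValue().getValue() : null;
    }
}
```

An address that falls outside every entry stays unresolved. In a flame graph such frames typically appear as raw hex addresses, which is the symptom perf-map-agent is meant to fix.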
You first have to compile perf-map-agent. For macOS, here’s how to export JAVA_HOME before running cmake .:
In a fresh terminal, run sbt with the -XX:+PreserveFramePointer flag:
In the terminal where you will run perf-map-agent:
In theory this should produce a better flame graph, but the output looks too messy for the sbt exit case.
This might work better if the code is already JIT-compiled, or if the operation being profiled is more specific. One thing we can do to get a better Flame graph is to repeat the same operation multiple times.
This amplifies the signal, and some plateaus become pronounced enough that we can zoom in and find the hot paths.
Flamescope
Netflix recently released a new visualization tool called Flamescope that can filter the Flame graph to a specific range of time.
This was developed by Martin Spier and Brendan Gregg to study perturbations and other time-based issues. This makes sense because a normal Flame graph aggregates all stack trace samples, so a short-lived glitch can be buried among the other traces.
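The core idea of time-range filtering can be shown with timestamped samples. Instead of folding every sample, fold only the samples whose timestamp falls inside a selected window, so a short spike is not diluted by the rest of the run. The hypothetical TimeWindow class below assumes each sample is a timestamp in milliseconds paired with an already-collapsed stack string ("root;...;leaf"). Flamescope itself reads profiler output such as perf script data and lets you pick the range from a subsecond-offset heatmap, which this sketch does not attempt to reproduce.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of Flamescope's idea: build a flame graph from a time slice only.
class TimeWindow {
    // One sample: when it was taken and its collapsed stack ("root;...;leaf").
    static final class Sample {
        final long timestampMs;
        final String stack;

        Sample(long timestampMs, String stack) {
            this.timestampMs = timestampMs;
            this.stack = stack;
        }
    }

    // Count stacks only for samples with startMs <= timestamp < endMs.
    static Map<String, Integer> foldWindow(List<Sample> samples, long startMs, long endMs) {
        Map<String, Integer> folded = new TreeMap<>();
        for (Sample s : samples) {
            if (s.timestampMs >= startMs && s.timestampMs < endMs) {
                folded.merge(s.stack, 1, Integer::sum);
            }
        }
        return folded;
    }
}
```

Over the whole run, a brief glitch is a thin sliver next to the steady-state stacks. With a window placed around the glitch, its stacks make up most or all of the graph.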
JMH (sbt-jmh)
Because of JIT warmup and similar effects, benchmarking on the JVM is difficult. JMH runs the same benchmark many times, including warmup iterations, to reduce these effects, so it comes closer to measuring the actual performance of your code.
For sbt users, sbt-jmh, written by Lightbend’s Konrad Malawski, makes JMH benchmarking easier. It apparently also integrates with async-profiler.
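JMH takes care of warmup, forking, dead-code elimination and other pitfalls, so for real measurements use it, or sbt-jmh. The hypothetical ToyBench class below is not JMH. It exists only to illustrate the warmup/measurement split described above: the operation runs a number of times untimed so the JIT can compile it, and then the timed iterations are averaged. It deliberately ignores forking, statistics, and most of the other effects JMH guards against.

```java
import java.util.function.LongSupplier;

// Hypothetical toy harness, NOT JMH: illustrates warmup vs. measurement iterations only.
class ToyBench {
    // Results are written here so the JIT cannot treat the benchmarked work as dead code.
    static volatile long sink;

    // Runs `op` warmupIters times untimed (giving the JIT a chance to compile it),
    // then returns the mean nanoseconds per call over measureIters timed calls.
    static double averageNanos(LongSupplier op, int warmupIters, int measureIters) {
        for (int i = 0; i < warmupIters; i++) {
            sink = op.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measureIters; i++) {
            sink = op.getAsLong();
        }
        return (System.nanoTime() - start) / (double) measureIters;
    }

    public static void main(String[] args) {
        LongSupplier sumTo10k = () -> {
            long total = 0;
            for (int i = 0; i < 10_000; i++) {
                total += i;
            }
            return total;
        };
        System.out.printf("%.1f ns/op%n", averageNanos(sumTo10k, 10_000, 10_000));
    }
}
```

Running main with warmupIters set to 0 and then to a large value usually shows the difference warmup makes. Loop overhead and the volatile write are still included in the timed loop, which is exactly the kind of detail JMH handles for you.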
VisualVM
I should also mention traditional JVM profiling tools. Since VisualVM is open source, I’ll cover that one.
- First open VisualVM.
- Start sbt from a terminal.
- You should see xsbt.boot.Boot under Local.
- Open it, select either Sampler or Profiler, and hit the CPU button at the point when you want to start.
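VisualVM's Local node lists the JVMs running on your machine. If you want a similar list from code, for example to find the process ID to hand to a profiler, the JDK's Attach API provides VirtualMachine.list() in the jdk.attach module, so a full JDK is needed. The hypothetical LocalJvms class below filters that list by display name. What the display name contains for a given app, such as a main class or a jar path, depends on how the app was launched, so treat the filter string as something to experiment with.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: list local JVMs visible to the Attach API (requires a JDK).
class LocalJvms {
    // Returns "pid displayName" for each local JVM whose display name contains `fragment`.
    static List<String> find(String fragment) {
        List<String> matches = new ArrayList<>();
        for (VirtualMachineDescriptor vm : VirtualMachine.list()) {
            if (vm.displayName().contains(fragment)) {
                matches.add(vm.id() + " " + vm.displayName());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // With no argument, list every JVM the Attach API can see.
        String fragment = args.length > 0 ? args[0] : "";
        find(fragment).forEach(System.out::println);
    }
}
```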
Summary
Flame graphs visualize stack trace samples, which helps in identifying the hot paths in your application. They also help confirm whether the changes you made have actually improved performance.
Article originally found on lightbend.com