Nodejs application CPU profile analysis with Flame Graphs
In my previous post, I shared about my debugging process using various Linux tools and debugger. During the process, I came across the analysis technique using flame graphs and thought it will be interesting to see what information I can get out of it.
What are flame graphs?
Flame graphs, as the name suggests, are graphs that look like flames because of the shape and color (usually red-yellowish hues). It was invented by Brendan Gregg for the purpose of analyzing performance issue and understand CPU usage quickly.
There are various flame graphs generation tool (e.g
node-stackvis) available. The one I am using is the one built by Brendan Gregg http://github.com/brendangregg/FlameGraph.
As with all tools, it’s important that we understand how to use it. Despite looking rather similar to most time-series graphs, the x/y axes do not represent value/time.
- Each box represents a function in the stack (a “stack frame”).
- The y-axis shows stack depth (number of frames on the stack). The top box shows the function that was on-CPU. Everything beneath that is ancestry. The function beneath a function is its parent, just like the stack traces shown earlier. (Some flame graph implementations prefer to invert the order and use an “icicle layout”, so flames look upside down.)
- The x-axis spans the sample population. It does not show the passing of time from left to right, as most graphs do. The left to right ordering has no meaning (it’s sorted alphabetically to maximize frame merging).
- The width of the box shows the total time it was on-CPU or part of an ancestry that was on-CPU (based on sample count). Functions with wide boxes may consume more CPU per execution than those with narrow boxes, or, they may simply be called more often. The call count is not shown (or known via sampling).
- The sample count can exceed elapsed time if multiple threads were running and sampled concurrently.
- The colors aren’t significant, and are usually picked at random to be warm colors (other meaningful palettes are supported).
Capturing perf data
Before we can perform flame graph analysis, we have to first capture the CPU profile data.
To do that, I will be using Linux
$ uname -a Linux vm1 4.15.0-154-generic #161-Ubuntu SMP Fri Jul 30 13:04:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux $ sudo apt install -y linux-tools-4.15.0-154-generic
$ sudo perf record -e cycles:u -g -- npm run watch ... .. . ^C[ perf record: Woken up 33 times to write data ] failed to mmap file Terminated [ perf record: Captured and wrote 8.227 MB perf.data ]
perf will generate a
perf.data file and also a
perf-*.map file in the
/tmp directory. The purpose of
.map file is to provide
perf with application-specifc symbol map, with which it will be able to tell which instruction pointers belong to which application functions.
Processing the data into flame graph
Currently we only have a raw
perf.data, which isn’t really useful.
We have to use
perf script to generate a trace record file, do some scrubbing of the data to remove less-than-useful frames and finally generate the flame graph.
$ sudo perf script > perf.out $ sed -i '/\[unknown\]/d' perf.out $ sed -i -e "/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d" -e 's/ LazyCompile:[*~]\?/ /' perf.out $ git clone --depth 1 http://github.com/brendangregg/FlameGraph $ cd FlameGraph $ ./stackcollapse-perf.pl < ../perf.out | ./flamegraph.pl --colors js > ../node-flamegraph.svg
Accessing the flame graph
You can open the
svg directly in a browser. For my case, I used a simple Python HTTP server to serve the SVG files as I am using a VM.
$ python3 -m http.server
Analyzing the flame graph
Let’s see what we can find out about Nodejs internals from this.
Many foreign terms (
Builtins_*) and some a little more familiar ones (
After some researching, I found this awesome article on Nodejs internals. Despite having used Nodejs for a few years, it’s always been sort of a blackbox to me. It’s really interesting to know what goes on underneath.
A little summary of the article:
- Nodejs is a runtime i.e an environment provided for a program to execute successfully. In the case of Nodejs, it is through a combination of V8 and various C++ libraries to enable Nodejs applications to execute
- Core Nodejs dependencies are V8 and libuv.
libuvis a library written in C++ that enables low-level (networking, file system, concurrency) access to operating system
Highly recommended to read it yourself and check out the examples in it.
Now the appearance of
uv_* frames make sense!
uv_* are actually the functions from
libuv library, which as explained, is required to provide low-level OS access. It is also used to implement the Nodejs event loop!
__libc_start_mainis an initialization routine and performs necessary initialization of execution environment.
Some further digging down the rabbit hole later, I found this Linux x86 Program Start Up post, which explains the steps taken to load and run an application and the role that
As the current flame graph still contains Nodejs and V8 internal functions, I filtered it again to make application related functions more visible.
$ sed -i -E '/( __libc_start| LazyCompile | v8::internal::| Builtin:| Stub:| LoadIC:|\[unknown\]| LoadPolymorphicIC:)/d' perf.out
Almost immediately, we can identify the functions called by
@elastic/elasticsearch and would have helped greatly in identify the tranporting of huge amount of logs to ElasticCloud as the root cause behind the high CPU usage.