Finding Performance Bottlenecks in Your Rails API
Hello fellow Rubyists! In this tutorial, we're going to learn how to profile a Rails API and find performance bottlenecks with free, open-source tools. This method of profiling is only possible if you're on Ruby 2.0 and above.
How to generate a flame graph
At the moment, there are two main gems used to generate a flame graph in Ruby: rack-mini-profiler and SamSaffron/flamegraph. The most popular one is rack-mini-profiler, which plugs into your application and presents its results as an HTML badge on every page. This method is highly convenient for web developers, but we're profiling API routes that return JSON today. Let's keep moving.
The second one, SamSaffron/flamegraph, is a more general-purpose gem that can be used for APIs. It has a very clean interface, consisting of a single Flamegraph.generate call, but it produces large HTML files that inline the profiling data as JSON and render it in the browser with jQuery and D3.js. Unfortunately for me, since I tend to do personal side work on second-hand ThinkPad laptops purchased from eBay, this is a resource-intensive process that swiftly crashes my browser.
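For reference, here is a minimal sketch of how you would use it on its own. Flamegraph.generate is the gem's documented entry point; the output path and expensive_operation are placeholders I've made up for illustration:

require 'flamegraph'

# The block is profiled, and a self-contained HTML flame graph is
# written to the path you pass in.
Flamegraph.generate('tmp/flamegraph.html') do
  expensive_operation # stand-in for whatever code you want to profile
end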
We'll have to get creative. Sometimes the old ways are best. Before rack-mini-profiler came out, we used to rely on ruby-prof to glean performance insights. The ruby-prof-flamegraph gem provides a printer class, RubyProf::FlameGraphPrinter, that can be used along with ruby-prof for our purposes. Despite its name, FlameGraphPrinter does not actually print flame graphs. It prints an intermediate, text-based format that can be turned into a flame graph with the flamegraph.pl script from Brendan Gregg's original FlameGraph repository.
Start by including the ruby-prof and ruby-prof-flamegraph gems in your Gemfile.
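A minimal sketch of the Gemfile additions (you may prefer to restrict them to the development group):

gem 'ruby-prof'
gem 'ruby-prof-flamegraph'

Then, create a middleware: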
class Profiler
  def initialize(app)
    @app = app
  end

  # Turn the request path into a filename-friendly name, e.g.
  # "/api/v1/gems/rails.json" becomes "api-v1-gems-rails.json.prof.txt".
  def profiling_filename(env)
    filename_friendly_path = env['PATH_INFO'].tr('/', ' ').strip.tr(' ', '-')
    filename_friendly_path = 'root' if filename_friendly_path == ""
    "#{filename_friendly_path}.prof.txt"
  end

  def call(env)
    # Begin profiling
    RubyProf.measure_mode = RubyProf::WALL_TIME
    RubyProf.start

    # Run all app code
    res = @app.call(env)

    # Stop profiling & save
    out = RubyProf.stop
    File.open("#{ENV['PROF_OUT']}/#{profiling_filename(env)}", "w+") do |file|
      RubyProf::FlameGraphPrinter.new(out).print(file)
    end

    res
  end
end
Then, enable your new middleware in application.rb:
config.middleware.use "Profiler"
This will save the profiling data for every request in an intermediate text-based file format. You can specify where the intermediate files are saved with the PROF_OUT environment variable.
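For example, assuming you want the files in tmp/profiles (a directory name I've made up for this example), you could boot the app like this, exporting PROF_OUT as an absolute path so the same variable still works later when we run flamegraph.pl from another directory:

mkdir -p tmp/profiles
export PROF_OUT=$PWD/tmp/profiles
rails server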
Now, access the routes you want to profile. There are many tools available for this. My favorite at this point in time is cURL. Here's how I used cURL to call a route in RubyGems.org, the project I'm using for this tutorial:
curl http://localhost:3000/api/v1/gems/rails.json
Of course, setting up a Rails project for development is a topic that deserves its own tutorial.
Next, you need to generate the final SVG files.
- Clone the FlameGraph repo: https://github.com/brendangregg/FlameGraph.git
- cd into your FlameGraph repo
- Run this:
for f in `ls $PROF_OUT/*.prof.txt`; do cat $f | ./flamegraph.pl --countname=ms --width=1024 > $f.svg; done
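For the cURL request above, the middleware names the intermediate file after the request path, so the finished graph ends up as api-v1-gems-rails.json.prof.txt.svg inside $PROF_OUT. How you open it depends on your OS; on Linux, for example:

xdg-open $PROF_OUT/api-v1-gems-rails.json.prof.txt.svg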
When you open one of these SVG files in your browser, you should see something like this:
How to read a flame graph
RubyProf hooks into the interpreter and records the call stack as your code runs. As the Ruby interpreter travels from method to method to method, the call stack gets deeper and deeper and deeper. Then, as each method finishes, it returns the interpreter back to its caller, and the call stack unwinds. After it's done, RubyProf takes these thousands of call stacks and combines them to produce a single report. This report used to look like a flat HTML table, ordered not chronologically, but by how much time each method accounted for. This felt a bit like reading Egyptian hieroglyphics.
But hey, great news: now we have flame graphs! A flame graph presents this report along two axes: time, which flows from left to right, and call stack depth, which increases as you go from bottom to top. The resulting picture is one of scraggly peaks, plateaus and valleys, and rolling hills. This format is very good at revealing the complexity of a piece of code, as well as specific methods that may be bottlenecks. Because Ruby is an interpreted language where abstraction is not free, both of these factors are important to consider when profiling Ruby code.
The sharp, scraggly mountain ranges in a flame graph represent systems of relatively fast methods that refer to and call each other. The code represented here is highly efficient, but also highly complex. You may benefit from refactoring it to have fewer objects, methods, and classes, but since the code is already fast, this can end up being a lot of work for not a lot of gains.
Sometimes the interpreter will sit for a few moments inside one method. These are shown as flat plateaus and valleys. This formation represents a method that blocks the interpreter from continuing. There are a number of reasons why this may take place, and it's not always possible to improve this type of problem, as we'll learn in the next section.
Then there are the rolling hills, which also look like staircases. These represent chains of slow methods. These are generally your most fruitful regions for optimization, because they reveal both a high level of complexity and a high level of inefficiency in the code itself.
The last formation to consider is the short burst of choppiness: quick, shallow spikes of activity. This represents code that is already fast and well optimized, and it's usually not worth touching.
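To make these shapes concrete, here is a contrived sketch of the kind of code that tends to produce each formation. These examples and their helper methods are entirely my own, not taken from RubyGems.org:

# Scraggly peaks: lots of tiny, fast methods calling one another.
# Fast overall, but the stack churns constantly.
def summary(gem)
  { name: display_name(gem), downloads: download_count(gem) }
end

# Plateau: a single method that blocks the interpreter, e.g. waiting on I/O.
def fetch_gem(name)
  connection.exec_prepared("find_gem", [name]) # time is spent outside Ruby
end

# Rolling hills / staircase: a chain of slow Ruby methods, each doing real work.
def serialize(gems)
  gems.map { |gem| build_payload(gem) }.map(&:to_json)
end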
Before we move into misconceptions, I want to mention the importance of intuition in profiling. The method names shown at each level of the call stack give you an idea of what the code is supposed to be doing. You will need to use your intuition to decide what to optimize. Only complex computations like data visualizations and machine learning algorithms have the right to be slow, and even then, only on some machines. Mundane tasks like rendering JSON and validating form data are solved problems at this point, and should not be slow. Also, if the code looks like it's doing a task that should only occur in development mode, like reloading Ruby code, it's probably not as crucial to optimize.
Finally, remember that third-party Ruby code will be mixed in with application code, so you can use this method both to profile your own code and to find the performance impact of a gem.
Misconceptions about flame graphs
Even though flame graphs are wonderful tools, it's still pretty easy to arrive at misguided conclusions when reading them.
First of all, flame graphs are pretty bad at representing multi-threaded code. Separate threads are placed side-by-side, which breaks the chronological flow of the graph, and when a thread is waiting on another thread, it appears as a plateau, giving the false impression of a bottleneck. So if you are running a web server that supports concurrent requests, which is probably all of them at this point, take note that the extra threads that appear as long plateaus off to the side are safe to ignore.
People tend to get confused about which method in a call stack to target for performance optimizations. For example, take this plateau:
It's easy to look at this and say, "Oh, this is plateauing on the PG gem, so clearly we need to try a different database adapter." Wrong.
RubyProf is very powerful, but it can't see inside non-Ruby code. So this plateau represents both the time spent running #exec_prepared and the time spent waiting for the database call to come back. In cases like this, where the code is blocked on another process, switching out a gem may only shorten the plateau by a tiny fraction. You may even make it longer!
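If you want to confirm where that time is really going, one option (my own sketch, not part of the original setup) is to measure database time independently with Active Record's instrumentation and compare it with the width of the plateau:

# config/initializers/sql_timing.rb (a hypothetical initializer)
ActiveSupport::Notifications.subscribe('sql.active_record') do |name, started, finished, id, payload|
  elapsed_ms = (finished - started) * 1000
  Rails.logger.info("#{payload[:name]} took #{elapsed_ms.round(1)}ms")
end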
When looking at a section like the one shown above, it's easy to pick one row out of all the rows and say, "Look at all this time spent rendering JSON. If only we used some other format, like XML!" This is a mistake.
If you think of your flame graph as a staircase, the time attributable to a method is like the length of a single step -- not the cumulative length of the entire staircase.
Conclusion
If you read this tutorial carefully, you'll have all you need to profile any Rails or Sinatra API. Thanks for reading and have a great day!