Ruby & Rails Speedshop Workshop

Solving real-world problems in high-throughput applications: working on the actual issues you see in production.

“The problem can span from the silicon to the pixels on the screen. This is a very broad problem space, and it requires a wide variety of tools and ways of thinking. Measuring how an app runs is a great place to start: learn the tools that help you find the problem.”

Quotes

“Premature optimization is bad. Why? Stop thinking about performance as you write code. Start thinking about it top down after you see something going wrong. Make a change. Test your improvement hypothesis.”

“Active Record. Active Record. SQL Optimization. Active Record. 80% of the time it is in the database.”

“GIL. The runtime. A virtual machine. The Ruby VM runs the instructions. VM Lock. The number of threads that have access to the VM. 1 thread. Only 1 thread can access the VM at any given moment.”
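A minimal sketch of the VM lock point above, using `sleep` as a stand-in for IO such as a database call. The helper names `fake_io` and `timed_threads` are made up for illustration. Because waiting on IO releases the lock, four threads that only wait should finish in roughly the time of one, while the serial version takes about four times as long.

```ruby
require 'benchmark'

# Simulated IO: sleep releases the VM lock, like waiting on a socket or the database.
def fake_io
  sleep 0.2
end

# Run the block once in each of `count` threads and return the wall-clock time in seconds.
def timed_threads(count, &work)
  Benchmark.realtime do
    Array.new(count) { Thread.new(&work) }.each(&:join)
  end
end

serial   = Benchmark.realtime { 4.times { fake_io } }
threaded = timed_threads(4) { fake_io }
puts format('serial: %.2fs, 4 threads: %.2fs', serial, threaded)
```

Swapping `fake_io` for pure Ruby computation should remove most of the gain, since only one thread can execute Ruby code at a time.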

“Most Rails apps only need 3-5 Puma threads. Usually there isn’t enough IO to need more. Adding too many threads comes at a high cost. For Sidekiq, 10 threads is maybe better, given workers tend to make more IO operations.”

“Ruby was written by 3 Japanese guys 20 years ago to make an easier way to do shell scripting. A fun, nice VM language for simple tasks.”

“How many servers do you need for a given load? 80% of the apps I work with are over scaled. Little’s Law. Capacity is a common problem in different fields. The relationship between latency and throughput. WorkInProgress = Latency * Throughput. Based on the economics of the long run… Twitter. Run it at 10% - 25% utilization.”
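The Little’s Law arithmetic from the quote can be worked through in a few lines. This is only a sketch: the latency, throughput, and thread counts below are illustrative assumptions, not numbers from the workshop, and the method names are hypothetical.

```ruby
# Little's Law: work in progress = latency * throughput.
def requests_in_flight(latency_seconds:, throughput_per_second:)
  latency_seconds * throughput_per_second
end

# Share of the available threads that are busy on average.
def utilization(in_flight:, total_threads:)
  in_flight / total_threads.to_f
end

# Assumed workload: 250ms average latency at 40 requests per second.
in_flight = requests_in_flight(latency_seconds: 0.25, throughput_per_second: 40)

# Assumed capacity: 4 processes x 5 threads each.
busy = utilization(in_flight: in_flight, total_threads: 4 * 5)

puts "#{in_flight} requests in flight, #{(busy * 100).round}% utilization"
```

Comparing the resulting utilization with the 10% - 25% target mentioned above gives a rough idea of whether an app is over or under scaled.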

“When we scale up a dyno all we are doing is reducing the chance that a request is going to be queued before it is processed.”

“An application that has predictable throughput is far more scalable than one that doesn’t.”

“You gotta figure out how to scale based on queue depth.”

“Get the same amount of sample data locally as you have in production.”

“We don’t need to micro-benchmark really narrow things. Write a benchmark that is based on a very slow path that you see in production.”

“The first time a line of code runs is not the same as the second time it runs. Configure your ‘warm ups’ for benchmark tests.”
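One way to see the warm-up effect with only the standard library is to time a cold call separately from warmed-up runs. The sketch below assumes a hypothetical `expensive_lookup` that caches its result; `measure` is also a made-up helper, not part of any library.

```ruby
require 'benchmark'

# Run the block `warmup` times without recording, then return `runs` timings in seconds.
def measure(warmup:, runs:)
  warmup.times { yield }
  Array.new(runs) { Benchmark.realtime { yield } }
end

CACHE = {}

# Hypothetical slow path: the first call pays for the computation, later calls hit the cache.
def expensive_lookup(n)
  CACHE[n] ||= (1..n).reduce(:+)
end

cold = Benchmark.realtime { expensive_lookup(2_000_000) }
warm = measure(warmup: 2, runs: 5) { expensive_lookup(2_000_000) }

puts format('cold: %.5fs, warm median: %.5fs', cold, warm.sort[warm.size / 2])
```

The benchmark-ips gem used later in these notes handles this through its `warmup` setting.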

“Profiles often lie. New Relic often will claim 70% of time is spent in Ruby. This is often misleading. These additional tools will help us isolate where the problem is actually happening.”

“If your goal is to improve customer experience and make a fast-feeling app, you have to factor in the front end. We work on machines that are a lot faster than those of 90% of our users, so we need to think about network latency. Older smartphones also have some processing limitations.”

Tools

wrk. A tool for hitting an endpoint and benchmarking it: latency, response time. (Alternatives: Apache Bench, siege, …).
Puma allows you to have multiple threads. Only one thread can occupy the Ruby VM at a time, but the others can still make progress inside Puma while they wait on IO/DB operations (anything that isn’t Ruby execution).
New Relic. Skylight.
Benchmark module. Ruby standard library. A nice way to test an algorithm, hot paths, allocations, and background jobs.
TracePoint API. Useful for debugging “this” whenever “that” is run.
Kalibera. More in-depth stats.
Statistical profilers: stackprof, rbspy. Low overhead, low precision. Tracing profilers: ruby-prof. High overhead, high precision. Statistical profilers scale and work better for larger applications. New Relic uses this approach.
rack-mini-profiler.
gc_tracer, derailed_benchmarks, and memory_profiler are tools to help us isolate issues with memory management.
Datadog. Watch the timing on a slow request to see changes and how they affect performance.
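As a small illustration of the TracePoint entry in the list above, the sketch below records every Ruby method call made while a block runs. `Feed` is a hypothetical class standing in for application code, and `trace_calls` is a made-up helper.

```ruby
# Hypothetical class standing in for application code.
class Feed
  def build
    load_items
  end

  def load_items
    [1, 2, 3]
  end
end

# Collect every Ruby-level method call made while the block runs.
def trace_calls
  calls = []
  trace = TracePoint.new(:call) { |tp| calls << "#{tp.defined_class}##{tp.method_id}" }
  trace.enable { yield } # tracing is active only for the duration of the block
  calls
end

puts trace_calls { Feed.new.build }
```

The `:call` event only covers methods written in Ruby; methods implemented in C would need `:c_call` instead.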

References

The Complete Guide To Rails Performance.
The Ruby conference in Japan is worth taking a look at.

Questions

What’s the best way to do event sourcing in Rails? Maybe there’s a gem for it and/or best practices?
You mentioned JRuby and CRuby. What are your thoughts on adopting these?

Why not just add more threads? How many is enough?

“More threads add more potential for badly managed memory. Puma: 5 threads. And then scale dynos in line with CPU usage.”

Org structure and company culture. What can we do as a group of people?

“Taylorism. What gets measured gets improved. A company that prioritizes features over quality will struggle long term. Fixing bugs is straightforward. I think cultures need to start quantifying debt and SLA metrics and getting them in front of the product / management strategy. Response times. Throughput. Errors. Infrastructure metrics. On a week-to-week basis. Make tech-facing metrics more transparent.”

What are your thoughts on PGBouncer?

“People reach for it too early. Audit your DB connection usage. Decrease Sidekiq to 10 DB connections. Set the dynos’ connections correctly for the right number of servers. And then once you have done that duty and you hit 500, then it’s worth using PgBouncer. At the end of the day it’s a temporary solution. It’s a throttle. It’s misleading. You need a bigger database provider.”
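The connection audit suggested above is mostly multiplication. Here is a rough sketch, where every count is an illustrative assumption (the dyno, worker, and thread numbers are not from the workshop) and `total_connections` is a hypothetical helper.

```ruby
# Connections one group of processes can open: processes x pool size per process.
def connections_for(processes:, pool_size:)
  processes * pool_size
end

# Sum the worst-case connection count over every process group.
def total_connections(groups)
  groups.sum { |group| connections_for(**group) }
end

# Illustrative assumptions only.
groups = [
  { processes: 10 * 2, pool_size: 5 }, # 10 web dynos x 2 Puma workers, 5 threads each
  { processes: 4, pool_size: 10 }      # 4 Sidekiq processes, 10 threads each
]

puts "worst-case DB connections: #{total_connections(groups)}"
```

Comparing that total against the database plan’s connection limit shows how much headroom is left before a pooler is worth considering.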

What can we do about this bad memory allocation issue?

“jemalloc. The memory allocator developed by Facebook. Also runs GitHub. It employs a better set of strategies for multi-threaded applications. It’s a better alternative to malloc. It also resolves many memory allocation issues. Everyone should be using this! No downsides. This is a new strategy that may require extensive testing.”

Rack timeout thoughts?

“Impossible to avoid. But I don’t like it. If rack-timeout fires on a slow request while a DB connection is open, the connection is lost and is not returned to the pool. Increasing the pool is one temporary solution but doesn’t scale. Eventually you need to stop rack-timeout from throwing.”

What’s your take on callbacks? Especially `after_create` race condition?

“I’m ok with them. They’re hard to debug. From a performance perspective it’s fine.”

After extensive SQL and Active Record optimization… what is the typical next step when your app starts needing more than a single PG instance? Sharding? Read replicas? Common gems and/or pattern recommendations?

“Most people reach for sharding and replication too soon. A lot of the gems in this space are not maintained. Managing DB connections across threads should not be taken lightly. Migrate off Heroku Postgres to RDS Aurora. Vertically scale your way out of this problem. This is the first step. RDS Postgres plans are way bigger. RDS seems to have a lot more potential. Rails 6 is going to deliver an official multi-database solution. This will come in 2 years.”

Why is my app running out of memory?

“There are 3 classes of problems with memory allocation and management: bloat, leaks, and time spent allocating. The Ruby object space is made up of a list of metadata items pointing to where each object is stored, organized into pages.”

When is garbage collection done?

“When memory usage (RVALUEs) passes a threshold and more space is allocated. It’s based on usage. Ruby is a dynamic language, which makes it quite a challenge for garbage collection to bring memory usage back down to a lower level. Re-allocation is an important aspect of memory allocation with Ruby. The memory allocator will tend to hold onto the memory unless it is sure you won’t be using it again. C extensions make efficient garbage collection very difficult and contribute to memory fragmentation. This makes it appear as though we have a leak.”
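To watch this behaviour from inside a process, Ruby exposes counters through `GC.stat`. The sketch below wraps a block and reports how many objects it allocated and how many GC runs it triggered. `gc_delta` is a hypothetical helper and the workload is arbitrary.

```ruby
# Snapshot the GC counters, run the block, and report what changed.
def gc_delta
  before = GC.stat
  yield
  after = GC.stat
  {
    gc_runs: after[:count] - before[:count],
    allocated_objects: after[:total_allocated_objects] - before[:total_allocated_objects]
  }
end

# Arbitrary workload: build 200,000 short strings.
delta = gc_delta { 200_000.times.map { |i| "row #{i}" } }

puts delta
puts "live slots after the block: #{GC.stat[:heap_live_slots]}"
```

Running this around a suspect code path gives a first hint of whether the problem is allocation volume or objects that never get released.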

What are some things we can do to force a memory clean up?

“Limit the maximum amount of memory you have access to.”

You mentioned Twitter, Ruby and Scala: adding more servers versus switching to Scala. On balance, do you feel switching to the JVM enables you to operate on less infrastructure?

“A lot has changed around Rails, Ruby and multi threading since Twitter’s decision to move towards Scala. Today I think they would not have made the shift, just like other major players such as GitHub.”

Follow up actions

Add wrk metrics to the build pipeline and chart them so we can see averages over time.
Share learnings, specifically around the math on the performance challenges we are having: throughput, latency, dynos, utilization.
Set up seed data that is more similar to production.
Adding a 10s cache header on our JS sounds like it would save us 1s on every web app request.

Tasks

Request Benchmarking.

brew install wrk

wrk -H "Authorization: xxx" -c 100 -t 100 -d 10s http://localhost:3000/api/v1/feed

Results (from a run with wrk’s defaults of 2 threads and 10 connections)

Running 10s test
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   306.52ms  132.89ms   1.25s    93.79%
    Req/Sec    15.86      8.17    40.00     77.64%
  298 requests in 10.01s, 582.96KB read
Requests/sec:     29.77
Transfer/sec:     58.24KB
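For the follow-up action of charting wrk numbers from the build pipeline, the headline figures can be pulled out of the text report with a couple of regular expressions. This is a sketch that assumes the report layout shown above. `parse_wrk` is a hypothetical helper, and the sample is the output from these notes.

```ruby
# Pull the headline numbers out of wrk's text report.
def parse_wrk(output)
  {
    requests_per_sec: output[/^Requests\/sec:\s+([\d.]+)/, 1].to_f,
    avg_latency: output[/^\s*Latency\s+(\S+)/, 1] # kept as a string because the unit varies (ms, s)
  }
end

SAMPLE = <<~REPORT
  Running 10s test
    2 threads and 10 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency   306.52ms  132.89ms   1.25s    93.79%
      Req/Sec    15.86      8.17    40.00     77.64%
    298 requests in 10.01s, 582.96KB read
  Requests/sec:     29.77
  Transfer/sec:     58.24KB
REPORT

p parse_wrk(SAMPLE)
```

In a pipeline the string would come from running wrk against a staging endpoint, and the hash could be sent to whatever metrics store is already in use.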

Algorithm Benchmarking

# Boot the Rails app so models and services are available.
require_relative 'config/application'

Rails.application.initialize!

require 'benchmark/ips'

# Silence Rails logging so log IO does not skew the measurement.
Rails.logger = Logger.new(nil)

user = User.find 14604

Benchmark.ips do |x|
  # Warm up for 2 seconds, then measure for 5 seconds.
  x.config(:time => 5, :warmup => 2)

  x.report("homefeed") {
    ABCService.run(account: user.main_account)
  }

  x.compare!
end

Results

Warming up --------------------------------------
            homefeed     1.000  i/100ms
Calculating -------------------------------------
            homefeed      0.233  (± 0.0%) i/s -      2.000  in   8.577570s

Spy on a running process

ssh$> sudo rbspy record --pid 1182

Ruby Prof

gem 'ruby-prof'
...
use Rack::RubyProf, :path => './tmp/profile'
run Rails.application
