Here are some quick raw notes I would like to flesh out to a blog post, eventually.
Been noticing performance on the front page of Discourse is rather bad, after digging in with MiniProfiler I notices lots of GCs are hit for a single page view. I ran some tests in production with a few settings and wanted to share them with you.
What is going on?
A strong indicator you are seeing too many GCs are huge gaps in MiniProfiler
An even stronger one is running MiniProfiler with ?pp=profile-gc-time
(4 stop the world GCs for a single request)
###Installs tested
rvm install 1.9.3-p385
rvm install 1.9.3-p385 --patch https://github.com/funny-falcon/ruby/compare/p385...p385_falcon.diff
rvm install 2.0.0
note this is an in-progress post, feel free to comment, but it will grow quite a bit as I flesh it out.
###Environments testing
- default
- RUBY_GC_MALLOC_LIMIT=90000000 (GC)
- LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0.1.0 (GC+TC)
RAILS_ENV=production bundle exec thin start -p 3111 RAILS_ENV=production RUBY_GC_MALLOC_LIMIT=90000000 bundle exec thin start -p 3111 RAILS_ENV=production RUBY_GC_MALLOC_LIMIT=90000000 LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0.1.0 bundle exec thin start -p 3111
(tried a bunch of other “recommended” optimisations, most of them made little to no diff)
###Test rig
Production: Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz
###Test performed
Results for: ab -n 100 -c 1 http://localhost:3111/
first run discarded
###rvm use 1.9.3-p385
RAILS_ENV=production bundle exec thin start -p 3111
min mean[+/-sd] median max
Default: 209 260 15.3 263 297
GC : 84 135 32.1 139 205
GC+TC : 76 124 27.1 131 178
###rvm install 1.9.3-p385 –patch falcon
min mean[+/-sd] median max
Default: 72 131 49.1 124 540
GC : 70 120 18.9 126 158
GC+TC: 67 121 40.4 116 395
###rvm use 2.0.0
min mean[+/-sd] median max
Default: 119 158 25.3 154 211
GC : 65 120 36.1 118 221
GC+TC : 67 113 20.0 114 175
###What is the deal with RUBY_GC_MALLOC_LIMIT
This is one of the most poorly documented bits of optimisation out there:
https://github.com/ruby/ruby/blob/trunk/gc.c#L68-L70
https://github.com/ruby/ruby/blob/trunk/gc.c#L3479-L3483
Out-of-the box Ruby forces a GC every time 8 megs of memory is allocated this is insanely low, our front page caused about 32megs of allocation thus causing 4 GCs. bumping this up to allowing 90megs of allocation before forcing a GC means there is much less pressure of the GC. We are able to handle 3 or so requests without a GC.
As a rule we want the number large, but not too large, if its too big we will bleed way too much memory and GC will take forever. If its too small, like it is out-of-the-box we will have multiple GCs per page which is horrible on so many levels.
###What is the impact of raising RUBY_GC_MALLOC_LIMIT to 90 million ?
On our test rig, resident memory raised from 116Megs to 193Megs for the GC setting alone and up to 211Megs when combined with tcmalloc
###What do you recommend?
I have 2 recommendations depending on your stance:
-
Conservative: All conservative installs should use RUBY_GC_MALLOC_LIMIT=90000000 , its a no-brainer and the increased memory usage is well worth the extra perf, you get a 48% decrease in median request time just by doing that.
-
Bleeding Edge: I would recommend using Ruby 2.0 allocating memory using tcmalloc. You get a 57% decrease in median request time using that.
I can no longer recommend falcon patches due to the fact most of the patches are in 2.0 anyway, embrace the future.
###Issues with this test
The test was performed on a CPU that has Turbo Boost enabled. This means that clock speed varies based on load. It made it more complicated to get consistent test runs, that said I repeated all the tests multiple times and it was run on our production boxes.
###Why are we eating so much memory per request?
The easiest thing to answer here is what kind of memory we are consuming: (adding ?pp=profile-gc)
230 THOUSAND strings allocated during one request
It is going to take us a while to work through this to find the actual culprits, though quite a few of the strings allocated over and over again do seem to be coming from Active Record, so focusing on cutting the amount of querying is a sure way to reduce string allocations during a request life cycle.
###What about Rails 4?
We need to re-test this under rails 4 to see if its faster or slower and determine if string allocations have gone up or down.
Source: https://meta.discourse.org/t/tuning-ruby-and-rails-for-discourse/4126