After almost two weeks of intense hacking, I’m happy to announce Representable 2.2. This is a pure performance update. I removed some very private concepts and methods, hence the minor version bump. Anyway, with the 2.2 version you can expect a speed-up of 50% or more for both rendering and parsing.
This applies to Representable itself, but also to Roar, Disposable and Reform, which all use Representable internally for data transformations.
The public API hasn’t changed at all, so you’re safe to update.
To get a quick overview of the code changes, have a look here. Right, that’s only a handful of lines of code that have changed.
Profiling: How We Did It.
It all started with a benchmark my friend Max was running to render a nested object graph. Here’s the structure of the document.
class FoosRepresentation < Representable::Decorator
  include Representable::JSON

  collection :foos, class: Foo do
    property :value

    property :bar, class: Bar do
      property :value
    end
  end
end
As you can see, this is a collection of foos. Every foo contains a nested bar object with a value.
So he set up a profiling test with 10,000 foos containing one bar each. You can find his profiling repository here. For Representable, that basically means “Iterate 10,000 foo objects and 10,000 bar objects and serialize them”, making it 20,001 objects to represent in total.
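To make the object count concrete, here is a minimal sketch of such a graph. The Struct classes and the build_graph helper are assumptions for illustration only; the actual benchmark repository may define its models differently.

```ruby
# Plain Structs stand in for the real model classes (an assumption).
Foo  = Struct.new(:value, :bar)
Bar  = Struct.new(:value)
Foos = Struct.new(:foos)

# Hypothetical helper: one top-level object holding `count` foos,
# each with exactly one nested bar.
def build_graph(count)
  Foos.new(Array.new(count) { |i| Foo.new(i, Bar.new(i)) })
end

graph = build_graph(10_000)

# One container + 10,000 foos + 10,000 bars.
total = 1 + graph.foos.size + graph.foos.count { |foo| foo.bar }
puts total
```

The container object itself accounts for the extra "1" in the total.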
With Representable 2.1.8, serializing this tree took about 3.1 seconds on my work machine.
Total: 3.138545

 %self   total    calls  name
 30.17   0.947    20001  Module#name
  7.01   0.220   790039  Representable::Definition#
  5.71   0.308   460025  Representable::Binding#
  1.68   0.135   100009  Class#new
As you can see, Module#name is called many times, and more than 100,000 objects are instantiated.
Here’s the same benchmark for deserializing a document with 20,001 objects.
Total: 3.045645

 %self    calls  name
 31.78    20001  Module#name
  6.07   710029  Representable::Definition#
  5.48   480021  Representable::Binding#
  2.46   160008  Class#new
It’s interesting to observe that parsing a document into an object graph takes about the same time as rendering it. Anyway, many objects are created and a lot of time is wasted.
I applied some simple implementation fixes and a structural change, resulting in the following benchmarks. We’ll discuss how I achieved that in a second.
First, we ran the rendering benchmark.
Total: 1.141275

 %self    calls  name
  6.83   310100  Representable::Definition#
  2.96    50007  Uber::Options::Value#evaluate
  2.78    40001  Repr..::Deserializer::Prepare#prepare!
Wow, that’s not 3.1 but 1.1 seconds to render a deeply nested object graph. No unnecessary classes are instantiated anymore and the time-consuming Module#name call has vanished.
I could achieve exactly the same for parsing time.
Total: 1.173969

 %self    calls  name
  7.37   330092  Representable::Definition#
  3.33    70007  Uber::Options::Value#evaluate
  3.27   100029  Representable::Binding#
  2.09    20001  Representable#representation_wrap
What I haven’t measured yet is the memory footprint, which should be dramatically (yes, dramatically) smaller, as the number of objects we need to parse or render object graphs has been minimized. I bet you want to know now how this 50-75% speedup was possible, and here we go.
Don’t Ask When You’re Not Interested.
The first thing I tackled was to get rid of the Module#name call. This resulted from computing the wrap for every represented object that was being serialized or parsed. Every represented object. Even though most objects don’t need a wrap, and the default case is to not have wrapping.
I moved the name query into an optional block and things got faster. Module#name is only being called when we actually want to know the wrap.
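The idea can be sketched as follows. This is not Representable's actual code: SketchRepresenter, wrap_with and infer_wrap are hypothetical names, and the lookup counter only exists to make the difference visible.

```ruby
# Hypothetical stand-in for a representer; not Representable's actual code.
class SketchRepresenter
  attr_reader :name_lookups

  def initialize(represented, wrap: false)
    @represented  = represented
    @wrap         = wrap
    @name_lookups = 0
  end

  # Eager variant: asks for the class name on every call, even if unused.
  def to_hash_eager
    wrap_name = infer_wrap
    hash = { "value" => @represented.value }
    @wrap ? { wrap_name => hash } : hash
  end

  # Lazy variant: the name query sits in a block that only runs
  # when wrapping is actually requested.
  def to_hash_lazy
    hash = { "value" => @represented.value }
    wrap_with(hash) { infer_wrap }
  end

  private

  def wrap_with(hash)
    return hash unless @wrap
    key = yield
    { key => hash }
  end

  def infer_wrap
    @name_lookups += 1
    @represented.class.name.downcase
  end
end

Song = Struct.new(:value)

plain = SketchRepresenter.new(Song.new(1))
plain.to_hash_lazy
lazy_lookups = plain.name_lookups   # the block never ran

plain.to_hash_eager
eager_lookups = plain.name_lookups  # the name was queried although no wrap is used
```

In the unwrapped default case the lazy variant never touches the class name at all, which is where the saving comes from.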
Too Many Bindings.
However, this was just one step towards increasing performance.
Another issue clearly visible in the ruby-prof outputs was that we created a Binding for every object. Every represented object. Bindings are a concept in Representable that handle rendering and parsing for one property.
In case this was a nested property, this binding would create a representer instance, again, which in turn would create bindings again, for every represented property.
My beautiful diagram, which makes me really proud, illustrates that: Every Foo instance in the collection will create a representer and a binding instance, even though they are identical.
When writing Representable a few years ago I had a feeling that this might become a bottleneck one day, but being focused on designing I “forgot” about it, and no one ever complained.
The solution is dead simple. I introduced Representable::Cached that will save the Bindings for later reuse.
By making the Binding stateless we won’t have any trouble with stale data. The binding used to save a lot of run-time information like the represented object, and more. This now has to be passed from the outside for every run, making it reusable.
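Here is a minimal sketch of why statelessness makes caching safe. SketchBinding and SketchDecorator are hypothetical and heavily simplified; Representable's real bindings and the Cached feature work differently in detail.

```ruby
# Hypothetical, heavily simplified classes; Representable's internals differ.
class SketchBinding
  @instances = 0
  class << self
    attr_accessor :instances # counts how many bindings were ever created
  end

  def initialize(name)
    @name = name
    self.class.instances += 1
  end

  # Stateless: the represented object is passed in per call, never stored.
  def read(represented)
    [@name.to_s, represented.public_send(@name)]
  end
end

class SketchDecorator
  PROPERTIES = [:value].freeze

  def initialize(cached: true)
    @cached = cached
  end

  def to_hash(represented)
    bindings.map { |b| b.read(represented) }.to_h
  end

  private

  # With caching, bindings are built once and reused for every object.
  def bindings
    return build_bindings unless @cached
    @bindings ||= build_bindings
  end

  def build_bindings
    PROPERTIES.map { |name| SketchBinding.new(name) }
  end
end

Item  = Struct.new(:value)
items = Array.new(1_000) { |i| Item.new(i) }

SketchBinding.instances = 0
uncached = SketchDecorator.new(cached: false)
items.each { |item| uncached.to_hash(item) }
uncached_count = SketchBinding.instances # one binding per rendered object

SketchBinding.instances = 0
cached = SketchDecorator.new(cached: true)
items.each { |item| cached.to_hash(item) }
cached_count = SketchBinding.instances   # one binding in total
```

Because read receives the object as an argument, the same binding can serve every item without carrying over data from the previous one.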
I know, you love my diagrams. Check out how the object graph has changed.
Say we were representing one Foo object, its decorator will now cache the binding for the Bar property and reuse it. As a result, only a handful of objects need to be created.
To give you some figures, in the aforementioned benchmark setup, instead of having to instantiate 200,000 bindings all we need to do is to create four! One for the collection of foos, one for value in every foo, one for a Bar object and the last to represent the value in a bar.
Cached: An Optional Feature!
While I fully trust my changes (would be bad if I didn’t) I decided to add this as an optional feature in 2.2. You need to activate it manually on the top-level representer.
class FoosRepresentation < Representable::Decorator
  include Representable::JSON
  feature Representable::Cached

  collection :foos, class: Foo do
    # and so on
  end
end
It’s a bit late to mention that, but Cached is meant to be used with Decorator representers only. It technically works with modules, too, but it will unnecessarily pollute your models. Please don’t do that.
Reusing Decorators Across Requests
An interesting new option is caching of representers between requests. This will speed up rendering and parsing of documents in your API code many times over, not to mention the reduced memory footprint.
Once a representer’s done its job, it can be reused using the update! method.
decorator = SongRepresenter.new(song)
decorator.to_json # first request.

decorator.update!(better_song) # second request.
decorator.to_json
This is enough to reuse a representer, even a deeply nested graph.
Once I find some time, I will implement this in roar-rails. Caching and reusing representers across requests will give a significant performance boost for many Roar/Representable apps out there.
Using Ruby-Prof For Tests
One last thing I want to mention is how I use the ruby-prof output for tests.
RubyProf.start
representer.to_hash
# ..

# a total of 4 properties in the object graph.
data.must_match "4 Representable::Binding#initialize"
What might look like weird, crazy bullshit is a fantastic way of asserting that your speed-up actually works. Since we cannot test for speed of test runs (every machine and run is different) I simply test for object creations.
In the Cached tests, I set up a simple, but complex enough, object graph and render and parse it. I let ruby-prof track object creation and method invocations. Afterwards, I make sure that really only four bindings were created (and other instantiation counts, of course).
And: this works!
No. I’m not gonna use RSpec’s mocking and expectations for that. First of all, I’ve managed not to use mocking in any of my gems’ test suites and want to keep it that way. Second, the more test code I add, the more I will miss and the more will go wrong.
Letting a super low-level tool like ruby-prof track method calls is a bullet-proof way to test your speed-up. I love it and was surprised by its accuracy.
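The same idea of asserting on instantiation counts can be sketched without any gem, using Ruby's built-in TracePoint instead of ruby-prof. This is a swapped-in technique to keep the example dependency-free, not what the Representable test suite does; TrackedBinding and count_binding_creations are hypothetical names.

```ruby
# Hypothetical class whose instantiations we want to keep low.
class TrackedBinding
  def initialize(name)
    @name = name
  end
end

# Counts calls to TrackedBinding#initialize while the given block runs.
def count_binding_creations
  count = 0
  trace = TracePoint.new(:call) do |tp|
    count += 1 if tp.defined_class == TrackedBinding && tp.method_id == :initialize
  end
  trace.enable { yield }
  count
end

created = count_binding_creations do
  cache = {}
  # Simulate 100 render runs over a graph with four properties.
  100.times do
    %i[foos foo_value bar bar_value].each do |name|
      cache[name] ||= TrackedBinding.new(name)
    end
  end
end

puts created
```

With the cache in place, the count stays at four no matter how many render runs happen, so a test can assert on that number instead of on timing.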
There Was More.
A few more improvements went into 2.2, and a lot of lookups can still be improved. I feel like I’ve talked enough for today.
Instead of throwing more words at you, I’d like to thank Max Güntner for setting up most of the benchmark code and encouraging me to work on performance. And, special thanks go out to Charlie Somerville who turned out to be a Ruby-internal-monster and was a great help in explaining the depths of the Ruby VM to me and why attr_reader is faster than your own method, and so on.
Incredible what you can still learn after having hacked Ruby for more than 10 years.