Representable 2.2 Is More Than 50% Faster Than The World!

After almost two weeks of intense hacking, I’m happy to announce Representable 2.2. This is a pure performance update. I removed some very private concepts and methods, hence the minor version bump. Anyway, with version 2.2 you can expect a speed-up of 50% and more for both rendering and parsing.

This applies not only to Representable itself, but also to Roar, Disposable and Reform, which all use Representable internally for data transformations.

The public API hasn’t changed at all, so you’re safe to update.

To get a quick overview of the code changes, have a look here. Right, only a handful of lines of code have changed.

Profiling: How We Did It.

It all started with a benchmark my friend Max was running to render a nested object graph. Here’s the structure of the document.

class FoosRepresentation < Representable::Decorator
  include Representable::JSON
    
  collection :foos, class: Foo do
    property :value

    property :bar, class: Bar do
      property :value
    end
  end
end

As you can see, this is a collection of foos. Every foo contains a nested bar object with a value property.

So he set up a profiling test with 10,000 foos containing one bar each. You can find his profiling repository here. For Representable, that basically means “Iterate 10,000 foo objects and 10,000 bar objects and serialize them.”, making it 20,001 objects to represent in total.
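For illustration, a document rendered by this representer would look roughly like this (the values are made up, and this is a two-foo excerpt of the 10,000):

```json
{
  "foos": [
    { "value": 1, "bar": { "value": 2 } },
    { "value": 3, "bar": { "value": 4 } }
  ]
}
```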

Benchmarks, Slow.

With Representable 2.1.8, serializing this tree took about 3.1 seconds on my work machine.

Total: 3.138545

 %self   total   calls    name
 30.17   0.947   20001    Module#name
  7.01   0.220   790039   Representable::Definition#[]
  5.71   0.308   460025   Representable::Binding#[]
  1.68   0.135   100009   Class#new

As you can see, Module#name is called many times, and more than 100,000 objects are instantiated.

Here’s the same benchmark for deserializing the same document with its 20,001 objects.

Total: 3.045645

 %self  calls    name
 31.78   20001   Module#name
  6.07  710029   Representable::Definition#[]
  5.48  480021   Representable::Binding#[]
  2.46  160008   Class#new

It’s interesting to observe that parsing a document into an object graph takes about the same time as rendering it. Anyway, many objects are created and a lot of time is wasted.

Benchmarks, Faster!

I applied some simple implementation fixes and a structural change, resulting in the following benchmarks. We’ll discuss how I achieved that in a second.

First, we ran the rendering benchmark.

Total: 1.141275

 %self   calls   name
  6.83   310100  Representable::Definition#[]
  2.96   50007   Uber::Options::Value#evaluate
  2.78   40001   Repr..::Deserializer::Prepare#prepare!

Wow, that’s not 3.1 but 1.1 seconds to render a deeply nested object graph. No unnecessary classes are instantiated anymore and the time-consuming Module#name call has vanished.

I achieved the exact same improvement for parsing time.

Total: 1.173969

 %self    calls    name
  7.37    330092   Representable::Definition#[]
  3.33    70007    Uber::Options::Value#evaluate
  3.27    100029   Representable::Binding#[]
  2.09    20001    Representable#representation_wrap

What I haven’t measured yet is the memory footprint, which should be dramatically (yes, dramatically) smaller, since the number of objects needed to parse or render object graphs has been minimized. I bet you want to know how this 50-75% speed-up was possible, and here we go.

Don’t Ask When You’re Not Interested.

The first thing I tackled was getting rid of the Module#name call. It resulted from computing the wrap for every represented object being serialized or parsed. Every single represented object. Even though most objects don’t need a wrap, and the default is no wrapping at all.

I moved the name query into an optional block and things got faster.

Now, Module#name is only being called when we actually want to know the wrap.
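As a minimal sketch of that idea (my own toy code, not Representable’s actual internals): hide the expensive Module#name query behind a guard, so the default, unwrapped case never pays for it, and memoize the result for the cases that do.

```ruby
# Toy sketch, not Representable's real code: the wrap name is only
# computed (via the costly Module#name lookup) when wrapping is on.
class Decorator
  def initialize(represented, wrap: false)
    @represented = represented
    @wrap = wrap
  end

  def representation_wrap
    return nil unless @wrap   # default case: no wrap, no name query
    @wrap_name ||= self.class.name.split("::").last.downcase
  end

  def to_hash
    hash = { "value" => @represented }
    (w = representation_wrap) ? { w => hash } : hash
  end
end
```

With this, `Decorator.new(1).to_hash` never touches `self.class.name`; only a decorator built with `wrap: true` does, and only once.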

Too Many Bindings.

However, this was just one step to increasing the performance.

Another issue clearly visible in the ruby-prof outputs was that we created a Binding for every object. Every represented object. Bindings are a concept in Representable that handle rendering and parsing for one property.

If this was a nested property, the binding would create a representer instance, which in turn would create bindings again, for every represented property.

[Diagram: bindings before 2.2]

My beautiful diagram, which makes me really proud, illustrates this: every Foo instance in the collection will create a representer and a binding instance, even though they are identical.

When writing Representable a few years ago, I had a feeling this might become a bottleneck one day, but being focused on the design I “forgot” about it, and no one ever complained.

Reuse Bindings.

The solution is dead simple. I introduced Representable::Cached that will save the Bindings for later reuse.

By making the Binding stateless, we won’t have any trouble with stale data. The binding used to store a lot of run-time information, like the represented object and more. This now has to be passed in from the outside on every run, making the binding reusable.
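Here’s a hypothetical before/after sketch of that change (class and method names invented for illustration, not the gem’s real code): the stateful binding bakes the represented object in at construction time, while the stateless one receives it on every call.

```ruby
# Before (sketch): run-time state lives inside the binding, so a fresh
# instance is needed for every represented object.
class StatefulBinding
  def initialize(name, represented)
    @name, @represented = name, represented
  end

  def render
    @represented.send(@name)
  end
end

# After (sketch): only static configuration is stored; the represented
# object is passed in per run, so one instance serves all objects.
class StatelessBinding
  def initialize(name)
    @name = name
  end

  def render(represented)
    represented.send(@name)
  end
end
```

A single `StatelessBinding.new(:value)` can now render the value of all 10,000 foos, where the stateful version needed 10,000 instances.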

I know, you love my diagrams. Check out how the object graph has changed.

[Diagram: bindings in 2.2]

Say we’re representing one Foo object: its decorator will now cache the binding for the Bar property and reuse it. This means only a handful of objects need to be created.

To give you some figures: in the aforementioned benchmark setup, instead of having to instantiate 200,000 bindings, all we need to do is create four! One for the collection of foos, one for the value in every foo, one for the Bar object, and the last one to represent the value in a bar.

Cached: An Optional Feature!

While I fully trust my changes (it would be bad if I didn’t), I decided to ship this as an optional feature in 2.2. You need to activate it manually on the top-level representer.

class FoosRepresentation < Representable::Decorator
  include Representable::JSON
  feature Representable::Cached

  collection :foos, class: Foo do
    # and so on
  end
end
It’s a bit late to mention this, but Cached is intended for Decorator representers. It does work with modules, too, but then it will unnecessarily pollute your models. Please don’t do that.

Reusing Decorators Across Requests

An interesting new option is caching representers between requests. This will speed up rendering and parsing documents in your API code many times over, not to mention the reduced memory footprint.

Once a representer has done its job, it can be reused via the #update! method.

decorator = SongRepresenter.new(song)
decorator.to_json # first request.

decorator.update!(better_song) # second request.
decorator.to_json

This is enough to reuse a representer, even a deeply nested graph.
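To illustrate why this works, here is a toy decorator of my own (not the gem’s implementation; returning self from #update! is my choice, the gem’s example above calls #to_json separately): the instance keeps all its configuration, and only the represented object gets swapped per request.

```ruby
# Toy sketch of the #update! pattern, not Representable's real code.
class SongDecorator
  def initialize(song)
    @song = song
  end

  # Swap the represented object; all configuration (and, in the real
  # gem, the cached bindings) stays on the instance.
  def update!(song)
    @song = song
    self
  end

  def to_json
    %({"title":"#{@song.title}"})
  end
end
```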

Once I find some time, I will implement this in roar-rails. Caching and reusing representers across requests will give a significant performance boost to many Roar/Representable apps out there.

Using Ruby-Prof For Tests

One last thing I want to mention is how I use the ruby-prof output for tests.

require "ruby-prof"
require "stringio"

RubyProf.start
representer.to_hash
result = RubyProf.stop

# Print the flat profile into a string for assertions.
data = StringIO.new.tap { |io| RubyProf::FlatPrinter.new(result).print(io) }.string

# a total of 4 properties in the object graph.
data.must_match "4   Representable::Binding#initialize"

What might look like weird, crazy bullshit is actually a fantastic way of asserting that your speed-up works. Since we can’t test for speed directly (every machine and every run is different), I simply test for object creations.

In the Cached tests, I set up a simple but complex-enough object graph and render and parse it. I let ruby-prof track object creation and method invocations. Afterwards, I make sure that really only four bindings were created (and check other instantiation counts, of course).

And: this works!

No, I’m not gonna use RSpec’s mocking and expectations for that. First of all, I’ve managed to avoid mocking in all of my gems’ test suites and want to keep it that way. Second, the more test code I add, the more I will miss and the more will go wrong.

Letting a super low-level tool like ruby-prof track method calls is a bullet-proof way to test your speed-up. I love it and was surprised by its accuracy.

There Was More.

A few more improvements went into 2.2, and a lot of lookups can still be improved. I feel like I’ve talked enough for today.

Instead of throwing more words at you, I’d like to thank Max Güntner for setting up most of the benchmark code and encouraging me to work on performance. And, special thanks go out to Charlie Somerville who turned out to be a Ruby-internal-monster and was a great help in explaining the depths of the Ruby VM to me and why attr_reader is faster than your own method, and so on.

Incredible what you can still learn after having hacked Ruby for more than 10 years.
