Schemaless Models With Hash Fields or JSONB

For quite a while I’ve been playing with the idea of “schemaless” models, intrigued by the prospect of dumping entire hashes into database fields instead of having to add every little setting as a real database column.

Adding a real column involves the tiresome process of writing and running a migration, whereas a hash field can simply be extended ad hoc. I'm a fan of mixing relational tables and associations with hash fields for infrequently accessed data, so I leave the debate about which approach is better to others.

What Is A Hash Field?

As the name implies, you can save an entire data structure in one database column. ActiveRecord makes this really simple, as I will demonstrate here, even though in my own applications I use the excellent Sequel gem.

class User < ActiveRecord::Base
  serialize :content
end

The schema of this table could look as follows.

create_table "users" do |t|
  t.string "email"
  t.text "content"
end

Note that the content column is of type TEXT, nothing special. ActiveRecord’s serialize method converts the data structure to text when writing it to the content column and back when reading it. Here’s an example.

user = User.find(1)
user.content #=>
  {
    "slack" => {
      "activated" => true,
      "username"  => "apotonick"
    }
  }

That’s the reading part. You can also write hashes (or parts of them).

user = User.new(content: { "slack" => { "activated" => false } })
# or change a loaded record in place and save it
user = User.find(1)
user.content["slack"]["activated"] = true
user.save

In other words, ActiveRecord allows you to assign a hash structure to the content field and will serialize it when saving the record, and deserialize it back to a real hash when accessing it.

That is pretty awesome. Without this hash field, you would’ve had to create and migrate two additional columns on the users table, one for the activation flag and one for the username. And that’s exactly why we call it a hash field!
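If you’re curious what this round trip looks like without Rails, here’s a minimal plain-Ruby sketch. It assumes YAML as the text format (ActiveRecord’s default coder for serialize, as far as I know) and uses a made-up SerializedUser class in place of a real model. One difference: ActiveRecord keeps the deserialized hash on the record, whereas this sketch simply parses the text again on every read.

```ruby
require "yaml"

# Hypothetical stand-in for a model with a TEXT column called "content".
# It only mimics the idea behind `serialize`: you assign a Ruby hash, and a
# YAML string is what would end up in the database column.
class SerializedUser
  # The raw text a TEXT column would store.
  attr_reader :raw_content

  # Serialize on write: dump the hash to YAML text.
  def content=(hash)
    @raw_content = YAML.dump(hash)
  end

  # Deserialize on read: parse the YAML text back into a hash.
  def content
    @raw_content && YAML.safe_load(@raw_content)
  end
end

user = SerializedUser.new
user.content = { "slack" => { "activated" => true, "username" => "apotonick" } }
puts user.raw_content                # the YAML text, as stored in the column
p user.content["slack"]["username"]  # => "apotonick"
```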

No Hashes! I Want Objects!

While the illustrated hash field might already be very helpful to you, it can be extremely frustrating to fumble with nested hashes in your business logic. Adding features like coercion of specific fragments will lead to repetitive code, and bugs will follow.

A better way is to use a twin decorator from the Disposable gem and its brand-new hash fields feature.

gem "disposable", ">= 0.3.2"

Despite the title of this post, a twin always defines a schema. This happens in a twin class.

require "disposable/twin"
require "disposable/twin/property/hash"

class User::Twin < Disposable::Twin
  include Property::Hash

  property :content, field: :hash do
    property :slack do
      property :activated
      property :username
    end
  end
end

The twin allows you to define any number of nested hash structures. The hash may also contain collections; check the twin docs for that.

In order to use that “twin”, you simply decorate the original model.

user = User.find(1)
twin = User::Twin.new(user)

You can now work with that twin’s hash field as if it were a nested Ruby object.

twin.content.slack.activated = true
twin.content.slack.username = "solnic"

To push the modified hash back to the model, you have to call sync.

twin.sync

user.content #=> 
  { "slack" => { "activated" => true, "username" => "solnic" } }

Once you call user.save, your changes are persisted.
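To make the decorate-and-sync idea more tangible, here’s a deliberately tiny plain-Ruby sketch of a twin-like wrapper. HashNode, FakeUser and UserTwinSketch are made-up names, and this is not how Disposable works internally. It only shows the principle: the schema decides which readers and writers exist, nested hashes become objects, and sync writes a plain hash back to the model.

```ruby
# Hypothetical, minimal twin-like wrapper for a nested hash. NOT Disposable's
# implementation; it only illustrates schema-driven access plus #sync.
class HashNode
  def initialize(schema, hash)
    @values = {}
    schema.each do |name, nested|
      value = hash[name]
      if nested.is_a?(Hash)
        # Nested schema: wrap the sub-hash in another node (reader only).
        @values[name] = HashNode.new(nested, value || {})
        define_singleton_method(name) { @values[name] }
      else
        # Leaf property: define a reader and a writer.
        @values[name] = value
        define_singleton_method(name) { @values[name] }
        define_singleton_method("#{name}=") { |v| @values[name] = v }
      end
    end
  end

  # Turns the object graph back into a plain hash with string keys.
  def to_h
    @values.transform_values { |v| v.is_a?(HashNode) ? v.to_h : v }
  end
end

# Hypothetical model with a `content` hash, standing in for the AR record.
FakeUser = Struct.new(:content)

class UserTwinSketch
  SCHEMA = { "slack" => { "activated" => nil, "username" => nil } }.freeze

  attr_reader :content

  def initialize(model)
    @model   = model
    @content = HashNode.new(SCHEMA, model.content || {})
  end

  # Writes the twin's state back to the model, like twin.sync above.
  def sync
    @model.content = @content.to_h
  end
end

user = FakeUser.new({ "slack" => { "activated" => false } })
twin = UserTwinSketch.new(user)
twin.content.slack.activated = true
twin.content.slack.username  = "solnic"
twin.sync
p user.content == { "slack" => { "activated" => true, "username" => "solnic" } } # => true
```

Assigning a key that isn’t in the schema, like twin.content.slack.color = "red", raises a NoMethodError, and keys the schema doesn’t know about are dropped on sync.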

Object-Oriented Schemas

This is already very useful, since there’s no way you can screw up that hash anymore. The twin’s schema gives you a rock-solid API.

“Well, that’s nice, but I can use Hashie for that!” might be what you’re thinking right now. Yes, but can you add decorator methods to nested fragments, too?

class User::Twin < Disposable::Twin
  include Property::Hash

  property :content, field: :hash do
    property :slack do
      property :activated
      property :username

      def activate!
        self.activated = true
      end
    end
  end
end

Setting the activation status of the user’s Slack channel is now a method call instead of an assignment.

twin.content.slack.activate!

That’s not only convenient, that’s safe.
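Here is the same idea without any gem: a nested fragment that knows how to change itself. This is a hypothetical plain-Ruby analogue built on a Struct. SlackFragment, from_hash and to_stored_hash are made-up names, and this is not what Disposable generates.

```ruby
# Hypothetical plain-Ruby analogue of a nested fragment that carries its own
# behavior, similar in spirit to defining #activate! inside `property :slack`.
SlackFragment = Struct.new(:activated, :username) do
  # Intention-revealing command instead of a raw assignment.
  def activate!
    self.activated = true
  end

  # Builds the fragment from the stored, string-keyed hash.
  def self.from_hash(hash)
    new(hash["activated"], hash["username"])
  end

  # Converts the fragment back into a string-keyed hash for storage.
  def to_stored_hash
    { "activated" => activated, "username" => username }
  end
end

slack = SlackFragment.from_hash({ "activated" => false, "username" => "apotonick" })
slack.activate!
p slack.activated # => true
```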

Coercion

Speaking of safety: Why not use coercion to make sure the assigned value is always a boolean?

require "disposable/twin/coercion"

class User::Twin < Disposable::Twin
  include Property::Hash
  feature Coercion

  property :content, field: :hash do
    property :slack do
      property :activated, type: Types::Bool
      property :username
    end
  end
end

The :type option allows you to specify the coercion target. The twin’s setter will now automatically convert the assigned value.

twin.content.slack.activated = 1 
twin.content.slack.activated #=> true

When it comes to coercion, you have the entire API of the dry-types gem right at your fingertips, because Disposable uses it under the hood.
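A coercing setter is nothing magic. The sketch below is a hypothetical, hand-rolled boolean coercion. BoolCoercion and CoercedSlack are made-up names, and the list of accepted spellings is an assumption, not the exact rule set of dry-types.

```ruby
# Hypothetical boolean coercion; the accepted spellings are assumptions.
module BoolCoercion
  TRUE_VALUES  = %w[1 true t yes y on].freeze
  FALSE_VALUES = %w[0 false f no n off].freeze

  # Converts common "boolean-ish" inputs. Unknown values pass through
  # unchanged so a validation layer can still reject them.
  def self.call(input)
    str = input.to_s.strip.downcase
    return true  if TRUE_VALUES.include?(str)
    return false if FALSE_VALUES.include?(str)

    input
  end
end

class CoercedSlack
  attr_reader :activated

  # The setter coerces before storing, so readers see true/false.
  def activated=(value)
    @activated = BoolCoercion.call(value)
  end
end

slack = CoercedSlack.new
slack.activated = 1
p slack.activated # => true
slack.activated = "off"
p slack.activated # => false
```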

Remapping Properties

It’s often a good idea to hide the internal data structure from the bad and evil outer world. For example, the content field doesn’t have to be visible at all. We can remap the slack field directly to the top level.

class User::Twin < Disposable::Twin
  include Property::Hash

  property :content, field: :hash do
    property :slack do
      # ..
    end
  end

  unnest :slack, from: :content
end

That’s exactly what unnest does.

twin.slack.activated #=> false

You now hide implementation details from the environment and have simplified your object’s API, which is a very smart thing to do.
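Conceptually, unnest makes the nested fragment reachable from the top. Here’s a hypothetical plain-Ruby sketch of that effect, using delegation through Ruby’s Forwardable. The Slack, Content and UnnestedTwin names are made up, and Disposable’s actual implementation may well differ.

```ruby
require "forwardable"

# Hypothetical illustration of what `unnest :slack, from: :content` achieves:
# the nested fragment is reachable at the top level, while content stays hidden.
Slack   = Struct.new(:activated, :username)
Content = Struct.new(:slack)

class UnnestedTwin
  extend Forwardable

  # Top-level #slack and #slack= simply forward to the hidden content object.
  def_delegators :@content, :slack, :slack=

  def initialize(content)
    @content = content
  end
end

content = Content.new(Slack.new(false, "apotonick"))
twin    = UnnestedTwin.new(content)
p twin.slack.activated    # => false
twin.slack.activated = true
p content.slack.activated # => true, it's the same object underneath
```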

Validations

Disposable’s twins are data mapper objects. Their job is, I say it again, mapping data.

What if you wanted to hook a form on top of the object we just created, with nested fields and validations? You can use the popular Reform gem. Reform internally uses twins to compose forms, so the API is identical.

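require "disposable/twin/property/hash"
require "disposable/twin/coercion" # coercion needs the dry-types gem in your Gemfile

class User::Form < Reform::Form
  include Disposable::Twin::Property::Hash
  feature Coercion

  property :content, field: :hash do
    property :slack do
      property :activated, type: Types::Bool
      property :username

      validates :username, length: 3..30
      # presence: true would reject false, so check for a real boolean instead
      validates :activated, inclusion: { in: [true, false] }
    end
  end

  unnest :slack, from: :content
end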

This looks almost identical to the twin, except for the added validations.

You can now render that form by yourself or with a form builder.

simple_form_for User::Form.new(@user) do |f|
  f.simple_fields_for :slack do |s|
    s.input :activated, as: :boolean
    s.input :username
  end
end

How easy is that? The form renderer doesn’t even know it’s dealing with a form backed by a twin. It thinks it’s an ActiveModel-compliant model instance.

You can then validate and save the form.

form = User::Form.new(@user)

form.validate(slack: { username: "solnic" }) #=> true
form.sync

@user.content #=> { "slack" => { "username" => "solnic" } }

These are the very same mechanics that we already discussed. Again: a form is a twin. It makes it very convenient to work on deeply nested hash structures, without any chance of messing up structure, formatting or types.
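Here’s a hypothetical plain-Ruby sketch of that validate-then-sync cycle, so you can see where the data lives at each step. SlackFormSketch and FormUser are made-up names, the validation rules are hand-written assumptions, and Reform itself does a lot more. The key point is the same, though: validate only writes to the form, and the model is touched only when you call sync.

```ruby
# Hypothetical sketch of the validate-then-sync cycle. Not Reform's
# implementation: the rules and method names are assumptions.
FormUser = Struct.new(:content)

class SlackFormSketch
  attr_reader :errors

  def initialize(model)
    @model  = model
    slack   = (model.content || {}).fetch("slack", {})
    @fields = { "activated" => slack["activated"], "username" => slack["username"] }
    @errors = []
  end

  # Copies known params into the form (never the model) and runs the rules.
  def validate(params)
    params.fetch(:slack, {}).each do |key, value|
      @fields[key.to_s] = value if @fields.key?(key.to_s)
    end

    @errors = []
    unless (3..30).cover?(@fields["username"].to_s.length)
      @errors << "username must be 3 to 30 characters"
    end
    unless [true, false, nil].include?(@fields["activated"])
      @errors << "activated must be true or false"
    end
    @errors.empty?
  end

  # Only now does the model change: write the non-nil fields back.
  def sync
    content = (@model.content || {}).dup
    content["slack"] = @fields.compact
    @model.content = content
  end
end

user = FormUser.new(nil)
form = SlackFormSketch.new(user)
p form.validate(slack: { username: "ab" })     # => false, model untouched
p form.validate(slack: { username: "solnic" }) # => true
form.sync
p user.content == { "slack" => { "username" => "solnic" } } # => true
```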

Hash Fields Everywhere!

We use hash fields in literally all projects now. Instead of having to think about how to structure and name columns, and which table to use, we have very simple tables with hash fields. Since these are still regular relational tables, we can keep using features like associations.

What really blew my mind, though, is Postgres’ new JSONB column type. It stores nested hashes just like the ones in this article, and on top of that you can query and index fragments inside JSONB columns.
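To give you an idea of what querying fragments means: a JSONB containment query looks like WHERE content @> '{"slack": {"username": "solnic"}}' in SQL. The plain-Ruby sketch below only mimics the matching rule of that @> operator on in-memory hashes. jsonb_contains? is a made-up helper that ignores some JSONB edge cases. Postgres does the real work inside the database, where a GIN index can speed it up.

```ruby
# Plain-Ruby approximation of the matching rule behind Postgres' JSONB
# containment operator (@>). Hypothetical helper; some edge cases ignored.
def jsonb_contains?(doc, query)
  case query
  when Hash
    # Every key of the query must exist in doc with a contained value.
    doc.is_a?(Hash) && query.all? { |key, value| doc.key?(key) && jsonb_contains?(doc[key], value) }
  when Array
    # Every query element must be contained in some element of doc.
    doc.is_a?(Array) && query.all? { |q| doc.any? { |d| jsonb_contains?(d, q) } }
  else
    doc == query # scalars compare by value
  end
end

rows = [
  { "slack" => { "activated" => true,  "username" => "solnic" } },
  { "slack" => { "activated" => false, "username" => "apotonick" } }
]

matches = rows.select { |content| jsonb_contains?(content, { "slack" => { "username" => "solnic" } }) }
p matches.size # => 1
```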

To me, this is the best of both worlds combined: solid relational tables with dynamic and evolvable fields to maintain my ever-changing application data. If you want to learn more about this, my new book will go into the details of this development approach using the Trailblazer architecture.

Using JSONB columns with twins and Reform forms has helped us immensely with both development speed and stability, as the twin constrains your access to the data. You should give it a try. I’m pretty sure you won’t regret it.

8 thoughts on “Schemaless Models With Hash Fields or JSONB”

Mikita says:

July 6, 2016 at 2:19 pm

What about scopes? Wouldn’t they be slow?

1. apotonick says:
  
  July 6, 2016 at 10:06 pm
  
  Uhm, how are scopes related to this? Scopes are to find data, twins are to map data. Can you elaborate?
  
Randy P says:

July 6, 2016 at 5:37 pm

Uber has a really nice blog article about how they went “schemaless.” https://eng.uber.com/schemaless-part-one/

1. apotonick says:
  
  July 6, 2016 at 10:08 pm
  
  Wow, thanks Randy, that is a great series of posts, I’m devouring them now!
  
codevader says:

July 23, 2016 at 7:13 pm

Following your example above using Reform, I get `uninitialized constant Disposable::Twin::Property::Hash`. Adding `require 'disposable/twin/coercion'` then leads me to `cannot load such file -- dry-types`.

Using reform 2.2.1 and reform-rails 0.1.4. Thanks

1. codevader says:
  
  July 23, 2016 at 7:14 pm
  
  Forgot to mention, using it with Rails 5.
  
Robert says:

October 20, 2016 at 2:47 pm

Hi,
first of all, thx for the great article.
One thing bothers me: if you store hashes in the DB, how do you actually perform searches over the stored data (for example, we store users as hashes in the DB and would like to get all users with a specific name)? I do not think you get all hashes from the DB and iterate over them all, right?

1. apotonick says:
  
  October 21, 2016 at 12:29 am
  
  Hi Robert, thanks! The search is implemented by Postgres’ JSONB datatype. It allows nested hash searches in a high-perf environment.
  
Nick Sutterer

Respecting local traditions since 1981.