For quite a while I’ve been playing with the idea of “schemaless” models, intrigued by the prospect of dumping entire hashes into database fields instead of having to add every little setting as a real database column.
Where the latter involves the tiresome process of migrations, hash fields in databases can simply be extended ad hoc. I’m a fan of mixing both: relational tables and associations for the core data, hash fields for infrequently accessed data. I’ll leave the debate about which approach is better to others.
What Is A Hash Field?
As the name implies, you can save an entire data structure in one database column. ActiveRecord makes this really simple, as I will demonstrate here, even though in my own applications I use the excellent Sequel gem.
```ruby
class User < ActiveRecord::Base
  serialize :content
end
```
The schema of this table could look as follows.
```ruby
create_table "users" do |t|
  t.string "email"
  t.text   "content"
end
```
Note that the `content` column is of type `TEXT`, nothing special. The ActiveRecord `serialize` method handles transforming the data structure to and from `content`. Check this example.
```ruby
user = User.find(1)
user.content #=> { "slack" => { "activated" => true, "username" => "apotonick" } }
```
That’s the reading part. You can also write hashes (or parts of them).
```ruby
user = User.new(content: { "slack" => { "activated" => false } })
# or
User.find(1).content["slack"]["activated"] = true
```
In other words, ActiveRecord allows you to assign a hash structure to the `content` field and will serialize it when saving the record, and deserialize it back into a real hash when accessing it.
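If you’re curious what that round trip looks like, here’s a minimal plain-Ruby sketch, assuming YAML as the coder (ActiveRecord’s traditional default for `serialize`). `FakeUser`, `raw_content` and `from_raw` are hypothetical names; no database is involved, we only turn a hash into the string a `TEXT` column would hold and back. `YAML.safe_load` only permits basic types such as strings and booleans, which is all this example needs.

```ruby
require "yaml"

# Hypothetical stand-in for a model declaring `serialize :content`.
# No database: we only mimic Hash -> String (TEXT column) -> Hash.
class FakeUser
  attr_accessor :content

  def initialize(content = {})
    @content = content
  end

  # The string that would end up in the TEXT column.
  def raw_content
    YAML.dump(content)
  end

  # Rebuild a record from a raw column value, as a read would do.
  def self.from_raw(raw)
    new(YAML.safe_load(raw))
  end
end

user   = FakeUser.new({ "slack" => { "activated" => false, "username" => "apotonick" } })
raw    = user.raw_content            # a plain String
loaded = FakeUser.from_raw(raw)
loaded.content["slack"]["username"]  # => "apotonick"
```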
That is pretty awesome, given that you would’ve had to create and migrate two additional columns on the user table without this hash field. And that’s exactly why we call it a hash field!
No Hashes! I Want Objects!
While the hash field illustrated above might already be very helpful to you, fumbling with nested hashes in your business logic can be extremely frustrating. Adding features like coercion of specific fragments ends up in repetitive code, and bugs will follow.
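To make that pain concrete, here’s a small plain-Ruby illustration. The data mirrors the example above; the `"1"` string value and the hand-written coercion are hypothetical. Typos in keys either return `nil` silently or blow up one level deeper, and every reader has to repeat its own coercion.

```ruby
# Plain nested hash, as it comes out of a serialized column.
# The "1" value is hypothetical, e.g. from an unconverted form param.
content = { "slack" => { "activated" => "1", "username" => "apotonick" } }

# A typo in an inner key silently returns nil instead of failing.
typo = content["slack"]["activted"] # => nil

# A typo in an outer key fails one level deeper, with an unhelpful error.
error = begin
  content["slak"]["activated"]
  nil
rescue NoMethodError => e
  e # NoMethodError: [] called on nil
end

# Coercion gets re-implemented at every call site that reads the flag.
activated = %w[1 true].include?(content["slack"]["activated"].to_s) # => true
```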
A better way is to use a twin decorator from the Disposable gem and its brand-new hash fields feature.
```ruby
gem "disposable", ">= 0.3.2"
```
Despite the title of this post, a twin always defines a schema. This happens in a twin class.
```ruby
require "disposable/twin"
require "disposable/twin/property/hash"

class User::Twin < Disposable::Twin
  include Property::Hash

  property :content, field: :hash do
    property :slack do
      property :activated
      property :username
    end
  end
end
```
The twin allows you to define any number of nested hash structures. The hash may also contain collections; check the twin docs for that.
In order to use that “twin”, you simply decorate the original model.
```ruby
user = User.find(1)
twin = User::Twin.new(user)
```
You can now work with that twin’s hash field as if it was a nested Ruby object.
```ruby
twin.content.slack.activated = true
twin.content.slack.username  = "solnic"
```
To push the modified hash back to the model, you have to call `sync`.
```ruby
twin.sync
user.content #=> { "slack" => { "activated" => true, "username" => "solnic" } }
```
When calling `user.save` now, your changes are persisted.
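To demystify what a twin does with the hash, here’s a deliberately tiny plain-Ruby sketch of the same idea. It is not how Disposable is implemented; `HashTwin` and its schema format (a hash where `nil` marks a scalar property and a nested hash marks a fragment) are hypothetical. It only shows the mechanics: accessors exist only for declared properties, and `sync` writes the state back out as a plain nested hash.

```ruby
# Hand-rolled sketch of the twin idea, NOT Disposable's implementation.
# HashTwin and its schema format are hypothetical: nil marks a scalar,
# a Hash marks a nested fragment that gets its own object.
class HashTwin
  def initialize(schema, hash)
    @values = {}
    schema.each do |name, nested|
      value = hash[name]
      @values[name] = nested ? HashTwin.new(nested, value || {}) : value

      # Only properties from the schema get accessors, so typos raise.
      define_singleton_method(name) { @values[name] }
      define_singleton_method("#{name}=") { |v| @values[name] = v } unless nested
    end
  end

  # Write the current state back out as a plain nested hash.
  def sync
    @values.to_h { |name, value| [name, value.is_a?(HashTwin) ? value.sync : value] }
  end
end

schema  = { "slack" => { "activated" => nil, "username" => nil } }
content = { "slack" => { "activated" => false, "username" => "apotonick" } }

twin = HashTwin.new(schema, content)
twin.slack.activated = true
twin.slack.username  = "solnic"
synced = twin.sync
# synced == { "slack" => { "activated" => true, "username" => "solnic" } }
```

A misspelled setter such as `twin.slack.activted = 1` raises `NoMethodError` here instead of quietly adding a new key.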
Object-Oriented Schemas
This is already very useful, since there’s no way you can screw up that hash anymore. The twin’s schema gives you a rock-solid API.
“Well, that’s nice, but I can use Hashie for that!” might be what you’re thinking right now. Yes, but can you add decorator methods to nested fragments, too?
```ruby
class User::Twin < Disposable::Twin
  include Property::Hash

  property :content, field: :hash do
    property :slack do
      property :activated
      property :username

      def activate!
        self.activated = true
      end
    end
  end
end
```
Setting the activation status of the user’s slack channel is now a method call instead of an assignment.
```ruby
twin.content.slack.activate!
```
That’s not only convenient, that’s safe.
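The same principle in plain Ruby, as a rough sketch: once a nested fragment is an object, it can carry behavior. `SlackFragment` is a hypothetical stand-in built on `Struct`, not a Disposable twin.

```ruby
# Hypothetical plain-Ruby fragment, NOT a Disposable twin: the nested
# "slack" data becomes an object that owns its own behavior.
SlackFragment = Struct.new(:activated, :username) do
  # Intent-revealing command instead of a raw assignment.
  def activate!
    self.activated = true
  end
end

slack = SlackFragment.new(false, "apotonick")
slack.activate!
slack.activated # => true
```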
Coercion
Speaking of safety: Why not use coercion to make sure the assigned value is always a boolean?
```ruby
require "disposable/twin/coercion"

class User::Twin < Disposable::Twin
  include Property::Hash
  feature Coercion

  property :content, field: :hash do
    property :slack do
      property :activated, type: Types::Bool
      property :username
    end
  end
end
```
The `:type` option lets you specify the coercion target. The twin’s setter now converts assigned values automatically.
```ruby
twin.content.slack.activated = 1
twin.content.slack.activated #=> true
```
When it comes to coercion, you have the entire API of the dry-types gem right at your fingertips, because Disposable uses it under the hood.
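For a feeling of what such a coercing setter does, here’s a hedged plain-Ruby sketch. `BoolCoercion` and `SlackSettings` are hypothetical, and the accepted spellings are an assumption for illustration, not dry-types’ exact rule set; in a real twin you rely on the type you pass via `:type`.

```ruby
# Hypothetical boolean coercion, NOT dry-types. The accepted spellings
# below are an assumption chosen for illustration.
module BoolCoercion
  TRUE_VALUES  = %w[1 true t on yes].freeze
  FALSE_VALUES = %w[0 false f off no].freeze

  def self.call(input)
    s = input.to_s.downcase
    return true  if TRUE_VALUES.include?(s)
    return false if FALSE_VALUES.include?(s)
    raise ArgumentError, "cannot coerce #{input.inspect} to a boolean"
  end
end

# Hypothetical settings object with a coercing setter.
class SlackSettings
  attr_reader :activated

  # The setter normalizes whatever it receives before storing it.
  def activated=(value)
    @activated = BoolCoercion.call(value)
  end
end

settings = SlackSettings.new
settings.activated = 1
first = settings.activated  # => true
settings.activated = "off"
second = settings.activated # => false
```

Unrecognized input raises instead of being stored, so garbage never reaches the hash.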
Remapping Properties
It’s often a good idea to hide the internal data structure from the user in the bad and evil outer world. For example, the `content` field doesn’t have to be visible. We can remap the `slack` field directly to the top level.
```ruby
class User::Twin < Disposable::Twin
  include Property::Hash

  property :content, field: :hash do
    property :slack do
      # ..
    end
  end

  unnest :slack, from: :content
end
```
That’s exactly what `unnest` does.
```ruby
twin.slack.activated #=> false
```
You’ve now hidden implementation details from the environment and simplified your object’s API, a very smart thing to do.
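Conceptually, unnesting is delegation. Here’s a plain-Ruby sketch using the standard library’s `Forwardable`; `UserView`, `Slack` and `Content` are hypothetical names, and this is not how Disposable implements `unnest`. The outer object exposes `slack` and `slack=` while the `content` wrapper stays private.

```ruby
require "forwardable"

# Hypothetical value objects standing in for the nested hash fragments.
Slack   = Struct.new(:activated, :username)
Content = Struct.new(:slack)

# Sketch of the idea behind `unnest :slack, from: :content`,
# NOT Disposable's code: forward the fragment's reader and writer.
class UserView
  extend Forwardable

  # view.slack and view.slack= forward to @content.slack / @content.slack=
  def_delegators :@content, :slack, :slack=

  def initialize(content)
    @content = content # deliberately no public reader for content
  end
end

view = UserView.new(Content.new(Slack.new(false, "apotonick")))
view.slack.activated # => false
```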
Validations
Disposable’s twins are data mapper objects. Their job is, I say it again, mapping data.
What if you wanted to hook a form on top of the object we just created, with nested fields and validations? You can use the popular Reform gem. Reform internally uses twins to compose forms, so the API is identical.
```ruby
class User::Form < Reform::Form
  include Disposable::Twin::Property::Hash
  feature Coercion

  property :content, field: :hash do
    property :slack do
      property :activated, type: Types::Bool
      property :username

      validates :username, length: 3..30
      validates :activated, presence: true
    end
  end

  unnest :slack, from: :content
end
```
This looks almost identical to the twin, except for added validations.
You can now render that form by yourself or with a form builder.
```erb
<%= simple_form_for User::Form.new(@user) do |f| %>
  <%= f.simple_fields_for :slack do |s| %>
    <%= s.input :activated, as: :boolean %>
    <%= s.input :username %>
  <% end %>
<% end %>
```
How easy is that? The form renderer doesn’t even know it’s dealing with a form backed by a twin, it thinks it’s an ActiveModel-compliant model instance.
You can then validate and save the form.
```ruby
form = User::Form.new(@user)
form.validate(slack: { username: "solnic" }) #=> true
form.sync
@user.content #=> { "slack" => { "username" => "solnic" } }
```
These are the very same mechanics that we already discussed. Again: a form is a twin. It makes it very convenient to work on deeply nested hash structures, without any chance of messing up structure, formatting or types.
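Here’s a plain-Ruby sketch of that validate-then-sync cycle. `SlackForm` is hypothetical and only mirrors the `validate`/`sync` method names used above; Reform does far more (nested forms, coercion, an errors API). The point is the ordering: input is buffered and validated first, and the model’s hash is only touched on `sync`.

```ruby
# Hypothetical form object (plain Ruby, NOT Reform): it buffers input,
# validates it, and only writes to the underlying hash on #sync.
class SlackForm
  attr_reader :errors, :username

  def initialize(content)
    @content  = content
    @username = content.dig("slack", "username")
    @errors   = []
  end

  # Assign incoming params, then validate; returns true when valid.
  def validate(params)
    incoming  = params.dig(:slack, :username)
    @username = incoming unless incoming.nil?

    @errors = []
    @errors << "username must be 3 to 30 characters" unless (3..30).cover?(@username.to_s.length)
    @errors.empty?
  end

  # The model's hash is only modified here.
  def sync
    (@content["slack"] ||= {})["username"] = @username
    @content
  end
end

content = {}
form = SlackForm.new(content)
first_result  = form.validate({ slack: { username: "so" } })     # => false
untouched     = content.empty?                                   # => true
second_result = form.validate({ slack: { username: "solnic" } }) # => true
synced = form.sync # => { "slack" => { "username" => "solnic" } }
```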
Hash Fields Everywhere!
We use hash fields in literally all projects now. Instead of having to think about how to structure and name columns, and which table to use, we have very simple tables that have hash fields. That allows us to still use relational features like associations.
However, Postgres’ new JSONB column type really blew my mind. This column type allows nested hashes just as we did in this article, plus you can query and index fragments inside JSONB columns.
To me, this is the best of both worlds combined: Solid relational tables with dynamic and evolvable fields to maintain my ever-changing application data. If you want to learn more about this, my new book will go into the details of this development approach using the Trailblazer architecture.
Using JSONB columns with twins and Reform forms has helped us immensely with both development speed and stability, as the twin constrains your access to the data. You should give it a try. I’m pretty sure you won’t regret it.
What about scopes? Wouldn’t they be slow?
Uhm, how are scopes related to this? Scopes are to find data, twins are to map data. Can you elaborate?
Uber has a really nice blog article about how they went “schemaless.” https://eng.uber.com/schemaless-part-one/
Wow, thanks Randy, that is a great series of posts, I’m devouring them now!
Following your example above using Reform, I get `uninitialized constant Disposable::Twin::Property::Hash`. Adding `require 'disposable/twin/coercion'` then leads me to `cannot load such file -- dry-types`.
Using reform 2.2.1 and reform-rails 0.1.4. Thanks
Forgot to mention, using it with Rails 5.
Hi,
first of all thx for great article.
One thing bothers me: if you store hashes in the DB, how do you actually perform searches over the stored data (for example, we store a user as a hash in the DB and would like to get all users with a specific name)? I do not think you fetch all hashes from the DB and iterate over them all, right?
Hi Robert, thanks! The search is implemented by Postgres’ JSONB datatype. It allows nested hash searches in a high-perf environment.