Ler blog em Português

Using PostgreSQL and jsonb with Ruby on Rails

Read in 8 minutes

PostgreSQL 9.4 introduced jsonb, a new column type for storing documents in your relational database. jsonb and json columns look exactly the same on a higher level, but differs on the storage implementation.

The advantage of using jsonb is that you can easily integrate relational and non-relation data, with performance that can be better than most non-relational databases like MongoDB.

Understanding the differences between json and jsonb

So, what are the differences between both column types? When you compare writing speed, jsonb will be slightly slower, due to the way data is stored.

Another difference is that json columns will reparse the stored value every time; that means that key ordering will be same from the input. This doesn’t happen on jsonb columns, as content is stored in binary format and no key ordering is guaranteed. So if you have software that relies on key ordering (really?) jsonb may not be the best choice for your application.

Let’s run a small benchmark. In this case, I used the following JSON structure:

{
  "twitter": "johndoe1",
  "github": "johndoe1",
  "bio": "Lorem ipsum dolor sit amet, consectetur adipisicing elit. Labore impedit aliquam sapiente dolore magni aliquid ipsa ad, enim, esse ut reprehenderit quaerat deleniti fugit eaque. Vero eligendi voluptatibus atque, asperiores.",
  "blog": "http://johndoe1.example.com",
  "interests": [
    "music",
    "movies",
    "programming"
  ],
  "age": 42,
  "newsletter": true
}

Inserting 30,000 records didn’t have any real difference, so I believe that jsonb can be slower in complex structures.

Rehearsal ------------------------------------------------
insert jsonb   2.690000   0.590000   3.280000 ( 12.572343)
insert json    2.690000   0.590000   3.280000 ( 12.766534)
--------------------------------------- total: 6.560000sec

                   user     system      total        real
insert jsonb   2.680000   0.590000   3.270000 ( 13.206602)
insert json    2.650000   0.580000   3.230000 ( 12.577138)

The real difference is when you query a json/jsonb column. First, let’s see what the table and indexes look like.

CREATE TABLE users (
  id serial not null,
  settings jsonb not null default '{}',
  preferences json not null default '{}'
);

CREATE INDEX settings_index ON users USING gin (settings);
CREATE INDEX twitter_settings_index ON users ((settings->>'github'));
CREATE INDEX preferences_index ON users ((preferences->>'github'));

Notice that we have a GIN index on the settings column, and two expression index on a given path (github). With 30,000 records, searching for the Github username john30000 (the last inserted record), will give us the following numbers:

Rehearsal -----------------------------------------------------------------
read jsonb (index column)       0.030000   0.030000   0.060000 (  3.673465)
read jsonb (expression index)   0.010000   0.010000   0.020000 (  0.087105)
read json (expression index)    0.010000   0.020000   0.030000 (  0.080121)
read json (no index)            0.060000   0.030000   0.090000 (113.206747)
-------------------------------------------------------- total: 0.200000sec

                                    user     system      total        real
read jsonb (index column)       0.010000   0.020000   0.030000 (  0.092476)
read jsonb (expression index)   0.010000   0.010000   0.020000 (  0.078916)
read json (expression index)    0.010000   0.010000   0.020000 (  0.081908)
read json (no index)            0.050000   0.040000   0.090000 (110.761944)

As you can see, expression indexes have pretty much the same performance, so there’s no real gain here. The difference comes when you try to query a path that have no index; while the jsonb can have a GIN/GiST index on the whole column, that’s not possible with json columns. That’s why it’s so much more slower.

Let’s check the query planner for reading on path without indexes.

EXPLAIN
SELECT *
FROM users
WHERE settings @> '{"twitter": "john30000"}' LIMIT 1;

--                                      QUERY PLAN
-- -------------------------------------------------------------------------------------
--  Limit  (cost=28.23..31.96 rows=1 width=468)
--    ->  Bitmap Heap Scan on users  (cost=28.23..140.07 rows=30 width=468)
--          Recheck Cond: (settings @> '{"twitter": "john30000"}'::jsonb)
--          ->  Bitmap Index Scan on settings_index  (cost=0.00..28.23 rows=30 width=0)
--                Index Cond: (settings @> '{"twitter": "john30000"}'::jsonb)

EXPLAIN
SELECT *
FROM users
WHERE preferences->>'twitter' = 'john30000' LIMIT 1;

--                                QUERY PLAN
-- -------------------------------------------------------------------------
--  Limit  (cost=0.00..25.23 rows=1 width=468)
--    ->  Seq Scan on users  (cost=0.00..3784.00 rows=150 width=468)
--          Filter: ((preferences ->> 'twitter'::text) = 'john30000'::text)

The most important thing here is that json performed a sequential scan, which means that PostgreSQL goes record by record, sequentially, until the condition is satisfied; also remember that the JSON content is going to be reparsed for each record, which can be slow for complex structures.

That doesn’t happen on a jsonb column. The search is not as optimized as it would be with an expression index on the path you’re querying, but still uses the index.

One thing that you have to keep in mind is that jsonb will always perform a sequential scan if you use the ->> operator on a path that have no expression index.

EXPLAIN
SELECT *
FROM users
WHERE settings->>'twitter' = 'johndoe30000' LIMIT 1;

--                                QUERY PLAN
-- -------------------------------------------------------------------------
--  Limit  (cost=0.00..25.23 rows=1 width=468)
--    ->  Seq Scan on users  (cost=0.00..3784.00 rows=150 width=468)
--          Filter: ((settings ->> 'twitter'::text) = 'johndoe30000'::text)
-- (3 rows)

So if you don’t know beforehand which keys you’ll query, or if you can query any path, make sure you define an GIN/GiST index and use the @> (or any other operator that benefits from that index).

Converting json into jsonb

If you’re already using json or a text column for storing JSON documents, you can convert it to jsonb, so you can rely on column indexes.

BEGIN;
ALTER TABLE users ADD COLUMN preferences_jsonb jsonb DEFAULT '{}';
UPDATE users set preferences_jsonb = preferences::jsonb;
ALTER TABLE users ALTER COLUMN preferences_jsonb SET NOT NULL;
ALTER TABLE users RENAME COLUMN preferences TO preferences_json;
ALTER TABLE users RENAME COLUMN preferences_jsonb TO preferences;

-- Don't remove the column until you're sure everything is working.
-- ALTER TABLE users DROP COLUMN preferences_json;

COMMIT;

Now that you know how json works, let’s see how you can use it with Ruby on Rails

Using jsonb with Ruby on Rails

Rails introduced jsonb support in v4.2. Is as simple as using a string or text column. You can see below how to add a jsonb column to a new table and an existing one.

# db/migrate/*_create_users.rb
class CreateUsers < ActiveRecord::Migration
  def change
    enable_extension 'citext'

    create_table :users do |t|
      t.text :name, null: false
      t.citext :username, null: false
      t.jsonb :preferences, null: false, default: '{}'
    end

    add_index  :users, :preferences, using: :gin
  end
end

# db/migrate/*_add_jsonb_column_to_users.rb
class AddJsonbColumnToUsers < ActiveRecord::Migration
  def change
    add_column :users, :preferences, :jsonb, null: false, default: '{}'
    add_index  :users, :preferences, using: :gin
  end
end

Notice that we’re also defining a GIN index; if you want to create an expression index for a given path, you’ll have to use execute. In this case, Rails won’t know how to dump your index using Ruby, so you’re better off changing the dump format to SQL.

# config/initializers/active_record.rb
Rails.application.config.active_record.schema_format = :sql

# db/migrate/*_add_index_to_preferences_path_on_users.rb
class AddIndexToPreferencesPathOnUsers < ActiveRecord::Migration
  def change
    execute <<-SQL
      CREATE INDEX user_prefs_newsletter_index ON users ((preferences->>'newsletter'))
    SQL
  end
end

Your model won’t need any configuration. Just create your records providing an object that supports JSON serialization.

user = User.create!({
  name: 'John Doe',
  username: 'johndoe',
  preferences: {
    twitter: 'johndoe',
    github: 'johndoe',
    blog: 'http://example.com'
  }
})

# Reload record from database to enforce serialization.
user.reload

# Show preferences.
user.preferences
#=> {"blog"=>"http://example.com", "github"=>"johndoe", "twitter"=>"johndoe"}

# Get blog.
user.preferences['blog']
#=> http://example.com

You can see that all keys are returned as strings. You can use a custom serializer so you can access the JSON object with symbols as well.

# app/models/user.rb
class User < ActiveRecord::Base
  serialize :preferences, HashSerializer
end

# app/serializers/hash_serializer.rb
class HashSerializer
  def self.dump(hash)
    hash.to_json
  end

  def self.load(hash)
    (hash || {}).with_indifferent_access
  end
end

Another interesting ActiveRecord feature is store_accessor. If you see yourself setting some attributes frequently, you can create accessors, so you can just assign the property, instead of passing through the JSON. This also makes easier to validate properties and create forms. So, if we want to create a form for storing a blog url, Github and Twitter accounts, you can create something like this:

class User < ActiveRecord::Base
  serialize :preferences, HashSerializer
  store_accessor :preferences, :blog, :github, :twitter
end

Now you can simply assign these properties.

user = User.new(blog: 'http://example.org', github: 'johndoe')

user.preferences
#=> {"blog"=>"http://example.org", "github"=>"johndoe"}

user.blog
#=> http://example.org

user.preferences[:github]
#=> johndoe

user.preferences['github']
#=> johndoe

With store accessors you can add validation and create forms just like any other property.

Querying your jsonb column

Now it’s time to use some query operators. PostgreSQL has many others, so read the documentation for a complete list.

Also, always remember to EXPLAIN the queries you’re executing; that way you can adapt your indexes for a better performance.

Users that opt-in newsletter

# preferences->newsletter = true
User.where('preferences @> ?', {newsletter: true}.to_json)

Users that are interested in Ruby

# preferences->interests = ['ruby', 'javascript', 'python']
User.where("preferences -> 'interests' ? :language", language: 'ruby')

This query won’t use the column index; if you need to query arrays, make sure you create an expression index.

CREATE INDEX preferences_interests_on_users ON users USING GIN ((preferences->'interests'))

Users that set Twitter and Github accounts

# preferences->twitter AND preferences->github
User.where('preferences ?& array[:keys]', keys: ['twitter', 'github'])

Users that set Twitter or Github account

# preferences->twitter OR preferences->github
User.where('preferences ?| array[:keys]', keys: ['twitter', 'github'])

Users that live in San Francisco/California

# preferences->state = 'SP' AND preferences->city = 'São Paulo'
User.where('preferences @> ?', {city: 'San Francisco', state: 'CA'}.to_json)

What about hstore?

hstore columns don’t allow a nested structure; they also store all values as strings, which will require type coercion on both database and application layers. With json/jsonb columns you don’t have this problem; numbers (integers/float), booleans, arrays, strings and nulls are accepted, and you can even nest structures in any way you want.

The recommendation is that you stop using hstore in favor of jsonb; just remember that you have to use PostgreSQL 9.4+ for this to work.

I wrote about hstore before, so check it out if you want to know more about it.

Wrapping up

PostgreSQL is such a powerful database, and fortunately ActiveRecord is keeping up with these updates, introducing built-in support for features like jsonb and hstore.

Some things can still be improved, like support for expression indexes. Changing ActiveRecord’s dumping to SQL it’s not a big deal, but this would be a bit easier.

# This doesn't exist, but it would be nice to have it!
add_index :users, "(settings->>'github')", raw: true

Using Rails and PostgreSQL is way easier than it used to be, and it tends to get even better on each new release, so try hard(er) to always be using the newest Rails release. It pays off really quickly.