Using PostgreSQL and jsonb with Ruby on Rails
Read in 8 minutes
PostgreSQL 9.4 introduced jsonb
, a new column type for storing documents in your relational database. jsonb
and json
columns look exactly the same on a higher level, but differs on the storage implementation.
The advantage of using jsonb
is that you can easily integrate relational and non-relation data, with performance that can be better than most non-relational databases like MongoDB.
Understanding the differences between json and jsonb
So, what are the differences between both column types? When you compare writing speed, jsonb
will be slightly slower, due to the way data is stored.
json
stores an exact copy of the text input, which must be reparsed again and again when you use any processing function. It doesn’t support indexes, but you can create an expression index for querying.jsonb
stores a binary representation that avoids reparsing the data structure. It supports indexing, which means you can query any path without a specific index.
Another difference is that json
columns will reparse the stored value every time; that means that key ordering will be same from the input. This doesn’t happen on jsonb
columns, as content is stored in binary format and no key ordering is guaranteed. So if you have software that relies on key ordering (really?) jsonb
may not be the best choice for your application.
Let’s run a small benchmark. In this case, I used the following JSON structure:
{
"twitter": "johndoe1",
"github": "johndoe1",
"bio": "Lorem ipsum dolor sit amet, consectetur adipisicing elit. Labore impedit aliquam sapiente dolore magni aliquid ipsa ad, enim, esse ut reprehenderit quaerat deleniti fugit eaque. Vero eligendi voluptatibus atque, asperiores.",
"blog": "http://johndoe1.example.com",
"interests": [
"music",
"movies",
"programming"
],
"age": 42,
"newsletter": true
}
Inserting 30,000 records didn’t have any real difference, so I believe that jsonb
can be slower in complex structures.
Rehearsal ------------------------------------------------
insert jsonb 2.690000 0.590000 3.280000 ( 12.572343)
insert json 2.690000 0.590000 3.280000 ( 12.766534)
--------------------------------------- total: 6.560000sec
user system total real
insert jsonb 2.680000 0.590000 3.270000 ( 13.206602)
insert json 2.650000 0.580000 3.230000 ( 12.577138)
The real difference is when you query a json
/jsonb
column. First, let’s see what the table and indexes look like.
CREATE TABLE users (
id serial not null,
settings jsonb not null default '{}',
preferences json not null default '{}'
);
CREATE INDEX settings_index ON users USING gin (settings);
CREATE INDEX twitter_settings_index ON users ((settings->>'github'));
CREATE INDEX preferences_index ON users ((preferences->>'github'));
Notice that we have a GIN index on the settings
column, and two expression index on a given path (github
). With 30,000 records, searching for the Github username john30000
(the last inserted record), will give us the following numbers:
Rehearsal -----------------------------------------------------------------
read jsonb (index column) 0.030000 0.030000 0.060000 ( 3.673465)
read jsonb (expression index) 0.010000 0.010000 0.020000 ( 0.087105)
read json (expression index) 0.010000 0.020000 0.030000 ( 0.080121)
read json (no index) 0.060000 0.030000 0.090000 (113.206747)
-------------------------------------------------------- total: 0.200000sec
user system total real
read jsonb (index column) 0.010000 0.020000 0.030000 ( 0.092476)
read jsonb (expression index) 0.010000 0.010000 0.020000 ( 0.078916)
read json (expression index) 0.010000 0.010000 0.020000 ( 0.081908)
read json (no index) 0.050000 0.040000 0.090000 (110.761944)
As you can see, expression indexes have pretty much the same performance, so there’s no real gain here. The difference comes when you try to query a path that have no index; while the jsonb
can have a GIN/GiST index on the whole column, that’s not possible with json
columns. That’s why it’s so much more slower.
Let’s check the query planner for reading on path without indexes.
EXPLAIN
SELECT *
FROM users
WHERE settings @> '{"twitter": "john30000"}' LIMIT 1;
-- QUERY PLAN
-- -------------------------------------------------------------------------------------
-- Limit (cost=28.23..31.96 rows=1 width=468)
-- -> Bitmap Heap Scan on users (cost=28.23..140.07 rows=30 width=468)
-- Recheck Cond: (settings @> '{"twitter": "john30000"}'::jsonb)
-- -> Bitmap Index Scan on settings_index (cost=0.00..28.23 rows=30 width=0)
-- Index Cond: (settings @> '{"twitter": "john30000"}'::jsonb)
EXPLAIN
SELECT *
FROM users
WHERE preferences->>'twitter' = 'john30000' LIMIT 1;
-- QUERY PLAN
-- -------------------------------------------------------------------------
-- Limit (cost=0.00..25.23 rows=1 width=468)
-- -> Seq Scan on users (cost=0.00..3784.00 rows=150 width=468)
-- Filter: ((preferences ->> 'twitter'::text) = 'john30000'::text)
The most important thing here is that json
performed a sequential scan, which means that PostgreSQL goes record by record, sequentially, until the condition is satisfied; also remember that the JSON content is going to be reparsed for each record, which can be slow for complex structures.
That doesn’t happen on a jsonb
column. The search is not as optimized as it would be with an expression index on the path you’re querying, but still uses the index.
One thing that you have to keep in mind is that jsonb
will always perform a sequential scan if you use the ->>
operator on a path that have no expression index.
EXPLAIN
SELECT *
FROM users
WHERE settings->>'twitter' = 'johndoe30000' LIMIT 1;
-- QUERY PLAN
-- -------------------------------------------------------------------------
-- Limit (cost=0.00..25.23 rows=1 width=468)
-- -> Seq Scan on users (cost=0.00..3784.00 rows=150 width=468)
-- Filter: ((settings ->> 'twitter'::text) = 'johndoe30000'::text)
-- (3 rows)
So if you don’t know beforehand which keys you’ll query, or if you can query any path, make sure you define an GIN/GiST index and use the @>
(or any other operator that benefits from that index).
Converting json into jsonb
If you’re already using json
or a text
column for storing JSON documents, you can convert it to jsonb
, so you can rely on column indexes.
BEGIN;
ALTER TABLE users ADD COLUMN preferences_jsonb jsonb DEFAULT '{}';
UPDATE users set preferences_jsonb = preferences::jsonb;
ALTER TABLE users ALTER COLUMN preferences_jsonb SET NOT NULL;
ALTER TABLE users RENAME COLUMN preferences TO preferences_json;
ALTER TABLE users RENAME COLUMN preferences_jsonb TO preferences;
-- Don't remove the column until you're sure everything is working.
-- ALTER TABLE users DROP COLUMN preferences_json;
COMMIT;
Now that you know how json
works, let’s see how you can use it with Ruby on Rails
Using jsonb with Ruby on Rails
Rails introduced jsonb
support in v4.2. Is as simple as using a string
or text
column. You can see below how to add a jsonb
column to a new table and an existing one.
# db/migrate/*_create_users.rb
class CreateUsers < ActiveRecord::Migration
def change
enable_extension 'citext'
create_table :users do |t|
t.text :name, null: false
t.citext :username, null: false
t.jsonb :preferences, null: false, default: '{}'
end
add_index :users, :preferences, using: :gin
end
end
# db/migrate/*_add_jsonb_column_to_users.rb
class AddJsonbColumnToUsers < ActiveRecord::Migration
def change
add_column :users, :preferences, :jsonb, null: false, default: '{}'
add_index :users, :preferences, using: :gin
end
end
Notice that we’re also defining a GIN
index; if you want to create an expression index for a given path, you’ll have to use execute
. In this case, Rails won’t know how to dump your index using Ruby, so you’re better off changing the dump format to SQL.
# config/initializers/active_record.rb
Rails.application.config.active_record.schema_format = :sql
# db/migrate/*_add_index_to_preferences_path_on_users.rb
class AddIndexToPreferencesPathOnUsers < ActiveRecord::Migration
def change
execute <<-SQL
CREATE INDEX user_prefs_newsletter_index ON users ((preferences->>'newsletter'))
SQL
end
end
Your model won’t need any configuration. Just create your records providing an object that supports JSON serialization.
user = User.create!({
name: 'John Doe',
username: 'johndoe',
preferences: {
twitter: 'johndoe',
github: 'johndoe',
blog: 'http://example.com'
}
})
# Reload record from database to enforce serialization.
user.reload
# Show preferences.
user.preferences
#=> {"blog"=>"http://example.com", "github"=>"johndoe", "twitter"=>"johndoe"}
# Get blog.
user.preferences['blog']
#=> http://example.com
You can see that all keys are returned as strings. You can use a custom serializer so you can access the JSON object with symbols as well.
# app/models/user.rb
class User < ActiveRecord::Base
serialize :preferences, HashSerializer
end
# app/serializers/hash_serializer.rb
class HashSerializer
def self.dump(hash)
hash.to_json
end
def self.load(hash)
(hash || {}).with_indifferent_access
end
end
Another interesting ActiveRecord feature is store_accessor
. If you see yourself setting some attributes frequently, you can create accessors, so you can just assign the property, instead of passing through the JSON. This also makes easier to validate properties and create forms. So, if we want to create a form for storing a blog url, Github and Twitter accounts, you can create something like this:
class User < ActiveRecord::Base
serialize :preferences, HashSerializer
store_accessor :preferences, :blog, :github, :twitter
end
Now you can simply assign these properties.
user = User.new(blog: 'http://example.org', github: 'johndoe')
user.preferences
#=> {"blog"=>"http://example.org", "github"=>"johndoe"}
user.blog
#=> http://example.org
user.preferences[:github]
#=> johndoe
user.preferences['github']
#=> johndoe
With store accessors you can add validation and create forms just like any other property.
Querying your jsonb column
Now it’s time to use some query operators. PostgreSQL has many others, so read the documentation for a complete list.
Also, always remember to EXPLAIN
the queries you’re executing; that way you can adapt your indexes for a better performance.
Users that opt-in newsletter
# preferences->newsletter = true
User.where('preferences @> ?', {newsletter: true}.to_json)
Users that are interested in Ruby
# preferences->interests = ['ruby', 'javascript', 'python']
User.where("preferences -> 'interests' ? :language", language: 'ruby')
This query won’t use the column index; if you need to query arrays, make sure you create an expression index.
CREATE INDEX preferences_interests_on_users ON users USING GIN ((preferences->'interests'))
Users that set Twitter and Github accounts
# preferences->twitter AND preferences->github
User.where('preferences ?& array[:keys]', keys: ['twitter', 'github'])
Users that set Twitter or Github account
# preferences->twitter OR preferences->github
User.where('preferences ?| array[:keys]', keys: ['twitter', 'github'])
Users that live in San Francisco/California
# preferences->state = 'SP' AND preferences->city = 'São Paulo'
User.where('preferences @> ?', {city: 'San Francisco', state: 'CA'}.to_json)
What about hstore?
hstore
columns don’t allow a nested structure; they also store all values as strings, which will require type coercion on both database and application layers. With json
/jsonb
columns you don’t have this problem; numbers (integers/float), booleans, arrays, strings and nulls are accepted, and you can even nest structures in any way you want.
The recommendation is that you stop using hstore
in favor of jsonb
; just remember that you have to use PostgreSQL 9.4+ for this to work.
I wrote about hstore before, so check it out if you want to know more about it.
Wrapping up
PostgreSQL is such a powerful database, and fortunately ActiveRecord is keeping up with these updates, introducing built-in support for features like jsonb
and hstore
.
Some things can still be improved, like support for expression indexes. Changing ActiveRecord’s dumping to SQL it’s not a big deal, but this would be a bit easier.
# This doesn't exist, but it would be nice to have it!
add_index :users, "(settings->>'github')", raw: true
Using Rails and PostgreSQL is way easier than it used to be, and it tends to get even better on each new release, so try hard(er) to always be using the newest Rails release. It pays off really quickly.