Ler blog em Português

Using PostgreSQL and hstore with Rails

Read in 4 minutes

Rails 4+ introduced new features all over the place, but PostgreSQL support was one that has benefited the most, with many new column types.

One of these column types is called hstore, a column based on a key-value data structure. This feature is extremely useful when you have a data structure that can vary, but that needs to be queried eventually.

In Rails you can use serialized attributes, but this approach has a problem: you can’t query the stored value. This is not the case with hstore, since you can use functions and operators to query values and keys.

The downside of hstore is that all values are stored as strings. So if value type is important, you’ll have to coerce it to the proper type before using it.

About PostgreSQL support

hstore is supported by PostgreSQL 9.2+. If you’re using Ubuntu Trusty Tahr 14.04 LTS, you’re good to go. Older versions must use the PostgreSQL’s official Debian repository.

Using hstore

To enable hstore, you must issue the CREATE EXTENSION command.

CREATE EXTENSION IF NOT EXISTS hstore;

If you’re running Rails, you can use enable_extension. Let’s add a column called settings to the users table.

class AddPreferencesOnUsers < ActiveRecord::Migration
  def change
    enable_extension "hstore"
    add_column :users, :preferences, :hstore
  end
end

If you’re going to query this column, you must add an index. There are two types you can use: GiST and GIN.

You mileage may very, so make sure you benchmark everything before changing your database setup.

You can define the index on your migration file with the :using option.

class AddPreferencesToUsers < ActiveRecord::Migration
  def change
    enable_extension 'hstore'
    add_column :users, :preferences, :hstore
    add_index :users, :preferences, using: :gin
  end
end

Now you have to identify this column on your model with ActiveRecord::Store::ClassMethods#store_accessor.

class User < ActiveRecord::Base
  store_accessor :preferences
end

user = User.new
user.preferences = {
  github: "fnando",
  twitter: "fnando"
}

user.save!
user.reload

user.preferences[:twitter]
#=> fnando

As I said before, hstore will store all values as string. And if you provide a different type, it’ll be coerced before storing it to the database.

user.preferences[:newsletter] = false
user.save!
user.reload

user.preferences[:newsletter].class
#=> String

This can be a problem, but we have an easy solution for this.

Using the Virtus gem

Virtus is a gem that can define atributes with type coercion. So let’s create a class called UserPreferences, that will represent our user’s preferences.

class UserPreferences
  include Virtus.model

  attribute :github, String
  attribute :twitter, String
  attribute :newsletter, Boolean
end

Virtus will do all the boring work, including the UserPreferences#initialize method with mass-assignment support, which also performs type coercion.

preferences = UserPreferences.new(user.preferences)

preferences.newsletter.class
#=> FalseClass

To avoid having to explicitly call UserPreferences.new, we can define a custom serializer that will transparently convert the hstore value into a UserPreferences instance.

Instead of using ActiveRecord::Base.store_accessor we’ll use ActiveRecord::Base.serialize, defining UserPreferences as our serializer.

class User < ActiveRecord::Base
  serialize :preferences, UserPreferences
end

A serializer is any object that implements the methods dump and load.

class UserPreferences
  include Virtus.model

  attribute :github, String
  attribute :twitter, String
  attribute :newsletter, Boolean

  def self.dump(preferences)
    preferences.to_hash
  end

  def self.load(preferences)
    new(preferences)
  end
end

That was easy! Now let’s see if it’s working properly.

user = User.first

user.preferences.class
#=> UserPreferences

user.preferences.newsletter = true
user.save!
user.reload

user.preferences.newsletter.class
#=> TrueClass

What about querying the hstore column? That’s our next topic.

Querying hstore columns

Now we’ll explore some basic operators that you can use to query the hstore column. There are other operators you can use, so make sure you read the documentation.

To test indexes without having too many rows, disable sequential scan before running EXPLAIN.

SET enable_seqscan = FALSE;

Find all users that have the github key.

User.where("preferences ? :key", key: "github")

Find all users without the github key.

User.where("not preferences ? :key", key: "github")

Find an user with a specific Github account

User.where("preferences @> hstore(:key, :value)",
  key: "github", value: "fnando"
).first

Find all users that have a Twitter account with the substring %nan%

User.where("preferences -> :key LIKE :value",
  key: "twitter", value: "%nan%"
)

Find all users that opted-in for newsletter

User.where("preferences -> newsletter = :value", value: 'true')

Only the operators @>, ?, ?& and ?| will use the index. Avoid performing queries like the two last examples, since it can be really slow depending on the number of records you have. But if you have to query a hstore field, make sure you create an index like the following:

CREATE INDEX preferences_newsletter_key_on_users_index ON users ((preferences -> 'newsletter'));

What about JSON?

PostgreSQL also has JSON column types. If you’re already using the new PostgreSQL 9.4, create a JSONB column and you’ll benefit from GIN/GiST indexes, pretty much like hstore. That’s not the case for JSON columns, which is available on older versions.

Querying a JSON column will always perform a sequential scan, unless you create an expression index; otherwise PostgreSQL will fetch record by record until the condition is met, which can be a really slow operation.

Wrapping up

PostgreSQL’s hstore is a nice alternative for embedding documents without losing the benefits of a relational database.

The fact that all values are stored as strings can be a problem if you need to perform other operations, because you have to cast values beforehand on both database and application layers. We’ll see how to avoid this problem by using JSONB on the next article.