Using PostgreSQL and hstore with Rails
Read in 4 minutes
Rails 4+ introduced new features all over the place, but PostgreSQL support was one that has benefited the most, with many new column types.
One of these column types is called hstore, a column based on a key-value data structure. This feature is extremely useful when you have a data structure that can vary, but that needs to be queried eventually.
In Rails you can use serialized attributes, but this approach has a problem: you can’t query the stored value. This is not the case with hstore, since you can use functions and operators to query values and keys.
The downside of hstore is that all values are stored as strings. So if value type is important, you’ll have to coerce it to the proper type before using it.
About PostgreSQL support
hstore is supported by PostgreSQL 9.2+. If you’re using Ubuntu Trusty Tahr 14.04 LTS, you’re good to go. Older versions must use the PostgreSQL’s official Debian repository.
Using hstore
To enable hstore, you must issue the CREATE EXTENSION
command.
CREATE EXTENSION IF NOT EXISTS hstore;
If you’re running Rails, you can use enable_extension
. Let’s add a column called settings
to the users
table.
class AddPreferencesOnUsers < ActiveRecord::Migration
def change
enable_extension "hstore"
add_column :users, :preferences, :hstore
end
end
If you’re going to query this column, you must add an index. There are two types you can use: GiST and GIN.
- GIN indexes are three types faster to search, but they take more time to index. They also take more disk space. Use it when you have more than 100K unique terms.
- GiST indexes are slower than GIN indexes, but they’re faster to update. Use it when you have up to 100K unique terms.
You mileage may very, so make sure you benchmark everything before changing your database setup.
You can define the index on your migration file with the :using
option.
class AddPreferencesToUsers < ActiveRecord::Migration
def change
enable_extension 'hstore'
add_column :users, :preferences, :hstore
add_index :users, :preferences, using: :gin
end
end
Now you have to identify this column on your model with ActiveRecord::Store::ClassMethods#store_accessor
.
class User < ActiveRecord::Base
store_accessor :preferences
end
user = User.new
user.preferences = {
github: "fnando",
twitter: "fnando"
}
user.save!
user.reload
user.preferences[:twitter]
#=> fnando
As I said before, hstore will store all values as string. And if you provide a different type, it’ll be coerced before storing it to the database.
user.preferences[:newsletter] = false
user.save!
user.reload
user.preferences[:newsletter].class
#=> String
This can be a problem, but we have an easy solution for this.
Using the Virtus gem
Virtus is a gem that can define atributes with type coercion. So let’s create a class called UserPreferences
, that will represent our user’s preferences.
class UserPreferences
include Virtus.model
attribute :github, String
attribute :twitter, String
attribute :newsletter, Boolean
end
Virtus will do all the boring work, including the UserPreferences#initialize
method with mass-assignment support, which also performs type coercion.
preferences = UserPreferences.new(user.preferences)
preferences.newsletter.class
#=> FalseClass
To avoid having to explicitly call UserPreferences.new
, we can define a custom serializer that will transparently convert the hstore value into a UserPreferences
instance.
Instead of using ActiveRecord::Base.store_accessor
we’ll use ActiveRecord::Base.serialize
, defining UserPreferences
as our serializer.
class User < ActiveRecord::Base
serialize :preferences, UserPreferences
end
A serializer is any object that implements the methods dump
and load
.
dump
will be called whenever the object is going to be persisted on the database.load
will receive the hstore value (a Hash) and must return the coerced value.
class UserPreferences
include Virtus.model
attribute :github, String
attribute :twitter, String
attribute :newsletter, Boolean
def self.dump(preferences)
preferences.to_hash
end
def self.load(preferences)
new(preferences)
end
end
That was easy! Now let’s see if it’s working properly.
user = User.first
user.preferences.class
#=> UserPreferences
user.preferences.newsletter = true
user.save!
user.reload
user.preferences.newsletter.class
#=> TrueClass
What about querying the hstore column? That’s our next topic.
Querying hstore columns
Now we’ll explore some basic operators that you can use to query the hstore column. There are other operators you can use, so make sure you read the documentation.
To test indexes without having too many rows, disable sequential scan before running EXPLAIN
.
SET enable_seqscan = FALSE;
Find all users that have the github
key.
User.where("preferences ? :key", key: "github")
Find all users without the github
key.
User.where("not preferences ? :key", key: "github")
Find an user with a specific Github account
User.where("preferences @> hstore(:key, :value)",
key: "github", value: "fnando"
).first
Find all users that have a Twitter account with the substring %nan%
User.where("preferences -> :key LIKE :value",
key: "twitter", value: "%nan%"
)
Find all users that opted-in for newsletter
User.where("preferences -> newsletter = :value", value: 'true')
Only the operators @>
, ?
, ?&
and ?|
will use the index. Avoid performing queries like the two last examples, since it can be really slow depending on the number of records you have. But if you have to query a hstore field, make sure you create an index like the following:
CREATE INDEX preferences_newsletter_key_on_users_index ON users ((preferences -> 'newsletter'));
What about JSON?
PostgreSQL also has JSON column types. If you’re already using the new PostgreSQL 9.4, create a JSONB column and you’ll benefit from GIN/GiST indexes, pretty much like hstore. That’s not the case for JSON columns, which is available on older versions.
Querying a JSON column will always perform a sequential scan, unless you create an expression index; otherwise PostgreSQL will fetch record by record until the condition is met, which can be a really slow operation.
Wrapping up
PostgreSQL’s hstore is a nice alternative for embedding documents without losing the benefits of a relational database.
The fact that all values are stored as strings can be a problem if you need to perform other operations, because you have to cast values beforehand on both database and application layers. We’ll see how to avoid this problem by using JSONB on the next article.