Using insensitive-case columns in PostgreSQL with citext
Read in 3 minutes
I’ve been using PostgreSQL instead of MySQL for a while now. When I first migrated, one problem I had was related to how string columns work. In PostgreSQL they’re case-sensitive.
To demonstrate how PostgreSQL works let’s create a table with a unique index.
-- create the table
CREATE TABLE users (name text NOT NULL, username text NOT NULL);
-- create a unique index
CREATE UNIQUE INDEX unique_username_on_users ON users (username);
When you insert some records, notice that PostgreSQL will accept “duplicated terms”.
INSERT INTO users (name, username) VALUES ('John Doe', 'john');
INSERT INTO users (name, username) VALUES ('Doe, John', 'JOHN');
If you select all records with a query like SELECT * FROM users
you can see that you have two records.
-----------+----------
name | username
-----------+----------
John Doe | john
Doe, John | JOHN
-----------+----------
(2 rows)
The unique_username_on_users
index will only be used for strings that has the same sequence of lowercase/uppercase characters.
INSERT INTO users (name, username) VALUES ('Doe, John', 'JOHN');
-- ERROR: duplicate key value violates unique constraint "unique_username_on_users"
-- DETAIL: Key (username)=(JOHN) already exists.
If you’re coming from MySQL, this is not what you expected. Fortunately PostgreSQL allows you to create indexes with expressions. You can change our index to have the same MySQL behavior.
-- remove all records
DELETE FROM users;
-- remove index
DROP INDEX unique_username_on_users;
-- create new index
CREATE UNIQUE INDEX unique_username_on_users ON users (lower(username));
Now, if you try to insert those same records, you’ll see that our index will prevent duplicated terms. For this to work, we defined our index with the lower(username)
expression.
INSERT INTO users (name, username) VALUES ('John Doe', 'john');
INSERT INTO users (name, username) VALUES ('Doe, John', 'JOHN');
-- ERROR: duplicate key value violates unique constraint "unique_username_on_users"
-- DETAIL: Key (lower(username))=(john) already exists.
But now we have another problem: the username
column is still case-sensitive. This means that your WHERE
clause will differentiate uppercase/lowercase characters.
SELECT * FROM users WHERE username = 'john';
-- ----------+----------
-- name | username
-- ----------+----------
-- John Doe | john
-- ----------+----------
-- (1 row)
SELECT * FROM users WHERE username = 'jOhN';
-- ------+----------
-- name | username
-- ------+----------
-- (0 rows)
So, to return the records that matches john
no matter the character case you’re using, you’ll have to use the lower
function. This is not elegant and it’s quite verbose, but will do the job.
SELECT * FROM users WHERE lower(username) = lower('jOhN');
-- ----------+----------
-- name | username
-- ----------+----------
-- John Doe | john
-- (1 row)
So, how can you fix this? PostgreSQL has an extension called citext.
The citext extension
The citext
extension allows you to define insensitive-case columns, so you can avoid things like the lower
function. In fact, this extension calls the lower
function automatically so that you don’t have to do it manually.
If you’re using Ubuntu, you have to install the postgresql-contrib
package.
$ sudo apt-get install postgresql-contrib
To enable citext
, just run something like this.
CREATE EXTENSION IF NOT EXISTS citext WITH SCHEMA public;
To list all enabled extensions on your system, run the command \dx
.
psql> \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+--------------------------------------------------
citext | 1.0 | public | data type for case-insensitive character strings
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)
Now you can change the column type to citext
.
-- alter username column type
ALTER TABLE users ALTER COLUMN username TYPE citext;
-- remove index
DROP INDEX unique_username_on_users;
-- recreate index without the lower call
CREATE UNIQUE INDEX unique_username_on_users ON users (username);
As you can see, everything works as expected.
SELECT * FROM users WHERE username = 'jOhN';
--- ----------+----------
--- name | username
--- ----------+----------
--- John Doe | john
--- (1 row)
INSERT INTO users (name, username) VALUES ('JD', 'jOhN');
-- ERROR: duplicate key value violates unique constraint "unique_username_on_users"
-- DETAIL: Key (username)=(jOhN) already exists.
Using citext on your Rails application
Rails 4.2+ now has support for citext
. Just remember to activate the citext
extension using the enable_extension
method.
class CreateUsers < ActiveRecord::Migration
def change
enable_extension 'citext'
create_table :users do |t|
t.text :name, null: false
t.citext :username, null: false
end
add_index :users, :username, unique: true
end
end
And you’re done! Now your username
column is case-insensitive and you don’t have to worry with anything else.
But if you’re running an older version, you must use the activerecord-postgresql-citext gem. Just add it to your Gemfile
and run bundle install
.
source 'https://rubygems.org'
gem 'rails', '4.1.9'
gem 'pg'
gem 'activerecord-postgresql-citext'
Now you can use the citext
column type just like we did before.
Wrapping up
Not every tool works the same, so it’s really important to understand the changes introduced by a new software you’re going to use. PostgreSQL is awesome and has some many other useful features that I’ll try to cover in the future.
This text is a translation from an article originally published in Dec 29, 2013.