Ler blog em Português

Using insensitive-case columns in PostgreSQL with citext

Read in 3 minutes

I’ve been using PostgreSQL instead of MySQL for a while now. When I first migrated, one problem I had was related to how string columns work. In PostgreSQL they’re case-sensitive.

To demonstrate how PostgreSQL works let’s create a table with a unique index.

-- create the table
CREATE TABLE users (name text NOT NULL, username text NOT NULL);

-- create a unique index
CREATE UNIQUE INDEX unique_username_on_users ON users (username);

When you insert some records, notice that PostgreSQL will accept “duplicated terms”.

INSERT INTO users (name, username) VALUES ('John Doe', 'john');
INSERT INTO users (name, username) VALUES ('Doe, John', 'JOHN');

If you select all records with a query like SELECT * FROM users you can see that you have two records.

-----------+----------
   name    | username
-----------+----------
 John Doe  | john
 Doe, John | JOHN
-----------+----------
(2 rows)

The unique_username_on_users index will only be used for strings that has the same sequence of lowercase/uppercase characters.

INSERT INTO users (name, username) VALUES ('Doe, John', 'JOHN');
-- ERROR:  duplicate key value violates unique constraint "unique_username_on_users"
-- DETAIL:  Key (username)=(JOHN) already exists.

If you’re coming from MySQL, this is not what you expected. Fortunately PostgreSQL allows you to create indexes with expressions. You can change our index to have the same MySQL behavior.

-- remove all records
DELETE FROM users;

-- remove index
DROP INDEX unique_username_on_users;

-- create new index
CREATE UNIQUE INDEX unique_username_on_users ON users (lower(username));

Now, if you try to insert those same records, you’ll see that our index will prevent duplicated terms. For this to work, we defined our index with the lower(username) expression.

INSERT INTO users (name, username) VALUES ('John Doe', 'john');

INSERT INTO users (name, username) VALUES ('Doe, John', 'JOHN');
-- ERROR:  duplicate key value violates unique constraint "unique_username_on_users"
-- DETAIL:  Key (lower(username))=(john) already exists.

But now we have another problem: the username column is still case-sensitive. This means that your WHERE clause will differentiate uppercase/lowercase characters.

SELECT * FROM users WHERE username = 'john';

-- ----------+----------
--    name   | username
-- ----------+----------
--  John Doe | john
-- ----------+----------
-- (1 row)

SELECT * FROM users WHERE username = 'jOhN';

-- ------+----------
--  name | username
-- ------+----------
-- (0 rows)

So, to return the records that matches john no matter the character case you’re using, you’ll have to use the lower function. This is not elegant and it’s quite verbose, but will do the job.

SELECT * FROM users WHERE lower(username) = lower('jOhN');

-- ----------+----------
--    name   | username
-- ----------+----------
--  John Doe | john
-- (1 row)

So, how can you fix this? PostgreSQL has an extension called citext.

The citext extension

The citext extension allows you to define insensitive-case columns, so you can avoid things like the lower function. In fact, this extension calls the lower function automatically so that you don’t have to do it manually.

If you’re using Ubuntu, you have to install the postgresql-contrib package.

$ sudo apt-get install postgresql-contrib

To enable citext, just run something like this.

CREATE EXTENSION IF NOT EXISTS citext WITH SCHEMA public;

To list all enabled extensions on your system, run the command \dx.

psql> \dx
                           List of installed extensions
  Name   | Version |   Schema   |                   Description
---------+---------+------------+--------------------------------------------------
 citext  | 1.0     | public     | data type for case-insensitive character strings
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

Now you can change the column type to citext.

-- alter username column type
ALTER TABLE users ALTER COLUMN username TYPE citext;

-- remove index
DROP INDEX unique_username_on_users;

-- recreate index without the lower call
CREATE UNIQUE INDEX unique_username_on_users ON users (username);

As you can see, everything works as expected.

SELECT * FROM users WHERE username = 'jOhN';

--- ----------+----------
---    name   | username
--- ----------+----------
---  John Doe | john
--- (1 row)

INSERT INTO users (name, username) VALUES ('JD', 'jOhN');
-- ERROR:  duplicate key value violates unique constraint "unique_username_on_users"
-- DETAIL:  Key (username)=(jOhN) already exists.

Using citext on your Rails application

Rails 4.2+ now has support for citext. Just remember to activate the citext extension using the enable_extension method.

class CreateUsers < ActiveRecord::Migration
  def change
    enable_extension 'citext'

    create_table :users do |t|
      t.text :name, null: false
      t.citext :username, null: false
    end

    add_index :users, :username, unique: true
  end
end

And you’re done! Now your username column is case-insensitive and you don’t have to worry with anything else.

But if you’re running an older version, you must use the activerecord-postgresql-citext gem. Just add it to your Gemfile and run bundle install.

source 'https://rubygems.org'
gem 'rails', '4.1.9'
gem 'pg'
gem 'activerecord-postgresql-citext'

Now you can use the citext column type just like we did before.

Wrapping up

Not every tool works the same, so it’s really important to understand the changes introduced by a new software you’re going to use. PostgreSQL is awesome and has some many other useful features that I’ll try to cover in the future.

This text is a translation from an article originally published in Dec 29, 2013.