Using PostgreSQL Jsonb columns in Ecto

Postgres introduced recently, in 9.4, jsonb column type, additional to existing json column that already existed before. JSON colums is no more than a blob field where the json is dumped as string. JSONB on the other fact is stored in a custom format optimized for json operations alowing for more operators. See stackoverflow thread for more information.

I created a github repo to showcase how to use it in Ecto. To see it in action clone the repo then create and populate database with 1M records:

mix ecto.create
mix ecto.migrate
mix run priv/repo/seeds.exs

I used jsonb to serialize a map in settings field and an array in acls field. Also to showcase that we can create also indexes on this fields I added settings_index and acls_index fields.

In settings field I stored objects that look like this:

{
  "roles": ["admin", "guest", "config"],
  "loyalties": 200,
  "providers": ["google", "facebook", "twitter"]
}

Now that we have the database ready we can run queries on it inside the jsonb field:

String exist as array element:

query = from u in User,
        where: fragment("(settings_index->'roles')::jsonb \\? ?", "admin")

result = Repo.all(query)

Array contains value:

We’ll use array intersection here.

query = from u in User,
        where: fragment("acls_index @> ?", ^[101])

roles_result = Repo.all(query)

Object contains another object:

query = from u in User,
        where: fragment("(settings_index)::jsonb @> ?::jsonb", %{loyalties: 100})

result = Repo.all(query)

Object key exists:

query = from u in User,
        where: fragment("(settings_index)::jsonb \\? ?", "personal_info")

result = Repo.all(query)

Index performance

Also I tested how fast the queries work with and without index.

IO.puts "Roles search benchmark with index vs without index"
index_role_query = fn ->
  query = from u in User,
          where: fragment("(settings_index->'roles')::jsonb \\? ?", "admin")
  Repo.all(query)
end

no_index_role_query = fn ->
  query = from u in User,
          where: fragment("(settings->'roles')::jsonb \\? ?", "admin")
  Repo.all(query)
end

Benchee.run(%{
  "with index"    => index_role_query,
  "without index" => no_index_role_query
}, time: 10)

This are my results:

Benchmarking with index...
Benchmarking without index...

Name                    ips        average  deviation         median         99th %
with index           2.94 K        0.34 ms    ±29.07%        0.33 ms        0.51 ms
without index     0.00734 K      136.22 ms     ±2.00%      135.63 ms      147.23 ms

Comparison:
with index           2.94 K
without index     0.00734 K - 401.06x slower

As you can see there is significant performance increase if we add proper indexes.