Querying JSON with SQL is extremely powerful and convenient.  Some great things about:

  • Use SQL to query unstructured data
  • Join relational and JSON tables
  • Convert between JSON and relational schema

But query writing can be difficult and non obvious at first.  Official documentation doesn’t have many samples. Many useful queries need other great but not widely known features of PosgresSQL like LATERAL joins and recursive queries.

This tutorial has some real world examples.

Get some data

Lets use GitHub Archive as source for large JSON with complex structure:

gzip -d 2015-01-01-15.json.gz

Load JSON to PostgreSQL

Super easy:

COPY github FROM 'c:\temp\2015-01-01-15.json'
WITH (format csv, quote e'\x01', delimiter e'\x02', escape e'\x01')

Query returned successfully: 11351 rows affected, 1025 ms execution time.

Do some simple queries

Still straightforward:

select js->>'type', count(*) from github group by 1;
select js->'actor'->'login' from github where js->>'type' = 'IssuesEvent';

Explore structure with LATERAL joins

Many Postgress functions return rowsets and are invoked like:

select * from jsonb_each(jsonb)

If your json is in many rows like in our GitHub sample then we need LATERAL joins:

Get top level keys from all:
 json_agg(distinct jsonb_typeof(value))
 from github a, lateral jsonb_each(a.js) kv
 group by key

Recursive queries

JSON is tree like and recursive queries are often necessary.
Here we enumerate all possible paths that exist in the documents:

with recursive tree(lvl, key, path, jstype) as (
select distinct 
 0 as lvl,
 kv.key as key, 

[kv.key] as path, jsonb_typeof(value) as jstype from github a, lateral jsonb_each(a.js) kv union all select distinct tree.lvl + 1 as lvl, kv.key, array_append(tree.path, kv.key) as path, jsonb_typeof(value) as jstype from tree, github a, lateral jsonb_each(a.js #> tree.path) kv where jsonb_typeof(a.js #> tree.path) = 'object' ) select path, jstype from tree order by path;

Further reading and references

Others did tests showing that PostgresSQL is about 3x faster and files occupy about 3x less space than MongoDB.

Official operator and function reference: