Querying JSON with SQL is extremely powerful and convenient. Some great things about:
- Use SQL to query unstructured data
- Join relational and JSON tables
- Convert between JSON and relational schema
But query writing can be difficult and non obvious at first. Official documentation doesn’t have many samples. Many useful queries need other great but not widely known features of PosgresSQL like LATERAL joins and recursive queries.
This tutorial has some real world examples.
Get some data
Lets use GitHub Archive as source for large JSON with complex structure:
wget http://data.githubarchive.org/2015-01-01-15.json.gz gzip -d 2015-01-01-15.json.gz
Load JSON to PostgreSQL
COPY github FROM 'c:\temp\2015-01-01-15.json' WITH (format csv, quote e'\x01', delimiter e'\x02', escape e'\x01')
Query returned successfully: 11351 rows affected, 1025 ms execution time.
Do some simple queries
select js->>'type', count(*) from github group by 1;
select js->'actor'->'login' from github where js->>'type' = 'IssuesEvent';
Explore structure with LATERAL joins
Many Postgress functions return rowsets and are invoked like:
select * from jsonb_each(jsonb)
If your json is in many rows like in our GitHub sample then we need LATERAL joins:
Get top level keys from all: select key, max(length(value::text)), json_agg(distinct jsonb_typeof(value)) from github a, lateral jsonb_each(a.js) kv group by key
JSON is tree like and recursive queries are often necessary.
Here we enumerate all possible paths that exist in the documents:
with recursive tree(lvl, key, path, jstype) as ( select distinct 0 as lvl, kv.key as key, array[kv.key] as path, jsonb_typeof(value) as jstype from github a, lateral jsonb_each(a.js) kv union all select distinct tree.lvl + 1 as lvl, kv.key, array_append(tree.path, kv.key) as path, jsonb_typeof(value) as jstype from tree, github a, lateral jsonb_each(a.js #> tree.path) kv where jsonb_typeof(a.js #> tree.path) = 'object' ) select path, jstype from tree order by path;
Further reading and references
Others did tests showing that PostgresSQL is about 3x faster and files occupy about 3x less space than MongoDB.
Official operator and function reference: