Querying JSON with SQL is extremely powerful and convenient. Some great things about:
- Use SQL to query unstructured data
- Join relational and JSON tables
- Convert between JSON and relational schema
But query writing can be difficult and non obvious at first. Official documentation doesn’t have many samples. Many useful queries need other great but not widely known features of PosgresSQL like LATERAL joins and recursive queries.
This tutorial has some real world examples.
Get some data
Lets use GitHub Archive as source for large JSON with complex structure:
wget http://data.githubarchive.org/2015-01-01-15.json.gz gzip -d 2015-01-01-15.json.gz
Load JSON to PostgreSQL
COPY github FROM 'c:\temp\2015-01-01-15.json' WITH (format csv, quote e'\x01', delimiter e'\x02', escape e'\x01')
Query returned successfully: 11351 rows affected, 1025 ms execution time.
Do some simple queries
select js->>'type', count(*) from github group by 1;
select js->'actor'->'login' from github where js->>'type' = 'IssuesEvent';
Explore structure with LATERAL joins
Many Postgress functions return rowsets and are invoked like:
select * from jsonb_each(jsonb)
If your json is in many rows like in our GitHub sample then we need LATERAL joins:
Get top level keys from all: select key, max(length(value::text)), json_agg(distinct jsonb_typeof(value)) from github a, lateral jsonb_each(a.js) kv group by key
JSON is tree like and recursive queries are often necessary.
Here we enumerate all possible paths that exist in the documents:
with recursive tree(lvl, key, path, jstype) as ( select distinct 0 as lvl, kv.key as key, array