PostgreSQL

Usage of PostgreSQL for Web Scraping and datasets management.

27 Jun, 2017

PostgreSQL as a Service options comparison and benchmark

By |2019-03-04T15:20:50+02:00June 27th, 2017|PostgreSQL|0 Comments

Background

PostgresSQL is great. But administering it can suck up a lot of time and for small teams using SaaS service is great value. We use and love Amazon RDS.  Until recently it was the only reasonable choice in the market. But in 2017 new options ore on the verge of becoming available.  Both Google and Azure clouds announced support and Amazon is also launching their Aurora service with PostgreSQL compatibility.

We did a quick comparison of those options.

TLDR:  Google and Aurora are a bit faster for the same money but it’s no free lunch. Azure tests are not yet done.

The Test

Use case we care about is “mid size” database and queries […]

14 Oct, 2016

Migrating PostgreSQL Databases From AWS RDS To Standalone

By |2019-03-04T15:50:24+02:00October 14th, 2016|PostgreSQL|0 Comments

Intro

AWS RDS is very convenient and takes care of almost all DBA tasks. It just works as long as you stay inside AWS. But if you want to have a local copy of your database or need to move data to another host it can be tricky.

TL;DR – For our solution skip to part last section

What doesn’t work

AWS daily backups

AWS RDS by default creates daily backups of your data. First thought would be get such backup and restore it locally. But those are not regular Postgres backups. They probably are VM image copies, but no way to know, as you cannot copy or see them.  The only option is to restore them to another RDS instance.

Postgresql replication

AWS uses replication to maintain […]

17 Feb, 2015

PostgreSQL 9.4 JSON Queries

By |2019-03-04T16:19:29+02:00February 17th, 2015|PostgreSQL|0 Comments

Intro

Querying JSON with SQL is extremely powerful and convenient.  Some great things about:

  • Use SQL to query unstructured data
  • Join relational and JSON tables
  • Convert between JSON and relational schema

But query writing can be difficult and non obvious at first.  Official documentation doesn’t have many samples. Many useful queries need other great but not widely known features of PosgresSQL like LATERAL joins and recursive queries.

This tutorial has some real world examples.

Get some data

Lets use GitHub Archive as source for large JSON with complex structure:

wget http://data.githubarchive.org/2015-01-01-15.json.gz
gzip -d 2015-01-01-15.json.gz

Load JSON to PostgreSQL

Super easy:

COPY github FROM 'c:\temp\2015-01-01-15.json'
WITH (format csv, quote e'\x01', delimiter e'\x02', escape e'\x01')

Query returned successfully: 11351 rows affected, 1025 ms execution time.

Do some simple queries

Still straightforward:

select js->>'type', count(*) from github group by [...]