
PostgreSQL as a Service options comparison and benchmark

Background

PostgreSQL is great. But administering it can soak up a lot of time, and for small teams a managed SaaS offering is great value. We use and love Amazon RDS. Until recently it was the only reasonable choice on the market, but in 2017 new options are on the verge of becoming available: both Google and Azure have announced PostgreSQL support in their clouds, and Amazon is also launching its Aurora service with PostgreSQL compatibility.

We did a quick comparison of those options.

TLDR: Google and Aurora are a bit faster for the same money, but there's no free lunch. Azure tests are not yet done.

The Test

The use case we care about is a “mid-size” database with queries that process gigabytes of data in seconds to minutes. More specifically, we test a 45 GB table containing JSON documents (PostgreSQL is great for working with JSON).
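
For reference, the queries at the end of this post read from a table named js with a content column; a minimal sketch of that shape (any other columns the real table may have are omitted, and the size query is simply one way to confirm a table of this scale, not something taken from the original test):

-- hypothetical shape of the test table; only the content column is referenced by the queries below
create table js (
    content text  -- one JSON document per row, cast to ::json at query time
);

-- reports the on-disk size of the table, including TOAST data
select pg_size_pretty(pg_total_relation_size('js'));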

We won't test CRUD/web app backend performance; pgbench is fine for that. In reality, for small and medium apps any of these services will do and the cost is negligible.

We won’t test “big data” scenarios – Google BigQuery, AWS Redshift and Athena are great for that.

To run this test we provisioned High Availability setups with 16 GB of RAM and the minimum possible number of CPUs. We also ran the tests on a desktop machine with a modern SSD.

Instance sizes and costs

All amounts are US dollars per month, using on-demand pricing.

AWS RDS
db.r3.large, multi-AZ (2 vCPU, 15 GB RAM)    $376
1 TB General Purpose (SSD) storage           $230
Total                                        $606

AWS Aurora (PostgreSQL pricing is unavailable at this time, so we use MySQL pricing)
db.r3.large at $0.290/hour                   $215
1 TB storage at $0.100/GB                    $100
Total                                        $315

Google (high availability for PostgreSQL is not yet available; assuming 2x costs)
CPU, master, 2 vCPU                          $94
CPU, slave, 2 vCPU                           $94
RAM, master, 13 GB                           $103
RAM, slave, 13 GB                            $103
SSD storage, master, 1 TB                    $187
SSD storage, slave, 1 TB                     $187
Total                                        $758


Results

First, times to restore and parse the 45 GB table of JSON documents (times are mm:ss, or h:mm:ss for the longest run):

                     AWS RDS    AWS Aurora    Google    Desktop
Restore (seq write)  50:28      41:44         21:15     24:00
Scan (seq read)      32:30      5:02          15:58     2:57
Parse JSON (CPU)     1:46:50    46:54         51:55     31:51

Dividing data size by time gives processing speed in MB/sec:

                     AWS RDS    AWS Aurora    Google    Desktop
Restore (seq write)  14.53      17.57         34.50     30.55
Scan (seq read)      22.56      145.69        45.92     248.58
Parse JSON (CPU)     6.86       15.64         14.12     23.02
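
For example, Aurora's restore time of 41:44 is 2,504 seconds; working back from the table, the data size behind these figures is roughly 44,000 MB, so 44,000 MB / 2,504 s ≈ 17.6 MB/sec. Every other cell follows from the same size-over-time division.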


Observations

JSON parsing is CPU-bound and runs in only one thread, so it is a tight bottleneck on all platforms. It still seems that Google provides faster virtual CPUs than RDS.

Writing to Aurora is slow while reading is fast. This could be because of its six-way replication: Aurora keeps six copies of the data across three availability zones, so every write carries replication overhead while reads do not.

Google claims to offer many more IOPS than Amazon RDS (15,000 vs 3,000), yet our test shows only about a 2x speedup.

As of June 2017 Google is the best value, but it has no HA support yet. Aurora is not quite ready but might mature into the best option. RDS is still the winner for production work.

SQL queries used

Sequential scan:

-- forces a full read of the table: every row's document must be fetched to compute its length
select count(*) row_count, sum(length(content)) text_length from js

JSON parse:

with jp as (select json_array_elements(content::json) x from js) select distinct x->>'ID' id, x->>'Name' name from jp
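
For context, this query assumes each content value is a JSON array of objects carrying ID and Name keys: json_array_elements unnests the array into one row per element, and ->> extracts fields as text. A self-contained sketch of the same pattern on made-up literal data (only the 'ID'/'Name' keys come from the query above; the sample values are invented):

with js(content) as (
    values ('[{"ID": "1", "Name": "alpha"}, {"ID": "2", "Name": "beta"}]')
),
jp as (
    select json_array_elements(content::json) x from js
)
select distinct x->>'ID' id, x->>'Name' name from jp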