Background
PostgresSQL is great. But administering it can suck up a lot of time and for small teams using SaaS service is great value. We use and love Amazon RDS. Until recently it was the only reasonable choice in the market. But in 2017 new options ore on the verge of becoming available. Both Google and Azure clouds announced support and Amazon is also launching their Aurora service with PostgreSQL compatibility.
We did a quick comparison of those options.
TLDR: Google and Aurora are a bit faster for the same money but it’s no free lunch. Azure tests are not yet done.
The Test
Use case we care about is “mid size” database and queries that process gigabytes of data in seconds-minutes. More specifically we test 45GB table containing JSON documents (PostgreSQL is great for working with JSON).
We won’t test CRUD/web app backend performance. pg_bench is fine for that. In reality for small/medium apps any service will be fine and cost negligible.
We won’t test “big data” scenarios – Google BigQuery, AWS Redshift and Athena are great for that.
To run this test we provisioned High Availability setups with 16GB RAM and minimum number of possible CPUs. We also ran tests on a desktop machine with a modern SSD.
Instance sizes and costs
All amounts are US dollars per month, using on demand pricing.
AWS RDS | AWS Aurora | ||
Notes | PostgreSQL pricing is unavailable at this time. So we are using MySQL pricing: | High availability for PostgreSQL is not yet available. Assuming 2x costs | |
Cost items | db.r3.large multi AZ (2 cpu, 15GB ram) $376 1TB General Purpose (SSD) Storage $230 | db.r3.large $0.290 $215 1TB storage $.100/GB $100 | CPU Master 2 vCPU $94 CPU Slave 2 vCPU $94 RAM Master 13GB $103 RAM Slave 13GB $103 SSD Storage Maser 1TB $187 SSD Storage Slave 1TB $187 |
Total | $606 | $315 | $758 |
Results
First times to restore and parse 45GB table containing JSON.
AWS RDS | AWS Aurora | Desktop | ||
Restore (seq write) | 50:28 | 41:44 | 21:15 | 24:00 |
Scan (seq read) | 32:30 | 5:02 | 15:58 | 2:57 |
Parse JSON (CPU) | 1:46:50 | 46:54 | 51:55 | 31:51 |
Dividing time by size we get processing speed in MB/sec:
AWS RDS | AWS Aurora | Google PG | Desktop | |
Restore (seq write) | 14.53 | 17.57 | 34.50 | 30.55 |
Scan (seq read) | 22.56 | 145.69 | 45.92 | 248.58 |
Parse JSON (CPU) | 6.86 | 15.64 | 14.12 | 23.02 |
Observations
JSON parsing is CPU bound and only runs in one thread so this is a tight bottleneck on all platforms. Still seems that Google gives faster virtual cpus than RDS.
Writing to Aurora is slow and reading is fast. Could be because of their 6 way replication.
Google claims to give many more IOPS than Amazon RDS (15000 vs 3000). Our test shows only about 2x speedup.
As of June 2017 Google is the best value, but no HA support yet. Aurora is not yet ready but might mature to be the best option. RDS is still a winner for production work.
SQL queries used
Sequential scan:
select count(*) row_count, sum(length(content)) text_length from js
JSON parse:
with jp as (select json_array_elements(content::json) x from js) select distinct x->>'ID' id, x->>'Name' name from jp