Intro
AWS RDS is very convenient and takes care of almost all DBA tasks. It just works, as long as you stay inside AWS. But if you want a local copy of your database, or need to move data to another host, it can be tricky.
TL;DR – for our solution, skip to the last section.
What doesn’t work
AWS daily backups
AWS RDS by default creates daily backups of your data. The first thought would be to grab such a backup and restore it locally. But those are not regular Postgres backups. They are probably VM image copies, but there is no way to know, as you cannot copy or inspect them. The only option is to restore them to another RDS instance.
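If restoring to another RDS instance is enough for your case, that step at least can be scripted with the AWS CLI; the instance and snapshot identifiers below are placeholders:

aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier my-restored-instance \
    --db-snapshot-identifier rds:my-instance-2016-01-01-00-00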
Postgresql replication
AWS uses replication to maintain a hot standby, but does not give you access to it.
Cold backups
There is no access to the RDS filesystem, so you can't do that either.
AWS Database Migration Service (DMS)
AWS DMS sounds too good to be true. The features that would be killer if they worked:
- Any-to-any DB connections (for example, RDS Postgres to a local MySQL)
- Change data capture (CDC) and continuous replication without modifications to database schema
- DDL capture
In reality we couldn’t get it to do even basic data copying. It corrupts/truncates data, throws errors, and there is very little discussion of it on Stack Overflow or the AWS forums. Not ready for prime time, but worth keeping an eye on.
Foreign data wrappers (FDW) – pushing from RDS to remote
It works and can be used to push incremental changes like so:
ON RDS: insert into remote_bigtable select * from bigtable where id>(select max(id) from remote_bigtable)
But throughput is only a few rows per second, so it can't be used for anything beyond trivial amounts of data.
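For reference, remote_bigtable above is a postgres_fdw foreign table defined on the RDS side, pointing back at the receiving server. A minimal setup sketch, with all host names, credentials and columns as placeholders:

-- ON RDS (all names are placeholders)
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER local_server FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'my.local.host', port '5432', dbname 'mydb');
CREATE USER MAPPING FOR CURRENT_USER SERVER local_server
    OPTIONS (user 'myuser', password 'mypassword');
CREATE FOREIGN TABLE remote_bigtable (id bigint, payload text)
    SERVER local_server OPTIONS (schema_name 'public', table_name 'bigtable');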
Foreign data wrappers (FDW) – pulling data
It’s faster than pushing but more difficult. If you run a query on the local DB like this:
ON Local:
select * from rds_bigtable where id>(select max(id) from bigtable)
It will fetch the entire bigtable and do the filtering locally (PostgreSQL 9.6 fixes this).
So for full data pulls this method is still too slow, and incremental pulls require extra work, like so:
ON local: select max(id) from bigtable
ON RDS: create table tmp_bigtable as select * from bigtable where id > maxid_on_local
ON local: insert into local_bigtable select * from rds_tmp_bigtable
ON RDS: drop table tmp_bigtable
We use this approach for incremental data pulls. It's a pain. Hoping better alternatives will come up.
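Scripted, the four steps look roughly like this. The sketch assumes .pgpass entries for both hosts, that the local copy of the table is also called bigtable, and that rds_tmp_bigtable has been defined on the local server as a postgres_fdw foreign table pointing at tmp_bigtable on RDS; all host, user and database names are placeholders:

# 1. Highest id we already have locally
MAXID=$(psql -h localhost -U myuser -d mydb -Atc "select coalesce(max(id), 0) from bigtable")

# 2. Materialize only the new rows on the RDS side
psql -h xxx.rds.amazonaws.com -U myuser -d mydb \
     -c "create table tmp_bigtable as select * from bigtable where id > $MAXID"

# 3. Pull them through the foreign table into the local table
psql -h localhost -U myuser -d mydb -c "insert into bigtable select * from rds_tmp_bigtable"

# 4. Clean up on RDS
psql -h xxx.rds.amazonaws.com -U myuser -d mydb -c "drop table tmp_bigtable"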
pg_dump running on local server
It works, but it is much slower than what's possible, since all the data has to travel from AWS to your local network over the internet.
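If you do dump straight from a local machine, a directory-format dump with parallel jobs can claw back some speed, though the AWS-to-local transfer remains the bottleneck; host, user and database names are placeholders:

pg_dump --host=xxx.rds.amazonaws.com --username=xxx \
        --format=directory --jobs=4 --file=bigdb_dump mydb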
pg_dump running on temporary EC2 instance
Finally a solution that works and is fast!
Steps to create it (a consolidated command sketch follows the list):
- Create an EC2 instance in the same datacenter/zone as the RDS instance, with enough disk space to store a full DB dump
- Install postgres tools on it: sudo yum install postgresql95
- Install the AWS command line tools (optional, needed for copying the dump to S3)
- Open the RDS security group to allow connections from the temporary instance
- Run pg_dump --host=xxx.rds.amazonaws.com --username=xxx --file=x.dmp --format=c
- Copy file to s3 (aws s3 cp)
- Delete instance
- On the local host, download the file and restore it: pg_restore --host=localhost --username=x --dbname=x x.dmp
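Put together, the whole flow looks roughly like this; instance addresses, bucket, database and credential names are placeholders:

# On the temporary EC2 instance, in the same region/zone as RDS
sudo yum install -y postgresql95
pg_dump --host=xxx.rds.amazonaws.com --username=xxx --format=c --file=x.dmp mydb
aws s3 cp x.dmp s3://my-bucket/x.dmp

# On the local host
aws s3 cp s3://my-bucket/x.dmp .
pg_restore --host=localhost --username=xxx --dbname=mydb x.dmp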
If your DB is large enough, pg_dump dies in the middle of the export. It's an obscure SSL bug: AWS forces an SSL socket renegotiation after a preset amount of data has been transferred (it seems to be about 6 GB), and PostgreSQL doesn't renegotiate, killing the connection instead.
The solution is to temporarily disable SSL on RDS (with all the security implications that carries, and it requires a DB restart): in the RDS parameter group, set ssl=0.
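If the instance already uses a custom (non-default) parameter group, the change can be scripted with the AWS CLI; group and instance names are placeholders, and since the change only takes effect after a restart (as noted above), the instance is rebooted afterwards:

aws rds modify-db-parameter-group \
    --db-parameter-group-name my-pg-parameters \
    --parameters "ParameterName=ssl,ParameterValue=0,ApplyMethod=pending-reboot"

aws rds reboot-db-instance --db-instance-identifier my-rds-instance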
Conclusion
Copying PostgreSQL RDS data to a local server is harder than it should be. Hopefully my long adventures, misadventures and discoveries will save you some time.