Preventing WAL growth on Postgres running on AWS RDS
In this document, we will discuss how to prevent WAL growth for a Postgres database running on AWS RDS.
Last updated: 04/17/2024
What is WAL?
WAL stands for Write-Ahead Logging, which is a method for Postgres to handle change data capture (CDC). Database record changes will be logged and stored within WAL to ensure data integrity.
The WAL also makes it accessible to downstream applications to subscribe to a replication slot and consume database CDC changes reliably.
WAL growth is a problem and can result in replication slot overflow. Replication slot overflow happens when the WAL accumulates and grows, consumes all your database’s storage and causes your database to go down.
Why is WAL growth an issue on AWS RDS?
As Gunnar Morling covered extensively in his blog post here, AWS RDS periodically writes a heartbeat to a table within rdsadmin
every 5 minutes. AWS RDS has a default setting of 64MB for each WAL segment size, so each heartbeat takes up 64MB of memory. The purpose for AWS writing heartbeats is for various reasons, including to monitor the health of their databases.
An important note is that the heartbeats are written to a table within an internal database that is not observed by most CDC applications. For an active database, heartbeats don’t cause an issue because the WAL is almost constantly being drained as new CDC logs are processed. However, if you have a test database that has low traffic or an idle database, you will see your WAL accumulate by 64MB every 5 minutes, or 18.4GB per day! If left unchecked, this can cause replication slot overflow.
Enable Heartbeats for idle/low traffic databases
You only need to enable this feature if your database is low traffic or idle for long periods of time, which are primarily test databases. This feature is not necessary for active databases because the WAL growth will reset as soon as there are data changes from the table(s) you are observing.
Use the Heartbeat feature to prevent WAL growth
Debezium has a Heartbeat feature that allows us to periodically ping the database to create a CDC event and prevent WAL growth for idle/low traffic databases.
To turn this on with Artie, first create a heartbeat_table within Postgres:
Check
Enable Heartbeats
under Deployment Advanced Settings
If you do have heartbeats enabled
Does the heartbeats table (
test_heartbeat_table
) actually exist?Is the heartbeats table included in your Postgres PUBLICATIONS?
Does the service account have permissions to write to the table?
Monitor any long-running transactions
Are there any long-running queries that may prevent your replication slot from being advanced? Check if there are any long running queries by running this:
Additional preventative measures
In addition to enabling heartbeats, it is best practice to set up the following:
Monitoring your Amazon RDS instance for
free_storage_space
.Implement
statement_timeout
so long running transactions do not block replication slots from advancing. [AWS guide]Enable storage autoscaling. The guide to enable this can be found here.
Configure max_slot_wal_keep_size to the desired size
The default value is -1
Each file size is 64mb
If you want to set this to be 1 GB, set
max_slot_wal_keep_size
to be 16
Advanced commands
Last updated