PostgreSQL Replication Slots

Back in the days when “Replication Slots” were not yet introduced, managing the WAL segments were a challenge. In standard streaming replication, the master has no knowledge of the slave status. Take the example of a master that executes a large transaction, while a standby node is in maintenance mode for a couple of hours (such as upgrading the system packages, adjusting network security, hardware upgrade, etc.). At some point, the master removes its transaction log (WAL segments) as checkpoint passes. Once the slave is off maintenance, it possibly has a huge slave lag and has to catch up with the master. Eventually, the slave will get a fatal issue like below:

LOG:  started streaming WAL from primary at 0/73000000 on timeline 1
FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000073 has already been removed

The typical approach is to specify in your postgresql.conf a WAL archival script that will copy WAL files to one or more long-term archive locations. If you don’t have any standbys or other streaming replication clients, then basically the server can discard the WAL file once the archive script is done or responds OK. But you’ll still need some recent WAL files for crash recovery (data from recent WAL files get replayed during crash recovery. In our example of a standby node which is placed for a long maintenance period, problems arise when it comes back online and asks the primary for a WAL file that the primary no longer has, then the replication fails.

This problem was addressed in PostgreSQL 9.4 via “Replication Slots”.

If not using replication slots, a common way to reduce the risk of failing replication is to set the wal_keep_segments high enough so that WAL files that might be needed won’t be rotated or recycled. The disadvantage of this approach is that it’s hard to determine what value is best for your setup. You won’t need maintenance on a daily basis or you won’t need to retain a large pile of WAL files that eats your disk storage. While this works, it’s not an ideal solution as risking disk space on the master can cause incoming transactions to fail.

Alternative approaches of not using replication slots is to configure PostgreSQL with continuous archiving and provide a restore_command to give the replica access to the archive. To avoid WAL build-up on the primary, you may use a separate volume or storage device for the WAL files, e.g., SAN or NFS. Another thing is with synchronous replication since it requires that primary has to wait for standby nodes to commit transaction. This means, it assures that WAL files have been applied to the standby nodes. But still, it’s best that you provide archiving commands from the primary so that once WAL’s are recycled in the primary, rest assured that you have WAL backups in case for recovery. Although in some situations, synchronous replication is not an ideal solution as it comes with some performance overhead as compared with asynchronous replication.

Types of Replication Slots
There are two types of replication slots. These are:

Physical Replication Slots
Can be used for standard streaming replication. They will make sure that data is not recycled too early.

Logical Replication Slots
Logical replication does the same thing as physical replication slots and are used for logical replication. However, they are used for logical decoding. The idea behind logical decoding is to give users a chance to attach to the transaction log and decode it with a plugin. It allows to extract changes made to the database and therefore to the transaction log in any format and for any purpose.

In this blog, we’ll be using physical replication slots and how to achieve this using ClusterControl.

Advantages and Disadvantages of Using Replication Slots
Replications slots are definitely beneficial once enabled. By default, “Replication Slots” are not enabled and have to be set manually. Among the advantages of using Replication Slots are

Ensures master retains enough WAL segments for all replicas to receive them
Prevents the master from removing rows that could cause recovery conflict on the replicas
A master can only recycle the transaction log once it has been consumed by all replicas. The advantage here is that a slave can never fall behind so much that a re-sync is needed.
Replication slots also come with some caveats.

An orphan replication slot can cause unbounded disk growth due to piled up WAL files from the master
Slave nodes placed under long maintenance (such as days or weeks) and that are tied to a replication slot will have unbounded disk growth due to piled up WAL files from the master
You can monitor this by querying pg_replication_slots to determine the slots that are not used. We’ll check back on this a bit later.

Using Replication Slots
As stated earlier, there are two types of replication slots. For this blog, we’ll use physical replication slots for streaming replication.

Creating A Replication Slot
Creating a replication is simple. You need to invoke the existing function pg_create_physical_replication_slot to do this and has to be run and created in the master node. The function is simple,

maximus_db=# \df pg_create_physical_replication_slot
Schema              | pg_catalog
Name                | pg_create_physical_replication_slot
Result data type    | record
Argument data types | slot_name name, immediately_reserve boolean DEFAULT false, OUT slot_name name, OUT xlog_position pg_lsn
Type                | normal

e.g. Creating a replication slot named slot1,

postgres=# SELECT pg_create_physical_replication_slot('slot1');
-[ RECORD 1 ]-----------------------+---------
pg_create_physical_replication_slot | (slot1,)

The replication slot names and its underlying configuration is only system-wide and not cluster-wide. For example, if you have nodeA (current master), and standby nodes nodeB and nodeC, creating the slot on a master nodeA namely “slot1”, then data will not be available to nodeB and nodeC. Therefore, when failover/switchover is about to happen, you need to re-create the slots you have created.

Dropping A Replication Slot
Unused replication slots have to be dropped or deleted. As stated earlier, when there are orphaned replication slots or slots that have not been assigned to any client or standby nodes, it can lead to boundless disk space issues if left undropped. So it is very important that these have to be dropped when it’s no longer use. To drop it, simply invoke pg_drop_replication_slot. This function has the following definition:

maximus_db=# \df pg_drop_replication_slot
Schema              | pg_catalog
Name                | pg_drop_replication_slot
Result data type    | void
Argument data types | name
Type                | normal

Dropping it is simple:

maximus_db=# select pg_drop_replication_slot('slot2');
-[ RECORD 1 ]------------+-
pg_drop_replication_slot |

Monitoring Your PostgreSQL Replication Slots
Monitoring your replication slots is something that you don’t want to miss. Just collect the information from view pg_replication_slots in the primary/master node just like below:

postgres=# select * from pg_replication_slots;
-[ RECORD 1 ]-------+-----------
slot_name           | main_slot
plugin              |
slot_type           | physical
datoid              |
database            |
active              | t
active_pid          | 16297
xmin                |
catalog_xmin        |
restart_lsn         | 2/F4000108
confirmed_flush_lsn |
-[ RECORD 2 ]-------+-----------
slot_name           | main_slot2
plugin              |
slot_type           | physical
datoid              |
database            |
active              | f
active_pid          |
xmin                |
catalog_xmin        |
restart_lsn         |
confirmed_flush_lsn |

The above result shows that the main_slot has been taken, but not main_slot2.

Another thing you can do is to monitor how much lag behind the slots you have. To achieve this, you can simply use the query based on the sample result below:

postgres=# SELECT redo_lsn, slot_name,restart_lsn, 
round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind 
FROM pg_control_checkpoint(), pg_replication_slots;
redo_lsn    | slot_name | restart_lsn | gb_behind 
 1/8D400238 |     slot1 | 0/9A000000 | 3.80

But redo_lsn is not present in 9.6, shall use redo_location, so in 9.6,

imbd=# SELECT redo_location, slot_name,restart_lsn, 
round((redo_location-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind 
FROM pg_control_checkpoint(), pg_replication_slots;
-[ RECORD 1 ]-+-----------
redo_location | 2/F6008BE0
slot_name     | main_slot
restart_lsn   | 2/F6008CC0
gb_behind     | 0.00
-[ RECORD 2 ]-+-----------
redo_location | 2/F6008BE0
slot_name     | main_slot2
restart_lsn   | 2/F6008CC0
gb_behind     | 0.00

System Variable Requirements
Implementing replication slots requires manual setting. There are variables that you have to keep in mind that require changes and be specified in your postgresql.conf. See below:

max_replication_slots – If set to 0, this means that replication slots are totally disabled. If you’re using PostgreSQL < 10 versions, this slot has to be specified other than 0 (default). Since PostgreSQL 10, the default is 10. This variable specifies the maximum number of replication slots. Setting it to a lower value than the number of currently existing replication slots will prevent the server from starting.
wal_level – must at least be replica or higher (replica is default). Setting hot_standby or archive will map to replica. For a physical replication slot, replica is enough. For logical replication slots, logical is preferred.
max_wal_senders – set to 10 by default, 0 in 9.6 version which means replication is disabled. We suggest you set this at least to 16 especially when running with ClusterControl.
hot_standby – in versions < 10, you need to set this to on which is off by default. This is important for standby nodes which means when on, you can connect and run queries during recovery or in standby mode.
primary_slot_name – this variable is set via recovery.conf on the standby node. This is the slot to be used by the receiver or standby node when connecting with the sender (or primary/master).
You have to take note that these variables mostly require a database service restart in order to reload new values.

Using Replication Slots in a ClusterControl PostgreSQL Environment
Now, let’s see how we can use physical replication slots and implement them within a Postgres setup managed by ClusterControl.

Deploying of PostgreSQL Database Nodes
Let’s start deploying a 3-node PostgreSQL Cluster using ClusterControl using PostgreSQL 9.6 version this time.

ClusterControl will deploy nodes with the following system variables defined accordingly based on their defaults or tuned up values. In:

postgres=# select name, setting from pg_settings where name in ('max_replication_slots', 'wal_level', 'max_wal_senders', 'hot_standby');
         name          | setting 
 hot_standby           | on
 max_replication_slots | 0
 max_wal_senders       | 16
 wal_level             | replica
(4 rows)

In versions PostgreSQL > 9.6, max_replication_slots default value is 10 which is enabled by default but not in 9.6 or lower versions which is disabled by default. You need to assign max_replication_slots higher than 0. In this example, I set max_replication_slots to 5.

root@debnode10:~# grep 'max_replication_slots' /etc/postgresql/9.6/main/postgresql.conf 
# max_replication_slots = 0                     # max number of replication slots
max_replication_slots = 5

and restarted the service,

root@debnode10:~# pg_lsclusters 
Ver Cluster Port Status Owner    Data directory Log file
9.6 main    5432 online postgres /var/lib/postgresql/9.6/main pg_log/postgresql-%Y-%m-%d_%H%M%S.log
root@debnode10:~# pg_ctlcluster 9.6 main restart

Setting The Replication Slots For Primary and Standby Nodes
There’s no option in ClusterControl to do this, so you have to create your slots manually. In this example, I created the slots in the primary in host pgdbadmin@maximus_db=# SELECT pg_create_physical_replication_slot('slot1'), pg_create_physical_replication_slot('slot2');
 pg_create_physical_replication_slot | pg_create_physical_replication_slot 
 (slot1,)                            | (slot2,)
(1 row)

Checking what we have just created shows, pgdbadmin@maximus_db=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
 slot1     | | physical  | | | f      | | |       | | 
 slot2     | | physical  | | | f      | | |       | | 
(2 rows)

Now in the standby nodes, we need to update the recovery.conf and add the variable primary_slot_name and change the application_name so it’s easier to identify the node. Here’s how it looks like in host recovery.conf:

root@debnode11:/var/lib/postgresql/9.6/main/pg_log# cat ../recovery.conf 
standby_mode = 'on'
primary_conninfo = 'application_name=node11 host= port=5432 user=cmon_replication password=m8rLmZxyn23Lc2Rk'
recovery_target_timeline = 'latest'
primary_slot_name = 'slot1'
trigger_file = '/tmp/failover_5432.trigger'

Doing the same thing as well in host but changed the application_name and set the primary_slot_name = ‘slot2’.

Checking the replication slot health: pgdbadmin@maximus_db=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
 slot1     | | physical  | | | t      | 24252 | |       | 0/CF0A4218 | 
 slot2     | | physical  | | | t      | 11635 | |       | 0/CF0A4218 | 
(2 rows)

What Else Do You Need?
Since ClusterControl doesn’t support Replication Slots as of this time, there are things that you need to take into account. What are these? Let’s go into details.

Failover/Switchover Process
When an auto failover or switchover via ClusterControl has been attempted, slots will not be retained from the primary and on the standby nodes. You need to re-create this manually, check the variables if set correctly, and modify the recovery.conf accordingly.

Rebuilding a Slave from a Master
When rebuilding a slave, the recovery.conf will not be retained. This means that your recovery.conf settings having the primary_slot_name will be erased. You need to specify this manually again and check the pg_replication_slots view to determine if slots are properly used or left orphaned.

If you want to rebuild the slave/standby node from a master, you might have to consider specifying the PGAPPNAME env variable just like the command below:

$ export PGAPPNAME="app_repl_testnode15"; /usr/pgsql-9.6/bin/pg_basebackup -h -U cmon_replication -D /var/lib/pgsql/9.6/data -p5434 -W -S main_slot -X s -R -P

Specifying the -R param is very important so it will re-create the recovery.conf, while -S shall specify what slot name to use when rebuilding the standby node.

Implementing the Replication Slots in PostgreSQL is straightforward yet there are certain caveats that you must remember. When deploying with ClusterControl, you’ll need to update some settings during failover or slave rebuilds.