PostgreSQL Automatic Failover

High-Availability for Postgres, based on Pacemaker and Corosync.

Administration

This manual gives an overview of the tasks you can expect to do when using PAF to manage PostgreSQL instances for high availability, as well as several useful commands.

A word of caution

Pacemaker is a complex and sensitive tool.

Before running any command modifying an active cluster configuration, you should always validate its effect beforehand by using the crm_shadow and crm_simulate tools.

Pacemaker command line tools

The Pacemaker-related actions documented on this page use exclusively generic Pacemaker commands.

Depending on the Pacemaker packaging policy and choices of your operating system, you may have an additional command line administration tool installed (usually, pcs or crmsh).

If that’s the case, you should obviously use the tool you are most comfortable with.

Pacemaker maintenance mode

Pacemaker provides commands to put several resources or even the whole cluster in maintenance mode, meaning that the “unmanaged” resources will not be monitored anymore, and changes to their status will not trigger any automatic action.

If you’re about to do something that may impact Pacemaker (restart a PostgreSQL instance, reboot a whole server, change the network configuration, etc.), you should consider using it.

Here is the generic command line to put the cluster in maintenance mode:

crm_attribute --name maintenance-mode --update true

And how to leave the maintenance mode:

crm_attribute --name maintenance-mode --delete

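If you only need to stop managing a single resource rather than the whole cluster, a possible approach (a sketch to validate against your Pacemaker version) is to unmanage that resource through its is-managed meta attribute:

crm_resource --resource <PAF_resource_name> --meta --set-parameter is-managed --parameter-value false
crm_resource --resource <PAF_resource_name> --meta --delete-parameter is-managed

The first command tells Pacemaker to stop managing the resource, the second one puts it back under Pacemaker's control.
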
Refer to the official Pacemaker documentation related to your installation for the specific commands.

PostgreSQL administration

If your PostgreSQL instance is managed by Pacemaker, you should proceed to administration tasks with care.

In particular, if you need to restart a PostgreSQL instance, first put the resource in maintenance mode, so that Pacemaker will not attempt to restart it automatically.

Also, if you need to start or stop your instance yourself, refrain from using any tool other than pg_ctl (provided with any PostgreSQL installation).

“Other tools” includes any convenience wrapper, such as SysV init scripts, systemd unit files, or the Debian pg_ctlcluster wrapper.

Pacemaker only uses pg_ctl, and since other tools behave differently, using them could lead to unpredictable behavior, like an init script reporting that the instance is stopped when it is not.

And again, we cannot emphasize this strongly enough: if you really need to use pg_ctl, do it under maintenance mode.

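As a rough sketch of this workflow, assuming <PGDATA> is the data directory of the local instance and that the commands run as the PostgreSQL system user:

crm_attribute --name maintenance-mode --update true
pg_ctl restart -D <PGDATA>
crm_attribute --name maintenance-mode --delete

Once maintenance mode is left, Pacemaker monitors the resources again and should find the instance running.
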
Manual switchover

Depending on your configuration, and most notably on the constraints you set up on the nodes for your resources, Pacemaker may trigger automatic switchover of the resources.

If required, you can also ask it to do a manual switchover, for example before doing a maintenance operation on the node hosting the primary resource.

These steps use only Pacemaker commands to move the Master role of the resource around.

Note that in these examples, we only ask Pacemaker to move the Master role. Based on your configuration, the resources colocated with this role (such as the high availability IP address) should follow it to the target node, and the old primary instance should come back up as a standby.

Moreover, during the switchover process, PAF makes sure the old primary is able to catch up with the new one. That means that if you try to switch over to a node which is not in streaming replication with the primary, the switchover fails.

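Before asking for a switchover, you may want to check from the current primary that the target standby is actually connected in streaming replication, for example with a simple query (adapt the psql connection parameters to your setup):

psql -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"

The target node should appear with a state of "streaming".
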
Move the primary role resource to another node

crm_resource --move --master --resource <PAF_resource_name> --host <target_node>

This command sets an INFINITY score on the target node for the primary resource, which forces Pacemaker to trigger the switchover to the target node.

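To check where the resource ended up after the move, you can ask Pacemaker to locate it; the output should list the nodes running the resource and which one holds the Master role:

crm_resource --locate --resource <PAF_resource_name>
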
Ban the primary role from a node

crm_resource --ban --master --resource <PAF_resource_name>

This command sets a -INFINITY score on the node currently running the primary resource, which forces Pacemaker to trigger the switchover to another available node.

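If you need to ban the Master role from a specific node rather than from the node currently hosting it, --ban should also accept a target host (check the crm_resource help for your version), for example:

crm_resource --ban --master --resource <PAF_resource_name> --host <node_name>
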
Clear the constraints after the switchover

Unless you used the --lifetime option of crm_resource, the scores set up by the previous commands will not be automatically removed. This means that unless you remove these scores manually, your primary resource is now stuck on one node (--move case), or forbidden on one node (--ban case).

To allow your cluster to be fully operational again, you have to clear these scores. The following command will remove any constraint set by the previous commands:

crm_resource --clear --master --resource <PAF_resource_name>

Note that depending on your configuration, the --clear action may trigger another switchover (for example, if you set up a preferred node for the primary resource). Before running such a command (or really, any command modifying your cluster configuration), you should always validate its effect beforehand by using the crm_shadow and crm_simulate tools.

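To spot a leftover INFINITY or -INFINITY score before (or after) clearing it, you can also display the allocation scores computed from the live cluster, using the same tool as in the failover example below:

crm_simulate -sL
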
Failover

That’s it, there was a problem with the node hosting the primary PostgreSQL instance, and your cluster triggered a failover.

That means one of the standby instances has been promoted, is now a primary PostgreSQL instance running as the Master resource, and the high availability IP address has been moved to this node. That’s exactly the situation you installed Pacemaker and PAF for, so far so good.

Now, what needs to be done?

Verify everything, fix every problem found

Hopefully, you configured a reliable fencing device, so the failing node has been completely disconnected from the cluster. From this point, you first need to investigate the origin of the failure and fix whatever the problem may be. At this stage, you usually look for network, virtualization or hardware issues.

Once that’s done, connect to your fenced node, and before you do anything else (including un-fencing it, if your fencing method involves network isolation only), ensure that the Corosync, Pacemaker and PostgreSQL processes are down: you certainly don’t want these to suddenly kick in on your live cluster!

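For example, on a systemd-based system (adapt to your own service manager), you could stop the cluster services and then check that no related process is left running:

systemctl stop pacemaker corosync
ps -ef | grep -E 'corosync|pacemaker|postgres'
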
Then, again, you check everything for errors related to the failure. Good starting points are the OS, Pacemaker and PostgreSQL log files. If you find something that went wrong, fix it before moving to the next step.

Rebuild the failed PostgreSQL instance

Finally, you need to rebuild the PostgreSQL instance on the failed node. Indeed, since the PostgreSQL resource suffered a failover, it is very likely that the promoted PostgreSQL instance was a few transactions behind the old primary, so the old instance cannot simply rejoin the replication as it is.

So you need to rebuild your old, failed primary instance, based on the one currently used as the primary resource.

To do this, use any backup and recovery method that fits your configuration. PostgreSQL’s pg_basebackup tool may be handy if your instance is not too big, and if you’re running PostgreSQL 9.5+, you may want to consider pg_rewind. If you’re not familiar with this rebuild process, refer to the PostgreSQL documentation before you even consider using the PAF agent. Obviously, waiting for a failover to happen before thinking about what needs to be done in that case is not a good idea.

When rebuilding, beware not to erase local files whose content is specific to that node (at the very least, avoid erasing the recovery.conf.pcmk and pg_hba.conf files).

Once you have rebuilt your instance, verify that you can successfully start it as a standby. Remember to create the recovery.conf or standby.signal file (depending on the PostgreSQL version) in the instance’s PGDATA directory before starting it.

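As an illustration only, here is a minimal rebuild sketch based on pg_basebackup, where <new_primary> stands for the node hosting the new primary and <PGDATA> for the local data directory; it keeps the old data directory around so the node-specific files can be copied back:

mv <PGDATA> <PGDATA>.old
pg_basebackup -h <new_primary> -D <PGDATA> -X stream -P
cp <PGDATA>.old/pg_hba.conf <PGDATA>.old/recovery.conf.pcmk <PGDATA>/
cp <PGDATA>/recovery.conf.pcmk <PGDATA>/recovery.conf

The last line applies to pre-12 setups using a recovery.conf.pcmk template; on PostgreSQL 12 and later, create an empty standby.signal file instead. After checking the instance starts correctly as a standby with pg_ctl, stop it again before handing control back to Pacemaker.
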
Reintroduce the node in the cluster

Then, it’s time to reintroduce your failed node in the cluster.

But before you actually do that, use the nice crm_simulate command with the --node-up option to do a dry run from an active node of the cluster.

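For example, with <failed_node> being the node you are about to reintroduce (the same short options are used in the full example below):

crm_simulate -SL --node-up <failed_node>
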
If the cluster seems to keep its sanity based on the crm_simulate output, then you can bring Corosync and Pacemaker processes up on the previously failed node, and you’re finally done!

Note that you may have to clear previous errors (failcounts) before Pacemaker considers your rebuilt PostgreSQL instance as a sane resource.

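One way to clear these failcounts (to validate against your Pacemaker version) is to clean the resource up on the rebuilt node:

crm_resource --cleanup --resource <PAF_resource_name> --node <node_name>
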
That’s it!

In conclusion, remember that the PostgreSQL Automatic Failover resource agent does not rebuild a failed instance for you, nor does it do anything that may alter your data or your configuration.

So you need to be prepared to deal with the failover case, by documenting your configuration and the actions required to bring a failed node up.

Full failover example

Here is a full example of a failover.

Consider the following situation: a three-node cluster (srv1, srv2 and srv3), with the primary PostgreSQL instance running as the Master resource on srv1, standby instances on srv2 and srv3, and the high availability IP address colocated with the Master role.

The node srv1 becomes unresponsive: let’s say someone messed up the firewall rules, so the node is still up, but no longer visible to the cluster.

Based on the quorum situation, Pacemaker triggers the following actions: it fences srv1, promotes one of the surviving standby instances to the Master role, and moves the high availability IP address along with it.

From this point, srv1 is fenced and out of the cluster, and the promoted instance on one of the surviving nodes acts as the new primary.

Only two nodes are now alive in the quorum, so the loss of another member would bring the whole cluster down. You don’t want things to stay that way for too long, so you’ll have to bring srv1 up again, following the steps described above: investigate and fix the failure, make sure the Corosync, Pacemaker and PostgreSQL processes are down on srv1, and rebuild its PostgreSQL instance as a standby.

Now that srv1 is clean, you can consider integrating it back into the cluster. Go to another node, like srv2, and check how the cluster would react if srv1 were to come back up:

crm_simulate -SL --node-up srv1

Review the output: it should show no unexpected action planned against the resources running on the surviving nodes.

If that looks good, you just need to actually start Corosync and Pacemaker on srv1, and if everything goes as planned, you’re done.

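Assuming a systemd-based system, that last step could look like this on srv1, followed by a quick look at the cluster state from any node:

systemctl start corosync
systemctl start pacemaker
crm_mon -1
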
Collecting traces

The crm_report utility creates an archive containing everything needed when reporting a cluster problem.

The following command will collect all relevant configuration and logs between 7am and 9am on the 8th of November from all the nodes into an archive called /tmp/crm_report_crash_20161108.tar.bz2:

crm_report -f "2016-11-08 07:00:00" -t "2016-11-08 09:00:00" /tmp/crm_report_crash_20161108

The command works better when used on an active node (Pacemaker will guess the list of nodes from its configuration). Alternatively, you can use -n "node1 node2" or -n node1 -n node2 to specify the list of nodes. All nodes must be reachable through SSH.

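For example, with an explicit node list reusing the node names from the failover example above:

crm_report -f "2016-11-08 07:00:00" -t "2016-11-08 09:00:00" -n "srv1 srv2 srv3" /tmp/crm_report_crash_20161108
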
Be careful when sending these reports online as they may contain sensitive information like passwords.