This manual gives an overview of the tasks you can expect to do when using PAF to manage PostgreSQL instances for high availability, as well as several useful commands.
Pacemaker is a complex and sensitive tool.
Before running any command that modifies an active cluster configuration, you
should always validate its effect beforehand by using the crm_simulate command.
The Pacemaker-related actions documented on this page use exclusively generic Pacemaker commands.
Depending on the Pacemaker packaging policy and choices of your operating
system, you may have an additional command line administration tool installed,
such as crmsh or pcs.
If that's the case, you should obviously use the tool that you're the most comfortable with.
Pacemaker provides commands to put several resources or even the whole cluster in maintenance mode, meaning that the “unmanaged” resources will not be monitored anymore, and changes to their status will not trigger any automatic action.
If you’re about to do something that may impact Pacemaker (reboot a PostgreSQL instance, a whole server, change the network configuration, etc.), you should consider using it.
Here is the generic command line to put the cluster in maintenance mode:
```
crm_attribute --name maintenance-mode --update true
```
And how to leave the maintenance mode:
```
crm_attribute --name maintenance-mode --delete
```
Refer to the official Pacemaker’s documentation related to your installation for the specific commands.
If your PostgreSQL instance is managed by Pacemaker, you should proceed to administration tasks with care.
In particular, if you need to restart a PostgreSQL instance, you should first put the resource in maintenance mode, so Pacemaker will not attempt to restart it automatically.
Also, you should refrain from using any tool other than
pg_ctl (provided with any
PostgreSQL installation) to start and stop your instance if you need to.
"Other tools" may include any convenience wrapper, like SysV init scripts,
systemd unit files, or the
pg_ctlcluster Debian wrapper.
Pacemaker only uses
pg_ctl, and as other tools behave differently, using them
could lead to unpredictable behavior, like an init script reporting that
the instance is stopped when it is not.
And again, we cannot emphasize this strongly enough: if you really need to use
pg_ctl, do it under maintenance mode.
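As a sketch, a careful manual restart under maintenance mode could look like the following (the data directory path is an assumption to adapt to your installation):

```
# Put the whole cluster in maintenance mode so Pacemaker stops
# reacting to resource state changes.
crm_attribute --name maintenance-mode --update true

# Restart the instance manually, using pg_ctl only.
# The PGDATA path below is an example; adjust it to your setup.
pg_ctl -D /var/lib/pgsql/data restart

# Check the instance is back before handing control to Pacemaker again.
pg_ctl -D /var/lib/pgsql/data status

# Leave maintenance mode so Pacemaker resumes monitoring.
crm_attribute --name maintenance-mode --delete
```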
Depending on your configuration, and most notably on the constraints you set up on the nodes for your resources, Pacemaker may trigger automatic switchover of the resources.
If required, you can also ask it to do a manual switchover, for example before doing a maintenance operation on the node hosting the primary resource.
These steps use only Pacemaker commands to move the
Master role of the PAF resource to another node.
Note that in these examples, we only ask Pacemaker to move the Master
role. That means that, based on your configuration, anything colocated with the
Master role (like a Pacemaker controlled IP address) is also affected by the
switchover.
Moreover, during the switchover process, PAF makes sure the old primary is able to catch up with the new one. That means that if you try to switch over to a node which is not in streaming replication with the primary, it fails.
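Before asking for a switchover, you can check from the current primary that the intended target is indeed streaming. A minimal sketch, assuming a local superuser connection (connection parameters are illustrative):

```
# Run on the current primary: list connected standbys and their replication state.
# A valid switchover target should show up here with state "streaming".
psql -U postgres -c "SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"
```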
```
crm_resource --move --master --resource <PAF_resource_name> --host <target_node>
```
This command sets an
INFINITY score on the target node for the primary
resource. This forces Pacemaker to trigger the switchover to the target node:

- demote the PostgreSQL resource on the current primary node (stop the resource, and then start it as a standby resource)
- promote the PostgreSQL resource on the target node
```
crm_resource --ban --master --resource <PAF_resource_name>
```
This command will set up a
-INFINITY score on the node currently running the
primary resource. This will force Pacemaker to trigger the switchover to another node:

- demote the PostgreSQL resource on the current primary node (stop the resource, and then start it as a standby)
- promote the PostgreSQL resource on another node
Unless you used the
--lifetime option of
crm_resource, the scores set up by
the previous commands will not be automatically removed.
This means that unless you remove these scores manually, your primary resource
is now stuck on one node (the
--move case), or forbidden on one node (the --ban case).
To allow your cluster to be fully operational again, you have to clear these scores. The following command will remove any constraint set by the previous commands:
```
crm_resource --clear --master --resource <PAF_resource_name>
```
Note that depending on your configuration, the
--clear action may trigger
another switchover (for example, if you set up a preferred node for the primary
resource).
Before running such a command (or really, any command modifying your cluster
configuration), you should always validate its effect beforehand by using the
crm_simulate command.
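Putting the pieces together, a cautious manual switchover could look like this sketch (the pgsql-ha resource and srv2 node names are taken from the example cluster later on this page; adapt them to your setup):

```
# Move the Master role to srv2 (this sets an INFINITY score for srv2).
crm_resource --move --master --resource pgsql-ha --host srv2

# Watch the cluster state once until the switchover completes.
crm_mon -1

# Once done, clear the constraint so the cluster is fully operational again.
crm_resource --clear --master --resource pgsql-ha
```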
That’s it, there was a problem with the node hosting the primary PostgreSQL instance, and your cluster triggered a failover.
That means one of the standby instances has been promoted, is now a primary
PostgreSQL instance, running as the
Master resource, and the high
availability IP address has been moved to this node.
That’s exactly for this situation that you installed Pacemaker and PAF, so far
so good.
Now, what needs to be done?
Hopefully, you did configure a reliable fencing device, so the failing node has been completely disconnected from the cluster. From this point, you first need to investigate the origin of the failure, and fix whatever the problem may be. At this stage, you usually look for network, virtualization or hardware issues.
Once that’s done, you connect to your fenced node, and before you do anything (including un-fencing it if your fencing method involves network isolation only), ensure that the Corosync, Pacemaker and PostgreSQL processes are down: you certainly don’t want these to suddenly kick back into your live cluster!
Then, again, you check everything for errors related to the failure. Good starting points are the OS, Pacemaker and PostgreSQL log files. If you find something that went wrong, fix it before moving to the next step.
Finally, you need to rebuild the PostgreSQL instance on the failed node. That’s right, as the PostgreSQL resource suffered a failover, it is very likely that the promoted PostgreSQL instance was late by a few transactions.
So you need to rebuild your old, failed primary instance, based on the one currently used as the primary resource.
To do this, use any backup and recovery method that fits your configuration.
The pg_basebackup tool may be handy if your instance is not too
big, and if you’re on PostgreSQL 9.5+, you may want to consider
pg_rewind.
If you’re not familiar with all this rebuild thing, you should refer to the
PostgreSQL documentation, before you even consider using the PAF agent.
Obviously, waiting for a failover to happen before considering what needs to
be done in that case is not a good idea.
Beware when you do your rebuild not to erase local files with content
specific to that node (at the very least, avoid erasing the
pg_hba.conf file content).
Once you have rebuilt your instance, verify that you can successfully start it
as a standby. Remember to create the
recovery.conf or standby.signal file
(depending on the PostgreSQL version) in the instance’s data directory
before starting it.
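As an illustration, a rebuild using pg_basebackup from the new primary might look like the following sketch (the hostnames, data directory and replication user are assumptions to adapt):

```
# On the failed node, move the old data directory aside, so node-specific
# files like pg_hba.conf stay available for later.
mv /var/lib/pgsql/data /var/lib/pgsql/data.old

# Clone the current primary (srv2 here) as a standby.
# -R writes the standby configuration (recovery.conf or standby.signal,
# depending on the PostgreSQL version) into the new data directory.
pg_basebackup -h srv2 -U replication -D /var/lib/pgsql/data -X stream -R

# Restore any node-specific configuration content if needed, then check
# the instance starts cleanly as a standby before reintroducing the node.
pg_ctl -D /var/lib/pgsql/data start
pg_ctl -D /var/lib/pgsql/data stop
```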
Then, it’s time to reintroduce your failed node in the cluster.
But before you actually do that, use the nice
crm_simulate command with the
--node-up option to do a dry run from an active node of the cluster.
If the cluster seems to keep its sanity based on the simulation output,
then you can bring the Corosync and Pacemaker processes up on the previously failed
node, and you’re finally done!
Note that you may have to clear previous errors (failcounts) before Pacemaker
considers your rebuilt PostgreSQL instance as a sane resource.
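A sketch of such a cleanup, assuming the pgsql-ha resource and srv1 node names used in the example below:

```
# Clear failcounts and the failed-operations history for the resource
# on srv1, so Pacemaker re-evaluates it from a clean state.
crm_resource --cleanup --resource pgsql-ha --node srv1
```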
In conclusion, remember that PostgreSQL Automatic Failover resource agent does not rebuild a failed instance for you, nor does it do anything that may alter your data or your configuration.
So you need to be prepared to deal with the failover case, by documenting your configuration and the actions required to bring a failed node up.
Here is a full example of a failover.
Consider the following situation:
- srv1 hosts the Master resource (primary PostgreSQL instance and Pacemaker’s managed IP)
- srv2 and srv3 host the Slave resources (standby PostgreSQL instances, connected to the primary using streaming replication)
srv1 becomes unresponsive - let’s say that someone messed up with the
firewall rules, so the node is still up, but not visible anymore to the
other cluster members.
Based on the quorum situation, Pacemaker triggers the following actions:
- fence the srv1 node (as you can imagine, in this situation your STONITH device should not try to connect to the node it has to fence, that’s part of fencing’s configuration good practices)
- once srv1 has been fenced (say, physically powered off), promote the standby that is the most advanced in transaction replay (srv2 for the example)
From this point, your cluster is in this situation:
- srv1 is powered off, and marked as offline in the cluster
- srv2 hosts the Master resource (primary PostgreSQL instance and Pacemaker’s managed IP)
- srv3 hosts the Slave resource (standby PostgreSQL instance, connected to the primary using streaming replication)
Only two nodes are now alive in the quorum, so the loss of any other member would
bring the whole cluster down.
You don’t want things to stay that way too long, so you’ll have to bring
srv1 back:

- power the srv1 server back on and correct the firewall problem
- rebuild the PostgreSQL instance on srv1, for example using the pg_basebackup PostgreSQL tool, ensuring you don’t erase the files specific to that node

Now srv1 is clean, and you can consider integrating it back in the cluster.
Go to another node, like
srv2, and check the cluster reaction if the
srv1 member was to be up again:
```
crm_simulate -SL --node-up srv1
```
This should print something like this:
first, the actual cluster state:

```
Current cluster status:
Online: [ srv2 srv3 ]
OFFLINE: [ srv1 ]

 fence_vm_srv1 (stonith:fence_virsh): Started srv2
 fence_vm_srv2 (stonith:fence_virsh): Started srv3
 fence_vm_srv3 (stonith:fence_virsh): Started srv2
 Master/Slave Set: pgsql-ha [pgsqld]
     Masters: [ srv2 ]
     Slaves: [ srv3 ]
     Stopped: [ srv1 ]
 pgsql-pri-ip (ocf::heartbeat:IPaddr2): Started srv2
```

then, the simulated modifications and the resulting transition:

```
Performing requested modifications
 + Bringing node srv1 online

Transition Summary:
 * Start   pgsqld:2 (srv1)
```

finally, the cluster state as it would be after the transition:

```
Revised cluster status:
Online: [ srv1 srv2 srv3 ]

 fence_vm_srv1 (stonith:fence_virsh): Started srv2
 fence_vm_srv2 (stonith:fence_virsh): Started srv3
 fence_vm_srv3 (stonith:fence_virsh): Started srv2
 Master/Slave Set: pgsql-ha [pgsqld]
     Masters: [ srv2 ]
     Slaves: [ srv1 srv3 ]
 pgsql-pri-ip (ocf::heartbeat:IPaddr2): Started srv2
```
That seems good!
So now you just need to really start Corosync and Pacemaker on
srv1, and if
everything goes as planned, you’re done.
The crm_report utility will create an archive containing everything needed when reporting a cluster problem.
The following command will collect all relevant configuration and logs between 7am and 9am on the 8th of November from all the nodes into an archive called /tmp/crm_report_crash_20161108.tar.bz2:
```
crm_report -f "2016-11-08 07:00:00" -t "2016-11-08 09:00:00" /tmp/crm_report_crash_20161108
```
The command works better when used on an active node (Pacemaker will guess the list of nodes from its configuration). Alternatively, you can use -n "node1 node2" or -n node1 -n node2 to specify a list of nodes. It is required that all nodes are reachable through SSH.
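For example, to collect from an explicit list of nodes (the node names below are taken from the sample cluster above):

```
# Same time window as before, but with the node list passed explicitly,
# which is useful when running crm_report from outside the cluster.
crm_report -f "2016-11-08 07:00:00" -t "2016-11-08 09:00:00" \
    -n "srv1 srv2 srv3" /tmp/crm_report_crash_20161108
```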
Be careful when sending these reports online as they may contain sensitive information like passwords.