PostgreSQL Automatic Failover

High-Availability for Postgres, based on Pacemaker and Corosync.

Cluster administration under CentOS 7

In this document, we are working with a cluster under CentOS 7.2, mostly using the pcs command. It assumes that the pcsd daemon is enabled and running and that authentication between nodes is set up (see quick start).

Make sure to experiment and train yourself on a testing platform. Write your own documentation related to your own environment. Check it and exercise it on a regular basis.


Starting or stopping the cluster

Here is the command to start the cluster on all existing nodes:

# pcs cluster start --all
srv2: Starting Cluster...
srv1: Starting Cluster...
srv3: Starting Cluster...

Here is the command to stop the cluster on the local node where it is executed:

# pcs cluster stop

You can also designate a node if needed, e.g.:

# pcs cluster stop srv2

It stops or moves away all the resources on the node, then stops Pacemaker and Corosync.

Note that the cluster forbids you from stopping too many nodes so it can keep the quorum:

# pcs cluster stop
Error: Stopping the node will cause a loss of the quorum, use --force to override
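
This limit follows from majority quorum. As a rough sketch, assuming one vote per node and no special votequorum option (such as two_node in a two-node cluster), the hypothetical helpers below compute how many nodes must stay up and how many can be stopped:

```shell
# Votes needed for quorum with $1 nodes of one vote each: floor(n/2) + 1.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# Nodes that can be stopped while the remaining ones keep the quorum.
stoppable_nodes() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

quorum_for 3       # prints 2
stoppable_nodes 3  # prints 1
```

With three nodes, only one node can be stopped before pcs asks for --force.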

For the opposite action, starting the cluster on one node, just replace stop with start. These two commands are equivalent:

srv2# pcs cluster start # executed on srv2
srv1# pcs cluster start srv2 # executed from srv1

If you want to stop the cluster on all nodes, just add --all to your command:

# pcs cluster stop --all
srv3: Stopping Cluster (pacemaker)...
srv2: Stopping Cluster (pacemaker)...
srv1: Stopping Cluster (pacemaker)...
srv3: Stopping Cluster (corosync)...
srv2: Stopping Cluster (corosync)...
srv1: Stopping Cluster (corosync)...

This last command is perfectly safe and your cluster will start cleanly when desired.

To avoid moving your resources around during cluster shutdown (e.g. if one node is shutting down slowly), ask the cluster to stop the resource first, e.g.:

pcs resource disable pgsql-ha --wait
pcs cluster stop --all

On cluster startup, you will have to do the opposite action:

pcs cluster start --all --wait
pcs resource enable pgsql-ha --wait
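
The two sequences above can be wrapped in small helpers so the order is never mixed up. This is only a sketch: the function names and the PCS variable are hypothetical, and the resource name defaults to pgsql-ha as in this document. Setting PCS=echo gives a dry run that only prints the pcs arguments.

```shell
# Command used to talk to the cluster; override with PCS=echo for a dry run.
PCS="${PCS:-pcs}"

# Stop the resource first, then the cluster on every node.
# $1: resource name (defaults to pgsql-ha)
cluster_full_stop() {
    rsc="${1:-pgsql-ha}"
    $PCS resource disable "$rsc" --wait &&
    $PCS cluster stop --all
}

# Start the cluster on every node, then enable the resource again.
# $1: resource name (defaults to pgsql-ha)
cluster_full_start() {
    rsc="${1:-pgsql-ha}"
    $PCS cluster start --all --wait &&
    $PCS resource enable "$rsc" --wait
}
```

Each helper stops at the first failing step, so the cluster is not stopped if the resource could not be disabled.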

Swapping primary and standby roles between nodes

In this chapter, we describe how to move the primary role from one node to another and bring the former primary back into the cluster as a standby.

Here is the command to move the primary role from srv1 to srv2:

# pcs resource move --master pgsql-ha srv2
# pcs resource clear pgsql-ha

That’s it. Note that the former primary becomes a standby and starts replicating from the new primary.

You could add --wait so the command exits when everything is done. Here is an example moving back the primary to srv1:

# pcs resource move --wait --master pgsql-ha srv1
Resource 'pgsql-ha' is master on node srv1; slave on node srv2.
# pcs resource clear pgsql-ha

A location constraint with an INFINITY score is set to move the primary role to the given node. You must clear this constraint with the pcs resource clear command to avoid unexpected location behavior.

Giving the destination node is not mandatory. If no destination node is given, a -INFINITY score is set on the node currently hosting the primary to force it to move away:

# pcs resource move --wait --master pgsql-ha
Warning: Creating location constraint cli-ban-pgsql-ha-on-srv1 with a score of -INFINITY for resource pgsql-ha on node srv1.
This will prevent pgsql-ha from being promoted on srv1 until the constraint is removed. This will be the case even if srv1 is the last node in the cluster.
Resource 'pgsql-ha' is master on node srv2; slave on node srv1.

# pcs constraint show | grep Master
    Disabled on: srv1 (score:-INFINITY) (role: Master)

# pcs resource clear pgsql-ha

# pcs constraint show | grep Master
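
A forgotten constraint is easy to miss. The sketch below filters the constraint listing for the ids created by move and ban operations. The helper name is hypothetical, and it assumes such ids start with cli-ban- (as in the warning above) or cli-prefer-, and that pcs constraint show --full prints the ids:

```shell
# Print constraints left behind by "pcs resource move" or "pcs resource ban".
# Reads the output of "pcs constraint show --full" on stdin.
# Exit status is 0 when at least one such constraint is found, 1 otherwise.
leftover_cli_constraints() {
    grep -E 'cli-(ban|prefer)-'
}

# Usage on a cluster node:
#   pcs constraint show --full | leftover_cli_constraints
```

An empty output means there is nothing left to clear.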

PAF update

Updating the PostgreSQL Automatic Failover resource agent does not require stopping your PostgreSQL cluster. You just need to make sure the cluster manager does not decide to run an action while the system updates the pgsqlms script or the libraries. It’s quite improbable, but this situation is still possible.

Easiest and fastest way to update PAF

The easiest way to achieve a clean update is to put the whole cluster in maintenance mode and update PAF, e.g.:

# pcs property set maintenance-mode=true
# yum install -y
# pcs property set maintenance-mode=false

That’s it, you are done.

Keep cluster’s hands off PostgreSQL resources while updating PAF

If putting the whole cluster in maintenance mode is not an option for you, you must ask the cluster to ignore and avoid only your PostgreSQL resources. The cluster will still be in charge of other resources.

Let’s consider that the PostgreSQL multistate resource is called pgsql-ha.

The following command achieves two goals. The first one forbids the cluster resource manager from reacting to an unexpected status by putting the resource in unmanaged mode (unmanage pgsql-ha). The second one stops the monitor actions for this resource (--monitor).

# pcs resource unmanage pgsql-ha --monitor

Notice that (unmanaged) now appears in crm_mon. In the output of the following command, the meta attribute is-managed=false appears for the pgsql-ha resource and enabled=false appears for the monitor actions:

# pcs resource show pgsql-ha
 Master: pgsql-ha
  Meta Attrs: notify=true
  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/pgsql-10/bin pgdata=/var/lib/pgsql/10/data
   Meta Attrs: is-managed=false
   Operations: demote interval=0s timeout=120s (pgsqld-demote-interval-0s)
               methods interval=0s timeout=5 (pgsqld-methods-interval-0s)
               monitor enabled=false interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
               monitor enabled=false interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
               notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
               promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
               reload interval=0s timeout=20 (pgsqld-reload-interval-0s)
               start interval=0s timeout=60s (pgsqld-start-interval-0s)
               stop interval=0s timeout=60s (pgsqld-stop-interval-0s)

Now, update PAF, e.g.:

# yum install -y

We can now put the resource in managed mode again and enable the monitor actions:

# pcs resource manage pgsql-ha --monitor

NOTE: you might want to enable the monitor actions first to check that everything is going fine before giving control back to the cluster. You can enable the monitor actions using the following commands (you must set all parameters related to the action):

# pcs resource update pgsqld op monitor role=Master timeout=10s interval=15s enabled=true
# pcs resource update pgsqld op monitor role=Slave timeout=10s interval=16s enabled=true

The monitor actions should be executed immediately and report no errors. Check that everything is running correctly in crm_mon and your log files before putting the resource itself back in managed mode (without the --monitor):

# pcs resource manage pgsql-ha
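
To confirm that nothing was left behind after the update, the resource definition can be searched for the two markers shown earlier in this chapter. A minimal sketch, with a hypothetical helper name, relying only on the is-managed=false and enabled=false strings from the pcs resource show output above:

```shell
# Succeeds (exit status 0) when the "pcs resource show <resource>" output read
# on stdin still mentions an unmanaged resource or a disabled monitor action.
still_unmanaged() {
    grep -E -q 'is-managed=false|enabled=false'
}

# Usage on a cluster node:
#   pcs resource show pgsql-ha | still_unmanaged \
#       && echo "pgsql-ha is not fully managed yet"
```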

PostgreSQL minor upgrade

This chapter explains how to do a minor upgrade of PostgreSQL on a two node cluster. Nodes are called srv1 and srv2, the PostgreSQL HA resource is called pgsql-ha. Node srv1 is hosting the primary.

The process is quite simple: upgrade the standby first, move the primary role and finally upgrade PostgreSQL on the former PostgreSQL primary node.

Here is how to upgrade PostgreSQL on the standby side:

# yum install --downloadonly postgresql93 postgresql93-contrib postgresql93-server
# pcs resource ban --wait pgsql-ha srv2
# yum install -y postgresql93 postgresql93-contrib postgresql93-server
# pcs resource clear pgsql-ha

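Here are the details of these commands: the first yum call only downloads the packages, the ban stops the standby instance on srv2, the second yum call upgrades PostgreSQL while the instance is down, and pcs resource clear removes the ban so the standby can start again.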

Now, we can move the primary resource to srv2, then take care of srv1:

# pcs resource move --wait --master pgsql-ha srv2
# yum install --downloadonly postgresql93 postgresql93-contrib postgresql93-server
# pcs resource ban --wait pgsql-ha srv1
# yum install -y postgresql93 postgresql93-contrib postgresql93-server
# pcs resource clear pgsql-ha

Minor upgrade is finished. Feel free to move your primary back to srv1 if you really need it.
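
The per-node sequence is the same on both sides, so it can be captured in one function. This is a hedged sketch, not part of PAF: the function name and the RUN variable are hypothetical, the resource name pgsql-ha comes from this chapter, and the function is meant to be run on the node being upgraded while it hosts a standby. Setting RUN=echo prints the commands instead of executing them.

```shell
# Leave RUN empty to execute the commands, or set RUN=echo for a dry run.
RUN="${RUN:-}"

# Upgrade PostgreSQL on the local node while it hosts a standby.
# $1: name of the local node, remaining arguments: packages to upgrade
upgrade_standby_node() {
    node="$1"; shift
    # Download the packages beforehand; the exit status of this step is
    # deliberately not checked in this sketch.
    $RUN yum install --downloadonly "$@"
    # Ban the resource from this node, which stops the local instance.
    $RUN pcs resource ban --wait pgsql-ha "$node" || return 1
    # Upgrade while the instance is down. On failure the ban stays in place.
    $RUN yum install -y "$@" || return 1
    # Remove the ban so the standby can start again.
    $RUN pcs resource clear pgsql-ha
}

# Usage, on srv2:
#   upgrade_standby_node srv2 postgresql93 postgresql93-contrib postgresql93-server
```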

Adding a node

In this chapter, we add server srv3 hosting a PostgreSQL standby instance as a new node in an existing two node cluster.

Set up everything so PostgreSQL can start on srv3 as a standby and enter streaming replication. Remember to create the recovery configuration template file, set up the pg_hba.conf file, etc.

On this new node, set up the pcsd daemon and its authentication:

# passwd hacluster
# systemctl enable pcsd
# systemctl start pcsd
# pcs cluster auth srv1 srv2 srv3 -u hacluster

On all other nodes, authenticate to the new node:

# pcs cluster auth srv3 -u hacluster

We are now ready to add the new node.

NOTE: Put the cluster in maintenance mode or use crm_simulate if you are afraid that some of your resources will move all over the place when the new node appears.

# pcs cluster node add srv3

NOTE: If corosync is set up to use multiple networks for redundancy, use the following command:

# pcs cluster node add srv3,srv3-alt

Reload the corosync configuration on all the nodes if needed (it shouldn’t be necessary, but it doesn’t hurt anyway):

# pcs cluster reload corosync

Fencing is mandatory. Either edit the existing fencing resources to handle the new node if applicable, or add a new one able to do it. In this example, we are using the fence_virsh fencing agent to create a dedicated fencing resource able to fence only srv3:

# pcs stonith create fence_vm_srv3 fence_virsh pcmk_host_check="static-list" \
    pcmk_host_list="srv3" ipaddr=""                             \
    login="<username>" port="srv3-c7" identity_file="/root/.ssh/id_rsa"
# pcs constraint location fence_vm_srv3 avoids srv3=INFINITY

We can now start the cluster on srv3:

# pcs cluster start

After some time, checking the cluster using crm_mon or pcs status, you should see srv3 appear in the cluster.

If your PostgreSQL standby is not started on the new node, maybe the cluster has been set up with a hard clone-max value. Check with:

# pcs resource show pgsql-ha

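If the output shows a clone-max value, either remove it or raise it to include the new node (three here), using pcs resource meta pgsql-ha as in the next chapter.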

Your standby instance should start shortly.

Removing a node

This chapter explains how to remove a node called srv3 from a three node cluster.

The first command will put the node in standby. It stops all resources on the node:

# pcs cluster standby srv3

The next command simply removes the node from the cluster. It stops Pacemaker on srv3, removes the cluster setup from it and reconfigures the other nodes:

# pcs cluster node remove srv3
srv3: Stopping Cluster (pacemaker)...
srv3: Successfully destroyed cluster
srv1: Corosync updated
srv2: Corosync updated

If you chose to set a specific clone-max attribute on the pgsql-ha resource, update it. You don’t need to update it if it is not set (see previous chapter).

# pcs resource meta pgsql-ha clone-max=2

Setting up a watchdog

First, read the watchdog chapter of the “How to fence your node” documentation page for some theory.

We now explain how to set up a watchdog device as a fencing method in Pacemaker. The i6300esb watchdog “hardware” has been added to the virtual machines in our demo cluster. This hardware is correctly discovered on boot by the kernel:

Dec  7 14:47:21 srv1 kernel: i6300esb: Intel 6300ESB WatchDog Timer Driver v0.05
Dec  7 14:47:21 srv1 kernel: i6300esb: initialized (0xffffc90000128000). heartbeat=30 sec (nowayout=0)

NOTE: in your test environment, you could use a software watchdog from the Linux kernel called softdog. This is fine as long as it is only for demo purposes or as a very last resort. You should definitely rely on a hardware watchdog, which is not tied to the operating system.

First we need to stop the cluster to set everything up. The watchdog capability is detected by the cluster manager on each node during the cluster startup.

# pcs cluster stop --all

Install and enable sbd. This small daemon is the glue between the watchdog device and Pacemaker:

# yum install -y sbd
# systemctl enable sbd.service

Edit /etc/sysconfig/sbd and make sure you have:

You can adjust the value of SBD_WATCHDOG_TIMEOUT to suit your needs. This variable is the time sbd will use to initialize the recurrent hardware watchdog timer.
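
For reference, /etc/sysconfig/sbd uses shell-style variable assignments. A minimal watchdog-only configuration might look like the sketch below. The values are assumptions for illustration (the usual default device path, and a 5 second timeout chosen to match the stonith-watchdog-timeout=10s used later in this chapter); check them against the file shipped with your sbd package:

```shell
# Illustrative values only, not taken from a real cluster.
# Let sbd cooperate with Pacemaker to decide when the node is healthy.
SBD_PACEMAKER=yes
# Watchdog device handled by sbd.
SBD_WATCHDOG_DEV=/dev/watchdog
# Watchdog timer, in seconds.
SBD_WATCHDOG_TIMEOUT=5
```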

We can now start the cluster again so the cluster manager detects the watchdog capability on each node:

# pcs cluster start --all

After some seconds, the following command should return true:

# pcs property show | grep have-watchdog
 have-watchdog: true
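
Since the property only shows up after a few seconds, a script may want to poll for it rather than check once. A minimal sketch, assuming the output format shown above; the function name and the PCS variable are hypothetical:

```shell
# Command used to talk to the cluster; can be overridden for testing.
PCS="${PCS:-pcs}"

# Poll once per second until the cluster reports "have-watchdog: true".
# $1: maximum number of attempts (defaults to 30)
# Returns 0 as soon as the property is true, 1 if all attempts are used up.
wait_for_watchdog() {
    tries="${1:-30}"
    while [ "$tries" -gt 0 ]; do
        if $PCS property show | grep -q 'have-watchdog: true'; then
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep 1
    done
    return 1
}

# Usage on a cluster node:
#   wait_for_watchdog 60 || echo "watchdog not detected" >&2
```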

Adjust the stonith-watchdog-timeout cluster property:

# pcs property set stonith-watchdog-timeout=10s

A good value for stonith-watchdog-timeout is twice the value of SBD_WATCHDOG_TIMEOUT.
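
To keep both settings consistent, the value can be derived from the sbd configuration instead of being typed by hand. A small sketch with a hypothetical helper name, assuming SBD_WATCHDOG_TIMEOUT is set as a plain number of seconds in /etc/sysconfig/sbd:

```shell
# Print twice the SBD_WATCHDOG_TIMEOUT found in an sbd sysconfig file,
# suffixed with "s" so it can be given to stonith-watchdog-timeout.
# $1: path to the file (defaults to /etc/sysconfig/sbd)
suggested_stonith_watchdog_timeout() {
    conf="${1:-/etc/sysconfig/sbd}"
    # Source the file in a subshell so its variables do not leak out.
    (
        . "$conf" || exit 1
        [ -n "$SBD_WATCHDOG_TIMEOUT" ] || exit 1
        echo "$(( SBD_WATCHDOG_TIMEOUT * 2 ))s"
    )
}

# Usage on a cluster node:
#   pcs property set stonith-watchdog-timeout="$(suggested_stonith_watchdog_timeout)"
```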

Now, if you kill the sbd process, the node should reset itself in less than SBD_WATCHDOG_TIMEOUT seconds:

# killall -9 sbd

Using the following command should ask the remote node to fence itself using its watchdog (if no other fencing device exists):

# stonith_admin -F srv1

If you stop Pacemaker but not Corosync or simulate a resource failing to stop or a resource fatal error, the node should fence itself immediately.

Forbidding a PAF resource on a node

In this chapter, we need to set up a node where no PostgreSQL instance of your cluster is supposed to run. This might be because PostgreSQL is not installed on this node, the instance is part of a different resource cluster, etc.

The following command forbids your multi-state PostgreSQL resource called pgsql-ha from running on the node called srv3:

# pcs constraint location pgsql-ha rule resource-discovery=never score=-INFINITY \#uname eq srv3

This creates a location constraint associated with a rule that makes resource pgsql-ha avoid ( score=-INFINITY ) the node srv3 ( \#uname eq srv3 ). The resource-discovery=never is mandatory here as it forbids the “probe” action the CRM usually runs to discover the state of a resource on a node. On a node where your PostgreSQL cluster is not running, this “probe” action will fail, leading to bad cluster reactions.

Adding IPs on standby nodes

In this chapter, we are using a three node cluster with one PostgreSQL primary instance and two standby instances.

As usual, we start from the cluster created in the quick start documentation. See the Quick Start CentOS 7 for more information.

We want to create two IP addresses that each prefer to run on a standby node, avoid sharing a node with each other, and fall back to the primary when no standby is available.

To make this possible, we have to play with the resources co-location scores.

First, let’s add two IPaddr2 resources called pgsql-ip-stby1 and pgsql-ip-stby2, each holding one of the two IP addresses:

# pcs resource create pgsql-ip-stby1 ocf:heartbeat:IPaddr2  \
  cidr_netmask=24 ip= op monitor interval=10s

# pcs resource create pgsql-ip-stby2 ocf:heartbeat:IPaddr2  \
  cidr_netmask=24 ip= op monitor interval=10s

We want both IP addresses to avoid co-locating with each other. We add a co-location constraint so pgsql-ip-stby2 avoids pgsql-ip-stby1 with a score of -20 (higher than the stickiness of the cluster):

# pcs constraint colocation add pgsql-ip-stby2 with pgsql-ip-stby1 -20

NOTE: that means the cluster manager has to start pgsql-ip-stby1 first to decide where pgsql-ip-stby2 should start according to the new scores in the cluster. It also means that whenever you move pgsql-ip-stby1 to another node, the cluster might have to stop pgsql-ip-stby2 first and restart it elsewhere depending on the new scores.

Now, we add similar co-location constraints to define that each IP address prefers to run on a node with a standby of pgsql-ha:

We give higher priority to the standbys with the 100 score, but should they all be stopped, the 50 score pushes the IP to move to the primary.

# pcs constraint colocation add pgsql-ip-stby1 with slave pgsql-ha 100
# pcs constraint order start pgsql-ha then start pgsql-ip-stby1 kind=Mandatory

# pcs constraint colocation add pgsql-ip-stby2 with slave pgsql-ha 100
# pcs constraint order start pgsql-ha then start pgsql-ip-stby2 kind=Mandatory

# pcs constraint colocation add pgsql-ip-stby1 with pgsql-ha 50
# pcs constraint colocation add pgsql-ip-stby2 with pgsql-ha 50