This quick start purpose is to help you to build your first cluster to experiment with. It does not implement various good practices related to your system, Pacemaker or PostgreSQL. This quick start alone is not enough. During your journey in building a safe HA cluster, you must train about security, network, PostgreSQL, Pacemaker, PAF, etc. In regard with PAF, make sure to read carefully documentation from https://clusterlabs.github.io/PAF/documentation.html.
This tutorial is based on CentOS 6.10, using Pacemaker 1.1.18, the
and PostgreSQL 9.6.
Table of contents:
To install PAF and PostgreSQL, this tutorial uses the PGDG repository maintained by the PostgreSQL community. Here is how to add it:
yum install -y https://yum.postgresql.org/reporpms/EL-6-x86_64/pgdg-redhat-repo-latest.noarch.rpm
NOTE: you can not install PostgreSQL from official CentOS repository as the supported version is 8.4 and PAF supports PostgreSQL 9.3 and after.
First, we should make sure NetworkManager is disabled. The network setup should NOT be dynamically handled by some external daemons. Only Pacemaker related processes should be able to deal with this:
chkconfig NetworkManager off service NetworkManager stop
The cluster we are about to build includes three servers called
srv3. IP addresses of these servers are
NOTE: It is essential to setup network redundancy, either at system level using eg. bonding or teaming, or at cluster level.
The IP address
pgsql-vip in this tutorial, will be
set on the server hosting the primary PostgreSQL instance.
If the firewall is enabled, we have to allow the network traffic related to the cluster and PostgreSQL to go through:
service iptables start # corosync iptables -I INPUT -p udp -m state --state NEW -m multiport --dports 5404,5405 -j ACCEPT # pcsd iptables -I INPUT -p tcp --dport 2224 -m state --state NEW,ESTABLISHED -j ACCEPT iptables -I OUTPUT -p tcp --sport 2224 -m state --state ESTABLISHED -j ACCEPT # postgres iptables -I INPUT -p tcp --dport 5432 -m state --state NEW,ESTABLISHED -j ACCEPT iptables -I OUTPUT -p tcp --sport 5432 -m state --state ESTABLISHED -j ACCEPT service iptables save
During the cluster setup, we use the node names in various places,
make sure all your server hostnames can be resolved to the correct IPs. We
usually set this in the
192.168.122.50 pgsql-vip 192.168.122.51 srv1 192.168.122.52 srv2 192.168.122.53 srv3
Now, the three servers should be able to ping each others, eg.:
root@srv1:~# for s in srv1 srv2 srv3; do ping -W1 -c1 $s; done| grep icmp_seq 64 bytes from srv1 (192.168.122.51): icmp_seq=1 ttl=64 time=0.028 ms 64 bytes from srv2 (192.168.122.52): icmp_seq=1 ttl=64 time=0.296 ms 64 bytes from srv3 (192.168.122.53): icmp_seq=1 ttl=64 time=0.351 ms
Make sure hostnames are correctly set on each nodes. Eg.:
srv1$ hostname -s srv1
Run this whole chapter on ALL nodes.
Let’s install everything we need: PostgreSQL, Pacemaker, cluster related packages and PAF:
yum install -y postgresql96 postgresql96-contrib postgresql96-server \ pacemaker resource-agents pcs \ fence-agents-all fence-agents-virsh
Finally, we need to install the “PostgreSQL Automatic Failover” (PAF) resource agent:
yum install -y https://github.com/ClusterLabs/PAF/releases/download/v1.1.0/resource-agents-paf-1.1.0-1.noarch.rpm
WARNING: building PostgreSQL standby is not the main subject here. The following steps are **quick and dirty, VERY DIRTY**. They lack of security, WAL retention and so on. Rely on the PostgreSQL documentation for a proper setup.
The resource agent requires the PostgreSQL instances to be already set up, ready to start and standbys ready to replicate. Make sure to setup your primary on your preferred node to host it: during the very first startup of the cluster, PAF detects the primary based on its shutdown status.
Moreover, it requires a
recovery.conf template ready to use. You can create a
recovery.conf file suitable to your needs, the only requirements are:
standby_mode = on
recovery_target_timeline = 'latest'
application_nameset to the node name
Last but not least, make sure each instance is not able to replicate with
itself! A scenario exists where the primary IP address
pgsql-vip will be on
the same node than a standby for a very short lap of time!
pg_hba.confare different on each node. It is best to keep them out of the
$PGDATAso you do not have to deal with them (or worst: forget to edit them) each time you rebuild a standby! We advice you to deal with this using the
hba_fileparameter in your
recovery_templateparameter in PAF for the
Here are some quick steps to build your primary PostgreSQL instance and its
standbys. This quick start considers
srv1 is the preferred primary node.
On the primary:
service postgresql-9.6 initdb su - postgres cd 9.6/data/ cat <<EOP >> postgresql.conf listen_addresses = '*' wal_level = replica max_wal_senders = 10 hot_standby = on hot_standby_feedback = on EOP cat <<EOP >> pg_hba.conf # forbid self-replication host replication postgres 192.168.122.50/32 reject host replication postgres $(hostname -s) reject # allow any standby connection host replication postgres 0.0.0.0/0 trust EOP cat <<EOP > recovery.conf.pcmk standby_mode = on primary_conninfo = 'host=192.168.122.50 application_name=$(hostname -s)' recovery_target_timeline = 'latest' EOP exit service postgresql-9.6 start ip addr add 192.168.122.50/24 dev eth0
Now, on other nodes, clone the primary as standby and configure it:
su - postgres pg_basebackup -h pgsql-vip -D ~postgres/9.6/data/ -X stream -P cd ~postgres/9.6/data/ sed -ri s/srv[0-9]+/$(hostname -s)/ pg_hba.conf sed -ri s/srv[0-9]+/$(hostname -s)/ recovery.conf.pcmk cp recovery.conf.pcmk recovery.conf exit service postgresql-9.6 start
Check your three instances are replicating as expected (in processes, logs,
Finally, make sure to stop the PostgreSQL services everywhere and to disable them, as Pacemaker will take care of starting/stopping everything for you during cluster normal cluster operations. Start with the primary:
service postgresql-9.6 stop chkconfig postgresql-9.6 off
And remove the vIP from
ip addr del 192.168.122.50/24 dev eth0
It is advised to keep Pacemaker off on server boot. It helps the administrator to investigate after a node fencing before Pacemaker starts and potentially enters in a death match with the other nodes. Make sure to disable Corosync as well to avoid unexpected behaviors. Run this on all nodes:
chkconfig corosync off # important! chkconfig pacemaker off
This guide uses the cluster management tool
pcsd to ease the creation and
setup of a cluster. It allows to create the cluster from command line, without
editing configuration files or XML by hands.
pcsd uses the
hacluster system user to work and communicate with other
members of the cluster. We need to set a password to this user so it can
authenticate to other nodes easily. As cluster management commands can be run on
any member of the cluster, it is recommended to set the same password everywhere
to avoid confusions:
Make sure the
pcsd daemon is enabled and started on all nodes:
chkconfig pcsd on service pcsd start
Now, authenticate each node to the other ones using the following command:
pcs cluster auth srv1 srv2 srv3 -u hacluster
pcs cli tool is able to create and start the whole cluster for us. From
one of the nodes, run the following command:
pcs cluster setup --name cluster_pgsql srv1 srv2 srv3
NOTE: Make sure you have a redundant network at system level. This is a **CRITICAL** part of your cluster. If you have second interfaces not in bonding or teaming already (prefered method), you can add them to the cluster setup using eg.:
pcs cluster setup --name cluster_pgsql srv1,srv1-alt srv2,srv2-alt srv3,srv3-alt
If your version of
pcsdoes not support it (ie. CentOS 6.6 and bellow), you can fallback on the old but useful
pcs cluster setup --name cluster_pgsql srv1 srv2 srv3 ccs -f /etc/cluster/cluster.conf --addalt srv1 srv1-alt ccs -f /etc/cluster/cluster.conf --addalt srv2 srv2-alt ccs -f /etc/cluster/cluster.conf --addalt srv3 srv3-alt pcs cluster sync
This command creates the
/etc/corosync/corosync.conf file and propagate
it everywhere. For more information about it, read the
WARNING: whatever you edit in your
/etc/corosync/corosync.conffile, **ALWAYS** make sure all the nodes in your cluster has the exact same copy of the file. You can use
pcs cluster sync.
You can now start the whole cluster from one node:
pcs cluster start --all
After some seconds of startup and cluster membership stuffs, you should be able
to see your three nodes up in
root@srv1:~# crm_mon -n1D Node srv1: online Node srv2: online Node srv3: online
Now the cluster run, let’s start with some basic setup of the cluster. Run the following command from one node only (the cluster takes care of broadcasting the configuration on all nodes):
pcs resource defaults migration-threshold=5 pcs resource defaults resource-stickiness=10
This sets two default values for resources we create in the next chapter:
resource-stickiness: adds a sticky score for the resource on its current node. It helps avoiding a resource move back and forth between nodes where it has the same score.
migration-threshold: this controls how many time the cluster tries to recover a resource on the same node before moving it on another one.
The most important resource in your cluster is the one able to fence a node. Please, stop reading this quick start and read our fencing documentation page before building your cluster. Take a deep breath, and read: http://clusterlabs.github.com/PAF/fencing.html.
WARNING: I really mean it. You need fencing. PAF is expecting fencing to work in your cluster. Without fencing, you will experience cluster refusing to move anything, even with stonith disabled, or worst, a split brain if you bend it hard enough to make it work anyway. If you don’t mind taking time rebuilding a database with corrupt and/or incoherent data and constraints, that’s fine though.
NOTE: if you can’t have active fencing, look as storage base death or watchdog methods. They are both described in the fencing documentation.
In this tutorial, we choose to create one fencing resource per node to fence.
They are called
fence_vm_xxxand use the fencing agent
to power on or off a virtual machine using the
virsh command through a ssh
connection to the hypervisor.
WARNING: unless you build your PoC cluster using libvirt for VM management, there’s great chances you will need to use a different STONITH agent. The stonith setup is provided as a simple example, be prepared to adjust it.
Now you’ve been warned again and again, let’s populating the cluster with some
sample STONITH resources using virsh over ssh. First, we need to allow ssh
password-less authentication to
<user>@192.168.122.1 so these fencing
resource can work. Again, this is specific to this setup. Depending on your
fencing topology, you might not need this step. Run on all node:
ssh-keygen ssh-copy-id <user>@192.168.122.1
Check the ssh connections are working as expected.
We can now create one STONITH resource for each node. Each fencing
resource will not be allowed to run on the node it is supposed to fence.
Note that in the
port argument of the following commands,
the names of the virutal machines as known by libvirtd side. See manpage
fence_virsh(8) for more infos.
pcs cluster cib fencing.xml pcs -f fencing.xml stonith create fence_vm_srv1 fence_virsh \ pcmk_host_check="static-list" pcmk_host_list="srv1" \ ipaddr="192.168.122.1" login="<user>" port="srv1-c6" \ identity_file="/root/.ssh/id_rsa" pcs -f fencing.xml stonith create fence_vm_srv2 fence_virsh \ pcmk_host_check="static-list" pcmk_host_list="srv2" \ ipaddr="192.168.122.1" login="<user>" port="srv2-c6" \ identity_file="/root/.ssh/id_rsa" pcs -f fencing.xml stonith create fence_vm_srv3 fence_virsh \ pcmk_host_check="static-list" pcmk_host_list="srv3" \ ipaddr="192.168.122.1" login="<user>" port="srv3-c6" \ identity_file="/root/.ssh/id_rsa" pcs -f fencing.xml constraint location fence_vm_srv1 avoids srv1=INFINITY pcs -f fencing.xml constraint location fence_vm_srv2 avoids srv2=INFINITY pcs -f fencing.xml constraint location fence_vm_srv3 avoids srv3=INFINITY pcs cluster cib-push scope=configuration fencing.xml
crm_mon You should see the three resources appearing in your cluster
and being dispatched on nodes.
In this last chapter we create three resources:
pgsqld defines the properties of a PostgreSQL instance: where it is
located, where are its binaries, its configuration files, how to montor it, and
pgsql-ha resource controls all the PostgreSQL instances
pgsqld in your
cluster, decides where the primary is promoted and where the standbys
pgsql-pri-ip resource controls the
pgsql-vip IP address. It is
started on the node hosting the PostgreSQL primary resource.
Now the fencing is working, we can add all other resources and constraints all together in the same time. Create a new offline CIB:
pcs cluster cib cluster1.xml
We add the PostgreSQL
pgsqld resource and the multistate
responsible to clone it everywhere and define the roles (
# pgsqld pcs -f cluster1.xml resource create pgsqld ocf:heartbeat:pgsqlms \ bindir=/usr/pgsql-9.6/bin \ pgdata=/var/lib/pgsql/9.6/data \ op start timeout=60s \ op stop timeout=60s \ op promote timeout=30s \ op demote timeout=120s \ op monitor interval=15s timeout=10s role="Master" \ op monitor interval=16s timeout=10s role="Slave" \ op notify timeout=60s # pgsql-ha pcs -f cluster1.xml resource master pgsql-ha pgsqld notify=true
Note that the values for
interval on each operation are based
on the minimum suggested value for PAF Resource Agent. These values should be
adapted depending on the context.
The last line of this command declare the resource
pgsqld as promotable. The
pgsql-ha resource controls roles of
pgsqld clones in the cluster.
We add the IP address which should be started on the primary node:
pcs -f cluster1.xml resource create pgsql-pri-ip ocf:heartbeat:IPaddr2 \ ip=192.168.122.50 cidr_netmask=24 op monitor interval=10s
We now define the collocation between
start/stop and promote/demote order for these resources must be asymetrical: we
MUST keep the vIP on the primary during its demote process so the
standbies receive everything during the primary shutdown.
pcs -f cluster1.xml constraint colocation add pgsql-pri-ip with master pgsql-ha INFINITY pcs -f cluster1.xml constraint order promote pgsql-ha then start pgsql-pri-ip symmetrical=false kind=Mandatory pcs -f cluster1.xml constraint order demote pgsql-ha then stop pgsql-pri-ip symmetrical=false kind=Mandatory
We can now push our CIB to the cluster, which will start all the magic stuff:
pcs cluster cib-push scope=configuration cluster1.xml
Setup your new node following the first chapters. Stop after the PostgreSQL setup.
On this new node, setup the pcsd deamon and its authentication:
passwd hacluster chkconfig pcsd on service pcsd start pcs cluster auth srv1 srv2 srv3 srv4 -u hacluster
On all other nodes, authenticate to the new node:
pcs cluster auth srv4 -u hacluster
We are now ready to add the new node. Put the cluster in maintenance mode so it does not move resources all over the place when the new node appears:
pcs property set maintenance-mode=true pcs cluster node add srv4,srv4-alt
Or, using the old commands if the syntax of
pcs with alternate interface is
pcs cluster node add srv4 ccs -f /etc/cluster/cluster.conf --addalt srv4 srv4-alt pcs cluster sync
And reload the corosync configuration on all the nodes if needed (it just fails if not needed):
pcs cluster reload corosync
We now need to allow one more clone in the cluster:
pcs resource meta pgsql-ha clone-max=4
Add the STONITH agent for the new node:
pcs stonith create fence_vm_srv4 fence_virsh pcmk_host_check="static-list" pcmk_host_list="srv4" ipaddr="192.168.122.1" login="root" port="srv4-c6" action="off" identity_file="/root/.ssh/id_rsa" pcs constraint location fence_vm_srv4 avoids srv4=INFINITY
And you can now exit your maintenance mode:
pcs property set maintenance-mode=false
Now you know the basics to build a Pacemaker cluster hosting some PostgreSQL instances replicating with each others, you should probably check: