PostgreSQL Automatic Failover

High-Availability for Postgres, based on Pacemaker and Corosync.

Configuration

The PostgreSQL Automatic Failover (PAF) agent is a Pacemaker multi-state resource agent, developed following the OCF specification. As such, it accepts several parameters that are set during the configuration process.

Please note that we assume you are already familiar with the installation and configuration of both Pacemaker and PostgreSQL. Consequently, this section only describes the PAF-related parameters.

PostgreSQL configuration

Before configuring the resource agent, PostgreSQL must be installed on all the nodes it is supposed to run on, and be properly configured to allow streaming replication between nodes.

For more details about how to configure streaming replication with PostgreSQL, please refer to the official documentation.

With PostgreSQL 11 and earlier, the agent requires a recovery.conf template file, ready to use, on every node. Keep in mind that every PostgreSQL instance is started as a standby before Pacemaker picks one of them to promote as the primary. Moreover, roles will move around your cluster with switchovers, failovers, upgrades, etc. In short, each node must be able to act as a standby. You can create a template file that suits your needs; the only requirements are that it sets standby_mode = on, sets recovery_target_timeline = 'latest', and defines a primary_conninfo whose application_name is the node name as seen by Pacemaker.
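As an illustration for PostgreSQL 11 and earlier, here is a minimal sketch that prints a template setting standby_mode, recovery_target_timeline and a primary_conninfo carrying the node name as application_name. The function name paf_recovery_template, the node name srv1 and the address 192.0.2.10 are assumptions made for this example only; adapt them to your own cluster.

```shell
#!/bin/sh
# Print a recovery.conf template for one node (PostgreSQL 11 and earlier).
#   $1: node name as seen by Pacemaker (hypothetical example: srv1)
#   $2: address used to reach the primary (assumed example value)
paf_recovery_template() {
    cat <<EOF
standby_mode = on
primary_conninfo = 'host=$2 application_name=$1'
recovery_target_timeline = 'latest'
EOF
}

# Preview the template for a node called srv1.
# 192.0.2.10 is a documentation-only address, not a real one.
paf_recovery_template srv1 192.0.2.10
```

Redirect the output to the template file of each node, using that node’s own name as the first argument.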

With PostgreSQL 12 and later, you don’t need this template file. Just set the primary_conninfo parameter in the main configuration file. Note that the recovery_target_timeline parameter already defaults to latest.

Moreover, if you rely on Pacemaker to move an IP resource to the node hosting the primary instance of PostgreSQL, make sure to add rules to the pg_hba.conf file of each instance to forbid self-replication.
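To make the idea concrete, the following sketch prints pg_hba.conf rules that reject replication connections coming from the Pacemaker-managed IP or from the node itself, before allowing the other nodes. The function name, the addresses, the role name and the scram-sha-256 authentication method are assumptions for this example; use whatever matches your own setup.

```shell
#!/bin/sh
# Print pg_hba.conf replication rules for one node.
# pg_hba.conf is read top-down and the first matching line wins,
# so the two reject rules must come before the rule allowing replication.
#   $1: Pacemaker-managed IP (assumed example value)
#   $2: this node's own address (assumed example value)
#   $3: role used for replication (assumed example value)
paf_hba_rules() {
    cat <<EOF
host replication $3 $1/32 reject
host replication $3 $2/32 reject
host replication $3 0.0.0.0/0 scram-sha-256
EOF
}

# Preview the rules for a node whose own address is 192.0.2.11.
paf_hba_rules 192.0.2.10 192.0.2.11 postgres
```

The second argument differs on each node, so each instance ends up with its own variant of these rules.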

We advise keeping all these settings outside of $PGDATA, so that a standby can be rebuilt without having to edit configuration files afterwards. Use the include family of parameters and, e.g., hba_file.
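One possible layout, sketched below, keeps the node-specific files in a directory outside $PGDATA and references them from the main configuration file. The directory /etc/paf, the file name node.conf and the function name are hypothetical; only the hba_file and include parameters come from PostgreSQL itself.

```shell
#!/bin/sh
# Print the lines to add to postgresql.conf. They are identical on every node,
# so a copy of the primary's data directory stays valid on a rebuilt standby.
#   $1: directory outside $PGDATA holding node-local files (hypothetical path)
paf_main_conf_lines() {
    cat <<EOF
hba_file = '$1/pg_hba.conf'
include = '$1/node.conf'
EOF
}

# Preview the lines, assuming node-local files live in /etc/paf.
paf_main_conf_lines /etc/paf
```

In this layout, node.conf would hold the settings that differ between nodes, such as primary_conninfo with PostgreSQL 12 and later.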

Last but not least, during the very first startup of your cluster, the designated primary will be the only instance that was stopped gently as a primary. Take great care of this when you set up your cluster for the first time.

PAF resource configuration

These parameters are used by the agent to properly manage the PostgreSQL resource. They are usually set up during the resource creation process.

For more information about how to create and configure a multi-state resource with Pacemaker, please refer to the project’s official documentation.

Resource agent parameters

These parameters are specific to the PAF resource agent, and should usually be adjusted to the specifics of your installation.

Resource agent actions

When creating a resource, one can (and should) specify several optional parameters for every action the resource agent supports. This section lists the actions supported by the PAF resource agent, and the minimum suggested value for each.

They are by no means the default values, so when configuring the resource you should always explicitly specify the values that fit your context. If you don’t know what values to choose, the ones mentioned here are a good starting point.

Please note that these actions are for the internal use of Pacemaker only, and it will use every value you configure here (for example, how long it has to wait before deciding that a resource has failed to stop, and fencing the node instead).
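As a sketch of how agent parameters and action settings fit together, the script below builds a pcs resource creation command. It is a dry run by default: the command is printed rather than executed. The resource name pgsqld, the bindir and pgdata paths, and every timeout and interval shown are assumptions for illustration; pick the values that fit your context.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Set PCS="pcs -f cluster.xml" to apply it to an offline CIB file instead.
PCS=${PCS:-echo pcs}

paf_create_resource() {
    # bindir and pgdata are example paths: adjust them to your installation.
    # Every timeout and interval below is an example value, set explicitly
    # so that no action silently falls back to a default.
    # Master and Slave are the role names used in this document.
    $PCS resource create pgsqld ocf:heartbeat:pgsqlms \
        bindir=/usr/pgsql-12/bin \
        pgdata=/var/lib/pgsql/12/data \
        op start timeout=60s \
        op stop timeout=60s \
        op promote timeout=30s \
        op demote timeout=120s \
        op monitor interval=15s timeout=10s role=Master \
        op monitor interval=16s timeout=10s role=Slave \
        op notify timeout=60s
}

paf_create_resource
```

The two monitor actions use different intervals because Pacemaker needs a distinct interval for each monitor operation of the same resource.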

Multi-state resource parameters

After creating your PostgreSQL resource in the previous section, you need to create the specific promotable resource that will clone the previous resource on several nodes, using two different roles: Master and Slave.

Here are the parameters for such resources:
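For instance, with a recent pcs (0.10 or later, an assumption about your tooling), an existing resource can be turned into a promotable one as sketched below. The resource name pgsqld is hypothetical, and the command is only printed unless you override PCS.

```shell
#!/bin/sh
# Dry run by default: the command is printed, not executed.
# Set PCS="pcs -f cluster.xml" to apply it to an offline CIB file instead.
PCS=${PCS:-echo pcs}

paf_make_promotable() {
    # notify=true asks Pacemaker to send notifications to the agent
    # before and after actions on the clone instances.
    $PCS resource promotable pgsqld notify=true
}

paf_make_promotable
```

Older pcs releases use a different subcommand for multi-state resources, so check the manual of the version you run.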

Other considerations

Creating a working Pacemaker cluster usually involves much more configuration than just the PostgreSQL instances and resources. For example, having a Pacemaker-managed IP that is always up on the node hosting the primary PostgreSQL resource seems like a good idea. And obviously, you also have to configure fencing.

These additional configuration steps are out of the scope of this document, so you should refer to the official Pacemaker documentation.

Full examples

See the Quick starts for full examples of cluster and resource creation and configuration.

Cookbooks

Some cookbooks are available to help you manage your cluster.