Configuring High Availability in Tanzu Postgres

This topic describes how to enable high availability for Tanzu Postgres. High availability provides automatic failover, ensuring that application requests continue to be served without downtime.

Tanzu Postgres uses the pg_auto_failover extension (version 1.4) to provide a highly available Tanzu Postgres cluster on Kubernetes. For detailed information about pg_auto_failover features, see the pg_auto_failover documentation.

HA cluster

In the Tanzu Postgres high availability (HA) cluster configuration, the topology consists of three pods: one monitor, one primary, and one hot standby mirror. pg_auto_failover ensures that data is synchronously replicated from the primary to the mirror node. If the primary node becomes unresponsive, application requests are redirected to the mirror node, which is promoted to primary. All application requests continue against the promoted primary, while a new Postgres instance is started that becomes the new mirror. If the monitor pod fails, operations continue as normal: the Postgres operator redeploys a monitor pod, and when it is ready it resumes monitoring the primary and the mirror.

Note: The high availability cluster mode is currently supported only in a single Availability Zone.

Configuring High Availability

Ensure that you have completed the Installing Tanzu Postgres and Creating a Postgres Operator Release procedures before proceeding. Also review Deploying a New Postgres Instance.

  1. To enable Tanzu Postgres high availability (cluster mode), edit your copy of the instance yaml file and alter the highAvailability field:

    apiVersion: sql.tanzu.vmware.com/v1
    kind: Postgres
    metadata:
      name: my-postgres-ha
    spec:
      memory: 800Mi
      cpu: "0.8"
      storageClassName: standard
      storageSize: 10G
      serviceType: LoadBalancer
      pgConfig:
        dbname:
        username:
      highAvailability:
        Enabled: true
    

    The highAvailability field accepts Enabled: <true|false>. If the field is omitted, the Postgres instance defaults to a single-node configuration.
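    For example, to deploy a single-node instance you can either omit the highAvailability block entirely or disable it explicitly with the same field:

    spec:
      highAvailability:
        Enabled: false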

  2. Execute this command to deploy or redeploy the cluster with the new highAvailability setting:

    $ kubectl apply -f my-postgres-ha.yaml
    

    where my-postgres-ha.yaml is the Kubernetes manifest created for this instance.

    The command output is similar to:

    postgres.sql.tanzu.vmware.com/my-postgres-ha created
    

    where my-postgres-ha is the Postgres instance name defined in the yaml file.

    At this point, the Postgres operator deploys the three Postgres instance pods: the monitor, the primary, and the mirror.

Verifying the HA Configuration

To confirm your HA configuration is ready for access, use kubectl get and verify that the STATUS field shows Running. Initially, STATUS shows Created until all artifacts are deployed. Use Ctrl-C to exit the watch command.

$ watch kubectl get postgres/my-postgres-ha
NAME                STATUS          AGE
my-postgres-ha      Running         55s

To view the created pods, use:

$ kubectl get pods
NAME                                 READY   STATUS     RESTARTS      AGE
pod/my-postgres-ha-0                 1/1     Running       0          11m
pod/my-postgres-ha-monitor           1/1     Running       0          12m
pod/my-postgres-ha-1                 1/1     Running       0          4m28s

You can now log in to the primary pod using kubectl exec -it <pod-name> -- bash:

$ kubectl exec -it my-postgres-ha-0 -- bash

You can log into any pod with kubectl exec and use the pg_autoctl tool to inspect the state of the cluster. Run pg_autoctl show state to see which pod is currently the primary:

$ kubectl exec -ti pod/my-postgres-ha-1 -- pg_autoctl show state

  Name | Node |                                                      Host:Port |       LSN | Reachable | Current State | Assigned State
-------+------+----------------------------------------------------------------+-----------+-----------+---------------+---------------
node_1 |    1 | my-postgres-1.my-postgres-agent.default.svc.cluster.local:5432 | 0/501B5C0 |       yes |       primary |        primary
node_2 |    2 | my-postgres-0.my-postgres-agent.default.svc.cluster.local:5432 | 0/501B5C0 |       yes |     secondary |      secondary
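When scripting against this output, the current primary can be extracted with standard text tools. A minimal sketch, assuming the column layout shown above; the sample rows are hard-coded here for illustration:

```shell
# Sample `pg_autoctl show state` data rows, as shown above
state_output='node_1 |    1 | my-postgres-1.my-postgres-agent.default.svc.cluster.local:5432 | 0/501B5C0 |    yes |   primary   |    primary
node_2 |    2 | my-postgres-0.my-postgres-agent.default.svc.cluster.local:5432 | 0/501B5C0 |    yes |   secondary |    secondary'

# Print the Host:Port of the row whose Current State (6th column) is "primary"
printf '%s\n' "$state_output" |
  awk -F'|' '{ gsub(/ /, "", $6) } $6 == "primary" { gsub(/ /, "", $3); print $3 }'
# → my-postgres-1.my-postgres-agent.default.svc.cluster.local:5432
```

In a live cluster, you would pipe the output of kubectl exec ... pg_autoctl show state into the same awk filter instead of a hard-coded variable.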

The pg_autoctl set of commands manages the pg_auto_failover services. For further information, refer to the pg_autoctl reference documentation.

Note: VMware supports only a limited set of pg_autoctl commands: those for inspecting the nodes and for performing a manual failover.

If the primary becomes unreachable, then during the primary-to-mirror failover the Current State and Assigned State columns cycle through the values demoted, catchingup, wait_primary, secondary, and primary. You can monitor the state transitions using pg_autoctl:

$ watch pg_autoctl show state
Name      |   Port | Group |  Node |     Current State |    Assigned State
----------+--------+-------+-------+-------------------+------------------
127.0.0.1 |   6010 |     0 |     1 |           demoted |        catchingup
127.0.0.1 |   6011 |     0 |     2 |      wait_primary |      wait_primary
Name      |   Port | Group |  Node |     Current State |    Assigned State
----------+--------+-------+-------+-------------------+------------------
127.0.0.1 |   6010 |     0 |     1 |         secondary |         secondary
127.0.0.1 |   6011 |     0 |     2 |           primary |           primary
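Because a manual failover is among the supported pg_autoctl operations, you can also trigger the promotion yourself rather than waiting for the monitor to detect a failure. A sketch, assuming the pod names from this example (this requires a running cluster):

```shell
# Ask the monitor to orchestrate a failover: the current primary is demoted
# and the secondary is promoted, passing through the states shown above
kubectl exec -ti pod/my-postgres-ha-monitor -- pg_autoctl perform failover

# Confirm that the roles have swapped
kubectl exec -ti pod/my-postgres-ha-1 -- pg_autoctl show state
```

This is useful for verifying the HA configuration end to end, for example before putting an instance into production.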