This article describes how to configure, monitor the health of, and troubleshoot TIE Server PostgreSQL database replication.
NOTE: As of TIE Server version 2.1.0, the Master and Slave designations are renamed to Primary and Secondary:
- Master becomes Primary
- Slave becomes Secondary
Earlier versions of TIE Server retain the original Master/Slave designations.
Look at the current TIE Server primary and secondary PostgreSQL database configuration:
The configuration file for TIE Server PostgreSQL database replication is /data/tieserver_pg/postgresql.conf. This file contains replication configuration parameters such as wal_level, archive_mode, archive_command, max_wal_senders, wal_keep_segments, and hot_standby. For information about these configuration parameters, see the official PostgreSQL documentation.
We recommend that you do not change the configuration. It allows the TIE Server PostgreSQL database to use native streaming replication in a primary/secondary configuration. This configuration can provide hot-standby capabilities that enable scalability for read operations. If needed, you can tune the following parameters to address unusual replication delays:
- max_wal_senders: Set on the primary instance. Specifies the maximum number of concurrent connections from secondary servers to the primary server. Increasing this value might speed up the synchronization process. It can't be set higher than the overall max_connections parameter.
- wal_keep_segments: Set on the primary instance. Specifies the minimum number of past log file segments kept in the pg_xlog directory. Increasing this value helps to prevent the primary server from removing a WAL segment that the standby still needs, in which case the replication connection closes. This situation can occur when the Data Exchange Layer (DXL) architecture has nodes with slow connectivity, which slows the PostgreSQL replication process.
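The two constraints above can be sanity-checked before editing the file. The following is a minimal sketch, not part of TIE Server: the function names are hypothetical, the fallback values are assumed PostgreSQL defaults, and the 16 MB segment size assumes PostgreSQL's default WAL segment size has not been changed at build time.

```python
import re

# Assumption: default PostgreSQL WAL segment size (16 MB).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def parse_conf(text):
    """Parse 'name = value' lines from postgresql.conf-style text.

    Simplification: everything after '#' is treated as a comment, so
    values that contain '#' inside quotes are not handled.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        match = re.match(r"(\w+)\s*=?\s*(.+)", line)
        if match:
            settings[match.group(1)] = match.group(2).strip().strip("'")
    return settings


def check_replication_settings(text):
    """Report whether max_wal_senders fits within max_connections and
    how many bytes of WAL wal_keep_segments retains at minimum."""
    settings = parse_conf(text)
    senders = int(settings.get("max_wal_senders", 0))        # assumed default
    connections = int(settings.get("max_connections", 100))  # assumed default
    keep = int(settings.get("wal_keep_segments", 0))         # assumed default
    return {
        "senders_within_limit": senders <= connections,
        "retained_wal_bytes": keep * WAL_SEGMENT_BYTES,
    }


sample = """
max_connections = 100
max_wal_senders = 5      # primary only
wal_keep_segments = 64
"""
print(check_replication_settings(sample))
```

With the sample values, 64 retained segments correspond to at least 1 GB of WAL on the primary, which is worth comparing against the free space in the pg_xlog directory before raising the value.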
After changing these values, monitor and test further to assess the overall impact. Besides the link latency of each environment, relevant factors include:
- Number of brokers
- Number of endpoints
- Mix of files running on each endpoint image
Monitor TIE Server PostgreSQL database replication:
Log on as a "Super User" and execute the replication-monitoring.sh script that's available in the home directory. The script is intended to help administrators do the following:
- Understand the status of the replication in a secondary server.
- Recover the secondary server if the replication process fails by reconfiguring it.
Use: replication-monitoring.sh <Options>
Options:
- -c: Command to execute:
  - monitor: Shows information about the replication status (see the Monitor information table for an explanation of the possible output fields).
  - reset: Reconfigures the replication secondary server.
  - autoreset: Reconfigures the replication secondary server when the gap between the primary server and secondary server reaches a threshold (1-megabyte difference by default).
- -t: Threshold in bytes to determine when the replication secondary server is reconfigured. By default, this value is 1048576 bytes (1 megabyte).
- --color: Include colors in the output, which makes it easier to read.
- --help: Show help.
Examples:
- replication-monitoring.sh -c monitor: Returns the current secondary server replication status.
- replication-monitoring.sh -c monitor --color: Returns the current secondary server replication status, using colors to make it easier to read.
- replication-monitoring.sh -c reset: Reconfigures the secondary server. It creates a fresh database backup from the primary server.
- replication-monitoring.sh -c autoreset -t 2097152: Reconfigures the secondary server if its replication has fallen behind the primary server by 2 MB. The check is made only once, when the script runs; it doesn't set a trigger for when the threshold is reached in the future. To check again, run the command again.
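Because the -t option takes bytes, a small helper can avoid arithmetic mistakes when a threshold is chosen in megabytes. This is an illustrative sketch only: the helper names are hypothetical, and it only builds the argument list from the options documented above without running anything.

```python
DEFAULT_THRESHOLD_BYTES = 1048576  # documented default: 1 megabyte


def megabytes_to_bytes(megabytes):
    """Convert a threshold in megabytes to the byte value that -t expects."""
    return int(megabytes * 1024 * 1024)


def build_autoreset_command(threshold_mb, script="replication-monitoring.sh"):
    """Build the argument list for a one-off autoreset check."""
    return [script, "-c", "autoreset", "-t", str(megabytes_to_bytes(threshold_mb))]


print(" ".join(build_autoreset_command(2)))
# prints: replication-monitoring.sh -c autoreset -t 2097152
```

Since the check runs only once per invocation, repeated checks would need an external scheduler to rerun the command; choosing one is outside the scope of this article.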
| Field | Description |
| --- | --- |
| Is Replication in progress | Returns True if the server is set as a secondary server and the replication is properly configured. |
| Is Replication paused | It's possible to pause the replication. Returns True if the replication is paused. |
| Last xLog receive location | Last replication log received in the secondary server. Logs can be received, but not replayed yet. |
| Current xLog location in server | Current replication log on the primary server. To get this value, the script executes a remote query to the primary server. |
| Gap between Server and Receive | The difference in bytes between the primary server and the last log received on the secondary server. It helps to determine by how much the replication process is behind. |
| Last xLog replay location | Last replication log replayed in the secondary server. |
| Gap between Server and Replay | The difference in bytes between the primary server and the last log replayed on the secondary server. It's probably the most important value to determine the health of the replication process. |
| Last xLog replay time | The time that the last log was replayed in the secondary server. If there's no activity in the primary server, no replication activity occurs in the secondary server and the value remains the same. An unchanged value doesn't mean that the replication has failed. |
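The two gap fields are byte differences between xLog locations. As a rough illustration of that arithmetic, the sketch below assumes the locations use PostgreSQL's standard "X/Y" hexadecimal notation and the 64-bit position arithmetic of PostgreSQL 9.3 and later. Whether the script prints locations in exactly this form is not confirmed here, and the function names are hypothetical.

```python
def location_to_bytes(location):
    """Convert an 'X/Y' hexadecimal xLog location to an absolute byte position."""
    high, low = location.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def gap_bytes(primary_location, secondary_location):
    """Bytes by which the secondary location trails the primary location."""
    return location_to_bytes(primary_location) - location_to_bytes(secondary_location)


# Example values are invented for illustration.
print(gap_bytes("0/3000060", "0/3000000"))  # prints: 96
```

A gap of 0 means the secondary has received (or replayed) everything the primary has written so far.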
Also, a new Health Status section is added to reflect the Secondary's database replication status. To view the health status, go to the Menu, Server Settings, TIE Server Topology Management page.
Troubleshooting TIE Server PostgreSQL database replication:
How do I reconfigure replication in a secondary server?
Run the following command:
/usr/sbin/replication-monitoring.sh -c reset
How do I check whether the replication is working?
Run the following command:
/usr/sbin/replication-monitoring.sh -c monitor
Then, validate whether the value of Gap between Server and Replay is 0 bytes or close to 0.
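This check can be automated in a few lines. The sketch below is hedged heavily: the exact layout of the monitor output is not documented in this article, so it assumes each field appears on its own line followed by its numeric value, and it should be fed output produced without --color, because color escape codes contain digits. The function names and the 1 MB tolerance for "close to 0" are assumptions, not product behavior.

```python
import re


def replay_gap_from_output(output, field="Gap between Server and Replay"):
    """Return the first integer that follows the field name, or None.

    Assumption: the field name and its value share one line.
    """
    for line in output.splitlines():
        position = line.lower().find(field.lower())
        if position == -1:
            continue
        match = re.search(r"\d+", line[position + len(field):])
        if match:
            return int(match.group())
    return None


def replication_looks_healthy(output, tolerance_bytes=1048576):
    """True if the replay gap was found and is below the tolerance."""
    gap = replay_gap_from_output(output)
    return gap is not None and gap < tolerance_bytes


# Hypothetical output layout, for illustration only.
sample = """Is Replication in progress: True
Gap between Server and Receive: 0
Gap between Server and Replay: 128
"""
print(replication_looks_healthy(sample))  # prints: True
```

Returning None when the field is missing keeps an unparseable report from being mistaken for a healthy one.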
What do the following log entries mean?
- FATAL: Could not connect to the primary server: FATAL: No pg_hba.conf entry for replication connection from host "xx.xx.xx.xx", user "rep", SSL on
The replication user configuration is incorrect. Validate the permissions in the file /data/tieserver_pg/pg_hba.conf on the primary server. There must be an entry similar to the following:
hostssl replication rep xx.xx.xx.xx/xx cert clientcert=1 map=rep
Here, xx.xx.xx.xx/xx must be the IP address and netmask length (CIDR notation) of the secondary server to which the primary server is replicating.
Also, there must be an entry similar to the following in the file /data/tieserver_pg/pg_ident.conf:
rep xx.xx.xx.xx rep
Here, xx.xx.xx.xx must be the IP address of the server to which the primary server is replicating.
- FATAL: Could not receive data from WAL stream:
There are connectivity issues between the secondary server and primary server.
- FATAL: Could not receive data from WAL stream: ERROR: Requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
The primary server might have removed a WAL segment still needed by the secondary server. To fix this issue, run the command /usr/sbin/replication-monitoring.sh -c reset to reconfigure the secondary server and start with a fresh backup.
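The three log entries and the pg_hba.conf check described above can be sketched as a small triage helper. This is an illustration under stated assumptions, not a TIE Server tool: the function names are hypothetical, the message matching is case-insensitive because exact log casing may differ, and the IP addresses in the example are documentation-range placeholders.

```python
import ipaddress


def diagnose(log_line):
    """Map a replication log entry to the action suggested in this article."""
    text = log_line.lower()
    if "no pg_hba.conf entry for replication connection" in text:
        return "check pg_hba.conf and pg_ident.conf on the primary server"
    # Test the more specific WAL message before the generic one.
    if "has already been removed" in text:
        return "run replication-monitoring.sh -c reset on the secondary server"
    if "could not receive data from wal stream" in text:
        return "check connectivity between the secondary and primary servers"
    return "unknown"


def hba_entry_covers(entry, secondary_ip, user="rep"):
    """True if a pg_hba.conf line is a hostssl replication entry for `user`
    whose address range contains the secondary server's IP address."""
    fields = entry.split()
    if len(fields) < 5:
        return False
    conn_type, database, entry_user, address = fields[:4]
    if (conn_type, database, entry_user) != ("hostssl", "replication", user):
        return False
    try:
        network = ipaddress.ip_network(address, strict=False)
        return ipaddress.ip_address(secondary_ip) in network
    except ValueError:
        return False


entry = "hostssl replication rep 192.0.2.0/24 cert clientcert=1 map=rep"
print(hba_entry_covers(entry, "192.0.2.10"))  # prints: True
```

The helper only inspects the first four fields of the entry; it does not validate the authentication method or the pg_ident.conf mapping.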