a developer: PostgreSQL Replication

WAL - Write Ahead Log, or xlog, or transaction log.

WAL 就像是 Cassandra 的 CommitLog, 會先被存起來, 再寫進資料庫, 使 Postgres 不管何時被關閉, 重啟後都可以恢復資料.
WAL 存在 pg_wal folder 下
Postgres 13, 放在 /var/lib/pgsql/13/data/pg_wal
WAL 檔案預設 16MB
WAL 是用 binary format 寫入的

Checkpoint

用來清除 WAL. 確認 WAL 已經寫入 data 就可以把 WAL 清掉了
由系統自動驅動, 不過可以在 postgresql.conf 裡面設定驅動的參數.
例如可以指定 Checkpoint 之間的週期, 或是 wal 檔案大小的區間.

# - Checkpoints - 
#checkpoint_timeout = 5min              # range 30s-1d 
max_wal_size = 1GB 
min_wal_size = 80MB 
#checkpoint_completion_target = 0.5     # checkpoint target duration, 0.0 - 1.0 
#checkpoint_flush_after = 256kB         # measured in pages, 0 disables 
#checkpoint_warning = 30s               # 0 disables

WAL file 會被排序過再寫到硬碟, 來增加寫入的效能
比較長的 Checkpoint 區間會減少 WAL file 的數量

PITR: point-in-time-recovery

做 HA 需要 PITR backup/restore, 因為 standby node 在剛啟動的時候需要先有一個從 primary node 建立的 base backup
PITR 的設定在 postgresql.conf (擷取部分)

#------------------------------------------------------------------------------ 
# WRITE-AHEAD LOG 
#------------------------------------------------------------------------------
wal_level = replica

#------------------------------------------------------------------------------ 
# REPLICATION 
#------------------------------------------------------------------------------
max_wal_senders = 10
max_replication_slots = 10

wal_level = replica (https://docs.postgresql.tw/server-administration/server-configuration/write-ahead-log) 預設 replica 使能夠準備足夠的 transaction log 給 PITR restore 使用, 如果設定成
max_wal_senders = 10 (https://docs.postgresql.tw/server-administration/server-configuration/replication) 最大的寫量, 注意不能大於 max_connection
max_replication_slots = 10 (https://docs.postgresql.tw/server-administration/server-configuration/replication) 指定最大的 replication slot, 就是一份寫入的資料, 要 replica 到幾個 node 才可以 (或是 replica 到所有的 standby)

Archive transaction log

設定在 postgresql.conf

# - Archiving - 
archive_mode = on               # enables archiving; off, on, or always 
                                # (change requires restart) 
archive_command = 'cp "%p" "/var/lib/pgsql/archivedir/%f"'              # command to use to archive a logfile segment 
                                # placeholders: %p = path of file to archive 
                                #               %f = file name only 
                                # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' 
#archive_timeout = 0            # force a logfile segment switch after this 
                                # number of seconds; 0 disables

以這預設值, Postgres 會持續把 WAL 寫入 /var/lib/pgsql/archivedir/%f

[admin@rnd1 ~]$ sudo ls -l /var/lib/pgsql/13/data/pg_wal 
total 32772 
-rw------- 1 postgres postgres      337 Jun  9 08:40 000000010000000000000002.00000028.backup 
-rw------- 1 postgres postgres 16777216 Jun 10 04:21 000000010000000000000003 
-rw------- 1 postgres postgres 16777216 Jun  9 08:40 000000010000000000000004 
drwx------ 2 postgres postgres       59 Jun  9 08:45 archive_status 
[admin@rnd1 ~]$ sudo ls -l /var/lib/pgsql/archivedir/ 
total 49156 
-rw------- 1 postgres postgres 16777216 Jun  9 08:40 000000010000000000000001 
-rw------- 1 postgres postgres 16777216 Jun  9 08:40 000000010000000000000002 
-rw------- 1 postgres postgres      337 Jun  9 08:40 000000010000000000000002.00000028.backup 
-rw------- 1 postgres postgres 16777216 Jun  9 08:32 000000010000000000000003 
[admin@rnd1 ~]$

pg_hba.conf (host-based authentication)

如果要用 pg_basebackup 就需要設定 pg_hba.conf

local         database  user  auth-method [auth-options] 
host          database  user  address     auth-method  [auth-options] 
hostssl       database  user  address     auth-method  [auth-options] 
hostnossl     database  user  address     auth-method  [auth-options] 
hostgssenc    database  user  address     auth-method  [auth-options] 
hostnogssenc  database  user  address     auth-method  [auth-options] 
host          database  user  IP-address  IP-mask      auth-method  [auth-options] 
hostssl       database  user  IP-address  IP-mask      auth-method  [auth-options] 
hostnossl     database  user  IP-address  IP-mask      auth-method  [auth-options] 
hostgssenc    database  user  IP-address  IP-mask      auth-method  [auth-options] 
hostnogssenc  database  user  IP-address  IP-mask      auth-method  [auth-options]

Ex. 從 local or 127.0.0.1 or ::1 來的, 用來 replication 的 user 為 postgres 的連線都一律通過

# Allow replication connections from localhost, by a user with the 
# replication privilege. 
local   replication   postgres                  trust 
host    replication   postgres   127.0.0.1/32   trust 
host    replication   postgres   ::1/128        trust

https://www.postgresql.org/docs/13/auth-pg-hba-conf.html

執行 pg_basebackup

https://www.postgresql.org/docs/13/app-pgbasebackup.html
這段指令會把 Postgres 的資料複製到 /some_target_dir 下

pg_basebackup -D /some_target_dir -h localhost --checkpoint=fast --wal-method=stream 

PITR restore

指定 restore_command 與 recovery_target_timeline

restore_command = 'cp /mnt/server/archivedir/%f "%p"'
recovery_target_timeline = latest

啟動後等關鍵字 "consistent recovery state reached"
Ex. 用 pg_basebackup 要求 standby 去 sync data

pg_basebackup -D /target  -h master.example.com  --checkpoint=fast  --wal-method=stream -R 

使用 pg_basebackup 的時候可以 -R, 如此會把 standby configuration 寫進 postgresql.conf

standby_mode = on 
primary_conninfo = ' ...' 

Debug

pg_stat_replication ( *_lsn, lsn 是 location 的意思, 所以 sent_lsn 就是 sent_location )

select * from pg_stat_replication; 
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_l 
ag | sync_priority | sync_state |          reply_time 
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+--------- 
---+---------------+------------+------------------------------- 
 13594 |    16385 | repl    | 10.206.79.240    | 10.206.79.240 |                 |       50462 | 2021-06-09 10:21:35.997181+00 |              | streaming | 0/30255F8 | 0/30255F8 | 0/30255F8 | 0/30255F8  |           |           | 
   |             0 | async      | 2021-06-13 18:06:50.870681+00 
(1 row)

pg_stat_wal_receiver (可以在 standby 查)

select * from pg_stat_wal_receiver; 
 pid  |  status   | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time        |   slot_name   |  send 
er_host  | sender_port |                                                                                                                                                       conninfo 
------+-----------+-------------------+-------------------+-------------+-------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+---------------+------ 
---------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------------------------- 
 1598 | streaming | 0/3000000         |                 1 | 0/30255F8   | 0/30255F8   |            1 | 2021-06-13 18:12:41.476929+00 | 2021-06-13 18:12:41.477835+00 | 0/30255F8      | 2021-06-10 04:21:07.190989+00 | 10_206_79_240 | 10.20 
6.79.197 |        5432 | user=repl passfile=/var/lib/pgsql/.pgpass channel_binding=prefer dbname=replication host=10.206.79.197 port=5432 application_name=10.206.79.240 fallback_application_name=walreceiver sslmode=prefer sslcompression= 
0 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any 
(1 row)

Timeline

一開始 Postgres primary 的 timeline 是 1, transaction log 的檔名像是0000000100000000000000F5
當 standby 被 promote 了, timeline 會變成 2, transaction log 的檔名像是0000000200000000000000F5

hot_standby_feedback = off

由於 primary 與 standby 的狀態是單方面從 primary sync to standby, 所以如果 standby 做了一些 transaction 相關的事情, 會因為 primary 同時間也有動作而產生錯誤.
為了能夠修正這個錯誤, 透過 hot_standby_feedback = on, 使 standby 可以定期傳"最後一筆 transaction log" 給 primary, 如此 primary 就有機會可以補救 (例如因為注意到 standby 的 transaction log 有差距而避免 delete data)
不過為了避免 standby 長時間的 transaction 影響效能, hot_standby_feedback 是被關掉的 (相較於 transaction log streaming 效能好)
anyway, 如果在 OLTP 的情況下, select 花很久時間, 就可以考慮改這個設定

a developer

PostgreSQL Replication

沒有留言:

張貼留言

別名演算法 Alias Method

Visit

Translate