PostgreSQL Replication

 

  • WAL - Write Ahead Log, or xlog, or transaction log.
    • WAL 就像是 Cassandra 的 CommitLog, 會先被存起來, 再寫進資料庫, 使 Postgres 不管何時被關閉, 重啟後都可以恢復資料.
    • WAL 存在 pg_wal folder 下
    • Postgres 13, 放在 /var/lib/pgsql/13/data/pg_wal
    • WAL 檔案預設 16MB
    • WAL 是用 binary format 寫入的
  • Checkpoint
    • 用來清除 WAL. 確認 WAL 已經寫入 data 就可以把 WAL 清掉了
    • 由系統自動驅動, 不過可以在 postgresql.conf 裡面設定驅動的參數.
      例如可以指定 Checkpoint 之間的週期, 或是 wal 檔案大小的區間.
# - Checkpoints -
#checkpoint_timeout = 5min # range 30s-1d
max_wal_size = 1GB
min_wal_size = 80MB
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_flush_after = 256kB # measured in pages, 0 disables
#checkpoint_warning = 30s # 0 disables
    • WAL file 會被排序過再寫到硬碟, 來增加寫入的效能
    • 比較長的 Checkpoint 區間會減少 WAL file 的數量
  • PITR: point-in-time-recovery
    • 做 HA 需要 PITR backup/restore, 因為 standby node 在剛啟動的時候需要先有一個從 primary node 建立的 base backup
    • PITR 的設定在 postgresql.conf (擷取部分)
#------------------------------------------------------------------------------
# WRITE-AHEAD LOG
#------------------------------------------------------------------------------
wal_level = replica

#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------
max_wal_senders = 10
max_replication_slots = 10
    • wal_level = replica (https://docs.postgresql.tw/server-administration/server-configuration/write-ahead-log) 預設 replica 使能夠準備足夠的 transaction log 給 PITR restore 使用, 如果設定成
    • max_wal_senders = 10 (https://docs.postgresql.tw/server-administration/server-configuration/replication) 最大的寫量, 注意不能大於 max_connection
    • max_replication_slots = 10 (https://docs.postgresql.tw/server-administration/server-configuration/replication) 指定最大的 replication slot, 就是一份寫入的資料, 要 replica 到幾個 node 才可以 (或是 replica 到所有的 standby)
  • Archive transaction log
    • 設定在 postgresql.conf
# - Archiving -
archive_mode = on # enables archiving; off, on, or always
# (change requires restart)
archive_command = 'cp "%p" "/var/lib/pgsql/archivedir/%f"' # command to use to archive a logfile segment
# placeholders: %p = path of file to archive
# %f = file name only
# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0 # force a logfile segment switch after this
# number of seconds; 0 disables
    • 以這預設值, Postgres 會持續把 WAL 寫入 /var/lib/pgsql/archivedir/%f
[admin@rnd1 ~]$ sudo ls -l /var/lib/pgsql/13/data/pg_wal
total 32772
-rw------- 1 postgres postgres 337 Jun 9 08:40 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jun 10 04:21 000000010000000000000003
-rw------- 1 postgres postgres 16777216 Jun 9 08:40 000000010000000000000004
drwx------ 2 postgres postgres 59 Jun 9 08:45 archive_status
[admin@rnd1 ~]$ sudo ls -l /var/lib/pgsql/archivedir/
total 49156
-rw------- 1 postgres postgres 16777216 Jun 9 08:40 000000010000000000000001
-rw------- 1 postgres postgres 16777216 Jun 9 08:40 000000010000000000000002
-rw------- 1 postgres postgres 337 Jun 9 08:40 000000010000000000000002.00000028.backup
-rw------- 1 postgres postgres 16777216 Jun 9 08:32 000000010000000000000003
[admin@rnd1 ~]$
  • pg_hba.conf (host-based authentication)
    • 如果要用 pg_basebackup 就需要設定 pg_hba.conf
local database user auth-method [auth-options]
host database user address auth-method [auth-options]
hostssl database user address auth-method [auth-options]
hostnossl database user address auth-method [auth-options]
hostgssenc database user address auth-method [auth-options]
hostnogssenc database user address auth-method [auth-options]
host database user IP-address IP-mask auth-method [auth-options]
hostssl database user IP-address IP-mask auth-method [auth-options]
hostnossl database user IP-address IP-mask auth-method [auth-options]
hostgssenc database user IP-address IP-mask auth-method [auth-options]
hostnogssenc database user IP-address IP-mask auth-method [auth-options]
Ex. 從 local or 127.0.0.1 or ::1 來的, 用來 replication 的 user 為 postgres 的連線都一律通過
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication postgres trust
host replication postgres 127.0.0.1/32 trust
host replication postgres ::1/128 trust
pg_basebackup -D /some_target_dir -h localhost --checkpoint=fast --wal-method=stream
  • PITR restore
    • 指定 restore_command 與 recovery_target_timeline
restore_command = 'cp /mnt/server/archivedir/%f "%p"'
recovery_target_timeline = latest
    • 啟動後等關鍵字 "consistent recovery state reached"
    • Ex. 用 pg_basebackup 要求 standby 去 sync data
pg_basebackup -D /target -h master.example.com --checkpoint=fast --wal-method=stream -R
    • 使用 pg_basebackup 的時候可以 -R, 如此會把 standby configuration 寫進 postgresql.conf
standby_mode = on
primary_conninfo = ' ...'
  • Debug
    • pg_stat_replication ( *_lsn, lsn 是 location 的意思, 所以 sent_lsn 就是 sent_location )
select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_l
ag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+---------
---+---------------+------------+-------------------------------
13594 | 16385 | repl | 10.206.79.240 | 10.206.79.240 | | 50462 | 2021-06-09 10:21:35.997181+00 | | streaming | 0/30255F8 | 0/30255F8 | 0/30255F8 | 0/30255F8 | | |
| 0 | async | 2021-06-13 18:06:50.870681+00
(1 row)
    • pg_stat_wal_receiver (可以在 standby 查)
select * from pg_stat_wal_receiver;
pid | status | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | send
er_host | sender_port | conninfo
------+-----------+-------------------+-------------------+-------------+-------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+---------------+------
---------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
1598 | streaming | 0/3000000 | 1 | 0/30255F8 | 0/30255F8 | 1 | 2021-06-13 18:12:41.476929+00 | 2021-06-13 18:12:41.477835+00 | 0/30255F8 | 2021-06-10 04:21:07.190989+00 | 10_206_79_240 | 10.20
6.79.197 | 5432 | user=repl passfile=/var/lib/pgsql/.pgpass channel_binding=prefer dbname=replication host=10.206.79.197 port=5432 application_name=10.206.79.240 fallback_application_name=walreceiver sslmode=prefer sslcompression=
0 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
(1 row)
  • Timeline
    • 一開始 Postgres primary 的 timeline 是 1, transaction log 的檔名像是0000000100000000000000F5
    • 當 standby 被 promote 了, timeline 會變成 2, transaction log 的檔名像是0000000200000000000000F5
  • hot_standby_feedback = off
    • 由於 primary 與 standby 的狀態是單方面從 primary sync to standby, 所以如果 standby 做了一些 transaction 相關的事情, 會因為 primary 同時間也有動作而產生錯誤.
    • 為了能夠修正這個錯誤, 透過 hot_standby_feedback = on, 使 standby 可以定期傳"最後一筆 transaction log" 給 primary, 如此 primary 就有機會可以補救 (例如因為注意到 standby 的 transaction log 有差距而避免 delete data)
    • 不過為了避免 standby 長時間的 transaction 影響效能, hot_standby_feedback 是被關掉的 (相較於 transaction log streaming 效能好)
    • anyway, 如果在 OLTP 的情況下, select 花很久時間, 就可以考慮改這個設定

沒有留言:

張貼留言

別名演算法 Alias Method

 題目 每個伺服器支援不同的 TPM (transaction per minute) 當 request 來的時候, 系統需要馬上根據 TPM 的能力隨機找到一個適合的 server. 雖然稱為 "隨機", 但還是需要有 TPM 作為權重. 解法 別名演算法...