博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
postgresql 高可用 repmgr 的使用之五 1 Primary + 1 Standby 的 manual failover,node rejoin
阅读量:4549 次
发布时间:2019-06-08

本文共 6535 字,大约阅读时间需要 21 分钟。

os:ubunbu 16.04

postgresql:9.6.8
repmgr:4.1.1

192.168.56.101 node1

192.168.56.102 node2

操作前/etc/repmgr.conf 的内容

node1 节点上的文件内容,node2 节点上类似

$ cat /etc/repmgr.conf node_id=1node_name=node1conninfo='host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2'data_directory='/var/lib/postgresql/9.6/main'use_replication_slots=truepg_bindir='/usr/lib/postgresql/9.6/bin'service_start_command   = 'sudo pg_ctlcluster 9.6 main start'service_stop_command    = 'sudo pg_ctlcluster 9.6 main stop'service_restart_command = 'sudo pg_ctlcluster 9.6 main restart'service_reload_command  = 'sudo pg_ctlcluster 9.6 main reload' service_promote_command  = 'sudo pg_ctlcluster 9.6 main promote'

手动关闭主库模拟异常

node1 节点上操作

$ pg_ctl -D /var/lib/postgresql/9.6/main -m fast stop或者$ sudo pg_ctlcluster 9.6 main stop$ repmgr -f /etc/repmgr.conf cluster showERROR: connection to database failed:  could not connect to server: Connection refused	Is the server running on host "192.168.56.101" and accepting	TCP/IP connections on port 5432?DETAIL: attempted to connect using:  user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.101 fallback_application_name=repmgr

node2 节点上操作

$ repmgr -f /etc/repmgr.conf cluster show ID | Name  | Role    | Status        | Upstream | Location | Connection string                                              ----+-------+---------+---------------+----------+----------+----------------------------------------------------------------- 1  | node1 | primary | ? unreachable |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | standby |   running     | node1    | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2WARNING: following issues were detected  - when attempting to connect to node "node1" (ID: 1), following error encountered :"could not connect to server: Connection refused	Is the server running on host "192.168.56.101" and accepting	TCP/IP connections on port 5432?"  - node "node1" (ID: 1) is registered as an active primary but is unreachable

可以看出 node1 的 Status 显示 unreachable

从库提升为主库

现在node1节点的postgresql已经不可用了(手动关闭、进程异常终止、宕机),需要提升node2上的standby 为 master。

node2 节点上操作

$ repmgr -f /etc/repmgr.conf standby promoteNOTICE: promoting standby to primaryDETAIL: promoting server "node2" (ID: 2) using "sudo pg_ctlcluster 9.6 main promote"DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to completeNOTICE: STANDBY PROMOTE successfulDETAIL: server "node2" (ID: 2) was successfully promoted to primary

node2 上再次查看

$ repmgr -f /etc/repmgr.conf cluster show ID | Name  | Role    | Status    | Upstream | Location | Connection string                                              ----+-------+---------+-----------+----------+----------+----------------------------------------------------------------- 1  | node1 | primary | - failed  |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | primary | * running |          | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2WARNING: following issues were detected  - when attempting to connect to node "node1" (ID: 1), following error encountered :"could not connect to server: Connection refused	Is the server running on host "192.168.56.101" and accepting	TCP/IP connections on port 5432?"

node1 节点变为新的slave

node1 节点上操作,启动postgresql

# /etc/init.d/postgresql start$ repmgr -f /etc/repmgr.conf cluster show ID | Name  | Role    | Status               | Upstream | Location | Connection string                                              ----+-------+---------+----------------------+----------+----------+----------------------------------------------------------------- 1  | node1 | primary | * running            |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | standby | ! running as primary | node1    | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2WARNING: following issues were detected  - node "node2" (ID: 2) is registered as standby but running as primary

node2 节点上操作

$ repmgr -f /etc/repmgr.conf cluster show ID | Name  | Role    | Status    | Upstream | Location | Connection string                                            ----+-------+---------+-----------+----------+----------+----------------------------------------------------------------- 1  | node1 | primary | ! running |          | default  | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2 2  | node2 | primary | * running |          | default  | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2WARNING: following issues were detected  - node "node1" (ID: 1) is running but the repmgr node record is inactive

问题来了,node1、node2查看状态时都有 WARNING 了,接下来需要为node1 的 postgresql 设置新的 master。

node 1 节点上关闭 postgresql

$ sudo pg_ctlcluster 9.6 main stop

使用 repmgr node rejoin 添加到集群里,选项可以使用的是 pg_rewind。

(This can optionally use pg_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary.)

$ repmgr -f /etc/repmgr.conf node rejoin -d 'host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verboseNOTICE: using provided configuration file "/etc/repmgr.conf"INFO: prerequisites for using pg_rewind are metINFO: 0 files would have been copied to "/tmp/repmgr-config-archive-pgsql96"INFO: temporary archive directory "/tmp/repmgr-config-archive-pgsql96" deletedINFO: pg_rewind would now be executedDETAIL: pg_rewind command is:  /usr/lib/postgresql/9.6/bin/pg_rewind -D '/var/lib/postgresql/9.6/main' --source-server='host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2'INFO: prerequisites for executing NODE REJOIN are met
$ repmgr -f /etc/repmgr.conf node rejoin -d 'host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verboseNOTICE: using provided configuration file "/etc/repmgr.conf"INFO: prerequisites for using pg_rewind are metINFO: 0 files copied to "/tmp/repmgr-config-archive-pgsql96"NOTICE: executing pg_rewindNOTICE: 0 files copied to /var/lib/postgresql/9.6/mainINFO: directory "/tmp/repmgr-config-archive-pgsql96" deletedINFO: deleting "recovery.done"NOTICE: setting node 1's primary to node 2NOTICE: starting server using "sudo pg_ctlcluster 9.6 main start"INFO: demoted primary is pingableINFO: node 1 has attached to its upstream nodeNOTICE: NODE REJOIN successfulDETAIL: node 1 is now attached to node 2

符合预期。

参考:

转载于:https://www.cnblogs.com/ctypyb2002/p/9792868.html

你可能感兴趣的文章
部分安卓机器【小米手机】,文字显示不全
查看>>
条款39: 避免 "向下转换" 继承层次
查看>>
reactor模式
查看>>
158. Valid Anagram【LintCode by java】
查看>>
Python环境搭建
查看>>
局域网连接组件
查看>>
H5移动端页面设计心得分享
查看>>
跨域问题实践总结! 上(JSONP/document.domain/window.name)
查看>>
理解闭包
查看>>
下拉加载的实现
查看>>
前端实现input[type='file']上传图片预览效果
查看>>
keepalived+lvs+usp安装实施文档
查看>>
SDN第五次上机作业
查看>>
this基础(转)
查看>>
LeetCode20.有效的括号 JavaScript
查看>>
ELK详细安装部署
查看>>
js中错综复杂的上下文对象this
查看>>
EmbeddedWebBrowser网站报“Invalid floating point operation”异常的解决方法
查看>>
Linux ALSA声卡驱动之七:ASoC架构中的Codec
查看>>
GPS格式标准
查看>>