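gs_sdr Command Code Walkthrough
Background
openGauss has introduced a disaster recovery architecture. Where the earlier architecture was primary/standby within a single cluster, the disaster recovery architecture synchronizes data between two clusters. To understand how it works in more depth, this article reads through the code behind the gs_sdr command to learn the operations involved.
1. For the disaster recovery setup procedure, see: https://www.modb.pro/db/628767
2. For the vscode debugging configuration, see: https://www.modb.pro/db/658344
3. These are personal study notes, so my understanding may not be entirely correct. If you find mistakes, please point them out so we can discuss them.
Environment preparation
Installing the clusters
Install two clusters, each with 2 nodes. Their details are as follows:
Cluster 1 information
omm@pghost2 ~$ cm_ctl query -Cvid
[ CMServer State ]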
node node_ip instance state
---------------------------------------------------------------------
1 pghost2 192.168.56.20 1 /app/ogdata/data/cm/cm_server Primary
2 pghost3 192.168.56.30 2 /app/ogdata/data/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------
1 pghost2 192.168.56.20 6001 /app/ogdata/data/dn1 P Primary Normal | 2 pghost3 192.168.56.30 6002 /app/ogdata/data/dn1 S Standby Normal
Cluster 2 information
omm@pghost5 ~$ cm_ctl query -Cvid
[ CMServer State ]
node node_ip instance state
---------------------------------------------------------------------
1 pghost5 192.168.56.50 1 /app/ogdata/data/cm/cm_server Primary
2 pghost6 192.168.56.60 2 /app/ogdata/data/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------
1 pghost5 192.168.56.50 6001 /app/ogdata/data/dn1 P Primary Normal | 2 pghost6 192.168.56.60 6002 /app/ogdata/data/dn1 S Standby Normal
Creating the disaster recovery user
Create the disaster recovery user on cluster 1:
gsql -d postgres -p 26000 -c "create user dr_user with replication password 'oracle_4U';"
Modifying the XML configuration
Modifying cluster 1
The modified XML configuration is as follows:
Modifying cluster 2
The modified XML configuration is as follows:
Starting cluster 1 as the primary cluster
The command used is:
# gs_sdr -t start -m primary -X XMLFILE [-U DR_USERNAME [-W DR_PASSWORD]] [--time-out=SECS]
gs_sdr -t start -m primary -X /home/omm/single.xml -U dr_user -W oracle_4U --time-out=86400
vscode debugging configuration
{
"version": "0.2.0",
"configurations": [
{
"name": "Python: Current File",
"type": "python",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
"justMyCode": true,
"args": ["-t","start","-m","primary","-X","/home/omm/single.xml","-U","dr_user","-W","oracle_4U","--time-out=86400"]
}
]
}
Set a breakpoint in the main function of the gs_sdr script.
Code reading
Checking whether the command is run as root
if os.getuid() == 0:
    GaussLog.exitWithError(ErrorCode.GAUSS_501["GAUSS_50105"])
# If the caller is root, report an error and exit immediately.
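The root check can be reproduced outside of gs_sdr with a few lines. The following is only a minimal sketch: RootNotAllowedError and ensure_not_root are hypothetical names introduced for illustration, and the real script reports the error through GaussLog instead of raising an exception.

```python
import os


class RootNotAllowedError(RuntimeError):
    """Hypothetical error type: raised when the caller is root."""


def ensure_not_root(uid=None):
    """Refuse to continue when the calling user is root (uid 0).

    uid can be passed explicitly for testing; by default the uid of the
    current process is used, as in the os.getuid() check above.
    """
    if uid is None:
        uid = os.getuid()
    if uid == 0:
        raise RootNotAllowedError("this tool must not be run as root")
    return uid
```

Passing the uid as a parameter keeps the check testable without actually switching users.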
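Initializing the StreamingDisasterRecoveryBase class
base = StreamingDisasterRecoveryBase() # Loads the relevant information from the cluster XML configuration file
The information stored in base is shown in the figure below:
Determining which operation to perform
handler = HANDLER_MAPPING[base.params.task](base.params, base.user, base.logger, base.trace_id, base.log_file)
# HANDLER_MAPPING covers the five operations listed below: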
HANDLER_MAPPING = {
"start": StreamingStartHandler, # This should correspond to the moduleName value in the figure above
"stop": StreamingStopHandler,
"switchover": StreamingSwitchoverHandler,
"failover": StreamingFailoverHandler,
"query": StreamingQueryHandler
}
# Here base.params.task is start, which maps to the StreamingStartHandler class, defined in streaming_diaster_recovery_start.py
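The lookup above is an ordinary dictionary-based dispatch: the task name selects a class, and the class is instantiated immediately with the shared parameters. The sketch below illustrates the pattern with hypothetical stand-in classes (BaseHandler, StartHandler, QueryHandler) and a plain dict for the parameters; it is not the real handler hierarchy.

```python
class BaseHandler:
    """Hypothetical stand-in for the real handler base class."""

    def __init__(self, params):
        self.params = params

    def run(self):
        raise NotImplementedError


class StartHandler(BaseHandler):
    def run(self):
        return "start as " + self.params["mode"]


class QueryHandler(BaseHandler):
    def run(self):
        return "query"


# Task name -> handler class, mirroring the shape of HANDLER_MAPPING.
HANDLERS = {"start": StartHandler, "query": QueryHandler}


def dispatch(params):
    """Look up the handler class for params['task'], build it and run it."""
    try:
        handler_cls = HANDLERS[params["task"]]
    except KeyError:
        raise ValueError("unsupported task: %r" % params.get("task"))
    return handler_cls(params).run()
```

Adding a new operation then only requires a new class and one more entry in the mapping, which is presumably why the original code is organized this way.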
Creating the lock file
handler.handle_lock_file(handler.trace_id, 'create') # This method is defined in streaming_base.py
# Setting up disaster recovery involves data synchronization and takes a long time, so the lock file is presumably there to prevent the operation from being launched repeatedly.
# It generates a file such as: '/app/opengauss/tmp/streaming_lock_cd7eef1a2c1f11ee92b208002716c96f'
Checking whether another gs_sdr operation is running
if base.params.task in StreamingConstants.TASK_EXIST_CHECK:
    handler.check_streaming_process_is_running() # If one is found, abort this operation.
# The check mainly relies on this command: 'source /home/omm/.bashrc && pssh -t 10 -H pghost2 -H pghost3 "ls /app/opengauss/tmp/streaming_lock_*"'
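The lock-file idea can be sketched on a single machine as follows. This is an illustration under assumptions, not the code in streaming_base.py: the function names and signatures are hypothetical, only the local directory is inspected (the real check lists the lock files on every node through pssh), and the lock file is assumed to be named streaming_lock_ followed by the trace id, as in the path shown above.

```python
import glob
import os

LOCK_PREFIX = "streaming_lock_"


def lock_path(tmp_dir, trace_id):
    """Path of the lock file that belongs to one gs_sdr run."""
    return os.path.join(tmp_dir, LOCK_PREFIX + trace_id)


def other_locks(tmp_dir, trace_id):
    """Return lock files in tmp_dir that belong to a different trace id."""
    own = lock_path(tmp_dir, trace_id)
    pattern = os.path.join(tmp_dir, LOCK_PREFIX + "*")
    return sorted(p for p in glob.glob(pattern) if p != own)


def handle_lock_file(tmp_dir, trace_id, action):
    """Create or remove this run's lock file (local node only)."""
    path = lock_path(tmp_dir, trace_id)
    if action == "create":
        with open(path, "w"):
            pass  # an empty file is enough to act as a marker
    elif action == "remove":
        if os.path.exists(path):
            os.remove(path)
    else:
        raise ValueError("unknown action: " + action)
    return path
```

A caller would create its own lock first and then abort if other_locks returns anything, which matches the order of the two steps above.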
Executing the operation
Progress recording
handler.run()
self.logger.log("Start create streaming disaster relationship.")
# Creates the progress record directory /app/opengauss/tmp/streaming_cabin (on every node)
# Progress record file: '.streaming_switchover_primary.step'
## The names of all progress record files are as follows:
STREAMING_STEP_FILES = {
"start_primary": ".streaming_start_primary.step",
"start_standby": ".streaming_start_standby.step",
"stop": ".streaming_stop.step",
"switchover_primary": ".streaming_switchover_primary.step",
"switchover_standby": ".streaming_switchover_standby.step",
"failover": ".streaming_failover.step",
"query": ".streaming_query.step",
}
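A step file of this kind lets a long-running operation record how far it got. The sketch below shows one way such a file could be written and read back. It assumes the file holds a single step name with a numeric prefix (the progress file shown later in this article contains 2_check_cluster_step); write_step and read_step are hypothetical helpers, and whether gs_sdr resumes from the recorded step in exactly this way is not confirmed here.

```python
import os

# Subset of the file names listed above.
STEP_FILES = {
    "start_primary": ".streaming_start_primary.step",
    "start_standby": ".streaming_start_standby.step",
}


def write_step(step_dir, task_key, step_name):
    """Record the latest finished step, e.g. '2_check_cluster_step'."""
    path = os.path.join(step_dir, STEP_FILES[task_key])
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(step_name)
    os.replace(tmp, path)  # rename so readers never see a partial write
    return path


def read_step(step_dir, task_key):
    """Return the numeric prefix of the recorded step, or -1 if none."""
    path = os.path.join(step_dir, STEP_FILES[task_key])
    if not os.path.isfile(path):
        return -1
    with open(path) as fh:
        content = fh.read().strip()
    prefix = content.split("_", 1)[0]
    return int(prefix) if prefix.isdigit() else -1
```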
Checking the cluster state
# Check the cluster state
'source /home/omm/.bashrc ; gs_om -t status --all > /app/opengauss/tmp/streaming_cabin/cluster_state_tmp'
Checking whether the executing node is the primary node
The operation must be run on the primary node.
Generating the key_name.key.cipher & key_name.key.rand files
export LD_LIBRARY_PATH=/app/opengauss/tool/script/gspylib/clib && source /home/omm/.bashrc && gs_guc generate -S default -o hadr -D '/app/opengauss/app/2.0.1_46134f73/bin' && /bin/chmod 600 /app/opengauss/app/2.0.1_46134f73/bin/hadr.key.cipher && /bin/chmod 600 /app/opengauss/app/2.0.1_46134f73/bin/hadr.key.rand
# The generated files are then distributed to the other nodes in the cluster.
Saving the hadr information to the database
ALTER GLOBAL CONFIGURATION with(hadr_user_info ='O1hnmUERtm2hfiXGjKjgaCfKq89IgdSzUqCoMGw/yzdaYki1LYTfhHlILmz10IvDTX9fqGNZrcmdX5NmkK+6bw==');
Checking whether a main standby node already exists
This determines whether the cluster is already part of a disaster recovery environment.
Checking whether CM is present
A disaster recovery environment requires the CM component.
Checking whether an upgrade is in progress
# Check whether /app/opengauss/tmp/binary_upgrade exists
Writing the progress file
$ more /app/opengauss/tmp/streaming_cabin/.streaming_start_primary.step
2_check_cluster_step
common_step_for_streaming_start
# Generate the disaster recovery relationship JSON file and distribute it to the other nodes in the cluster.
more /app/opengauss/tmp/streaming_cabin/cluster_conf_record
{"remoteClusterConf": {"port": 26500, "shards": [[{"ip": "192.168.56.50", "dataIp": "192.168.56.50"}, {"ip": "192.168.56.60", "dataIp": "192.168.56.60"}]]}, "localClusterConf": {"port": 26000, "shards": [[{"ip": "192.168.56.20", "dataIp": "192.168.56.20"}, {"ip": "192.168.56.30", "dataIp": "192.168.56.30"}]]}}
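The structure of this record is simple enough to rebuild by hand, which helps when checking what gs_sdr wrote. The sketch below produces a dict of the same shape from the ports and IPs used in this article. build_cluster_conf is a hypothetical helper, and it assumes that ip and dataIp are identical for every node, which holds in this environment but need not hold in general.

```python
import json


def build_cluster_conf(local_port, local_ips, remote_port, remote_ips):
    """Build a dict shaped like the cluster_conf_record shown above."""

    def shard(ips):
        # One shard is a list of nodes; ip and dataIp are assumed equal.
        return [{"ip": ip, "dataIp": ip} for ip in ips]

    return {
        "remoteClusterConf": {"port": remote_port, "shards": [shard(remote_ips)]},
        "localClusterConf": {"port": local_port, "shards": [shard(local_ips)]},
    }


conf = build_cluster_conf(
    26000, ["192.168.56.20", "192.168.56.30"],
    26500, ["192.168.56.50", "192.168.56.60"],
)
text = json.dumps(conf)  # serialized form, comparable to the file content
```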
Modifying the pg_hba configuration
# Copy /home/omm/single.xml to /app/opengauss/tmp/streaming_cabin/streaming_config.xml
source /home/omm/.bashrc; python3 '/app/opengauss/tool/script/local/ConfigHba.py' -U omm -X '/app/opengauss/tmp/streaming_cabin/streaming_config.xml' --try-reload
# The following entries are added to pg_hba.conf:
host all omm 192.168.56.50/32 trust
host all omm 192.168.56.60/32 trust
host replication all 192.168.0.0/16 sha256
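The two trust entries follow directly from the node IPs of the remote cluster. As a rough illustration only, the hypothetical helper below derives lines of the same form. It is not how ConfigHba.py is implemented, and it does not cover the replication entry, whose 192.168.0.0/16 range is taken as given here.

```python
import ipaddress


def hba_trust_lines(user, remote_ips):
    """Return one 'host all <user> <ip>/<prefix> trust' line per node."""
    lines = []
    for ip in remote_ips:
        addr = ipaddress.ip_address(ip)  # raises ValueError on a bad address
        prefix = 32 if addr.version == 4 else 128  # single-host mask
        lines.append("host all %s %s/%d trust" % (user, addr, prefix))
    return lines
```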
Setting the replication parameter replconninfo
'source /home/omm/.bashrc; pssh -H pghost3 \'source /home/omm/.bashrc; gs_guc check -Z datanode -D /app/ogdata/data/dn1 -c "replconninfo1"\''
'source /home/omm/.bashrc; pssh -H pghost3 \'source /home/omm/.bashrc; gs_guc check -Z datanode -D /app/ogdata/data/dn1 -c "replconninfo2"\''
'source /home/omm/.bashrc; pssh -H pghost3 "source /home/omm/.bashrc ; gs_guc reload -Z datanode -D /app/ogdata/data/dn1 -c \\"replconninfo1 = \'localhost=192.168.56.30 localport=26001 localheartbeatport=26005 localservice=26004 remotehost=192.168.56.20 remoteport=26001 remoteheartbeatport=26005 remoteservice=26004 iscascade=true iscrossregion=false\'\\""'
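The replconninfo value is a space-separated list of key=value pairs, so it is easy to take apart when checking which host and ports a node will replicate from. The following sketch parses and rebuilds the value used in the reload command above. Both helper functions are hypothetical, and the parser assumes that neither keys nor values contain spaces.

```python
def parse_replconninfo(value):
    """Split a replconninfo string into a dict of key -> string value."""
    info = {}
    for token in value.split():
        key, sep, val = token.partition("=")
        if not sep:
            raise ValueError("malformed token: %r" % token)
        info[key] = val
    return info


def format_replconninfo(info):
    """Inverse of parse_replconninfo; keeps the dict's insertion order."""
    return " ".join("%s=%s" % (k, v) for k, v in info.items())


sample = ("localhost=192.168.56.30 localport=26001 localheartbeatport=26005 "
          "localservice=26004 remotehost=192.168.56.20 remoteport=26001 "
          "remoteheartbeatport=26005 remoteservice=26004 "
          "iscascade=true iscrossregion=false")
info = parse_replconninfo(sample)
```

With the value parsed, info["remotehost"] and info["iscascade"] can be compared against the expected topology before the parameter is reloaded.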
Waiting for the main standby to connect
Waiting for the main standby connection.
At this point, the following command must be run on the standby cluster:
gs_sdr -t start -m disaster_standby -U dr_user -W oracle_4U -X /home/omm/single.xml --time-out=86400 # For convenience, this was run directly in a terminal rather than under the debugger.
Starting cluster 2 as the standby cluster
gs_sdr -t start -m disaster_standby -U dr_user -W oracle_4U -X /home/omm/single.xml --time-out=86400
vscode debugging configuration
{
"version": "0.2.0",
"configurations": [
{
"name": "gs_sdr",
"type": "python",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
"justMyCode": true,
"args": ["-t","start","-m","disaster_standby","-X","/home/omm/single.xml","-U","dr_user","-W","oracle_4U","--time-out=86400"]
}
]
}
Class executed: streaming_diaster_recovery_start
Code reading
Start build key files from remote cluster
The standby cluster performs a build, which is fairly slow (it depends heavily on the network environment and the database size).
source /home/omm/.bashrc; /app/opengauss/app/2.0.1/bin/gs_ctl build -D /app/ogdata/data/dn1 -M standby -b copy_secure_files -U dr_user -P *** -C "localhost=192.168.56.50 localport=26001 remotehost=192.168.56.20 remoteport=26501"
source /home/omm/.bashrc; /app/opengauss/app/2.0.1/bin/gs_ctl build -D /app/ogdata/data/dn1 -M standby -b copy_secure_files -U dr_user -P *** -C "localhost=192.168.56.50 localport=26001 remotehost=192.168.56.30 remoteport=26501"
echo *** /home/omm/.bashrc; /app/opengauss/app/2.0.1/bin/gs_ctl build -D /app/ogdata/data/dn1 -M standby -b copy_secure_files -U dr_user -P *** -C 'localhost=192.168.56.60 localport=26001 remotehost=192.168.56.20 remoteport=26501'" | pssh -s -H pghost6'
copy file from data dir to streaming dir
# Node 1
echo "if [ -d \'/app/ogdata/data/dn1/gs_secure_files\' ];then source /home/omm/.bashrc && pscp --trace-id 9f2c898e2c5a11ee850c080027fd3332 -H pghost5 \'/app/ogdata/data/dn1/gs_secure_files\' \'/app/opengauss/tmp/streaming_cabin\' && rm -rf \'/app/ogdata/data/dn1/gs_secure_files\';fi" | pssh -s -H pghost5
# Node 2
echo "if [ -d \'/app/ogdata/data/dn1/gs_secure_files\' ];then source /home/omm/.bashrc && pscp --trace-id 9f2c898e2c5a11ee850c080027fd3332 -H pghost5 \'/app/ogdata/data/dn1/gs_secure_files\' \'/app/opengauss/tmp/streaming_cabin\' && rm -rf \'/app/ogdata/data/dn1/gs_secure_files\';fi" | pssh -s -H pghost6
check cluster user consistency
This mainly checks whether the version and the version commit number are consistent.
Check whether the installation user is consistent
Setting the cluster run mode stream_cluster_run_mode
source /home/omm/.bashrc && gs_guc set -Z datanode -N all -I all -c "stream_cluster_run_mode = \'cluster_standby\'"
source /home/omm/.bashrc && gs_guc set -Z coordinator -N all -I all -c "stream_cluster_run_mode = \'cluster_standby\'"
Stopping the standby cluster
'/app/opengauss/app/2.0.1_46134f73/bin/cluster_static_config'
Building the cluster again
source /home/omm/.bashrc; /app/opengauss/app/2.0.1_46134f73/bin/gs_ctl start -D /app/ogdata/data/dn1 -M hadr_main_standby
echo *** /home/omm/.bashrc; /app/opengauss/app/2.0.1_46134f73/bin/gs_ctl build -D /app/ogdata/data/dn1 -M cascade_standby -b standby_full -r 7200 -t 1209600" | pssh -s -t 1209610 -H pghost6
Starting the cluster
source /home/omm/.bashrc ; cm_ctl start -t 604800 # At this point the cluster is already in the main standby and cascade standby state.
Querying the disaster recovery status
gs_sdr -t query
Primary cluster
$ gs_sdr -t query
--------------------------------------------------------------------------------
Streaming disaster recovery query 9f658f3a2d0511eebbb208002716c96f
--------------------------------------------------------------------------------
Start streaming disaster query.
Start check archive.
Start check recovery.
Start check RPO & RTO.
Successfully executed streaming disaster recovery query, result:
{'hadr_cluster_stat': 'archive', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '0', 'RTO': '0'}
Standby cluster
$ gs_sdr -t query
--------------------------------------------------------------------------------
Streaming disaster recovery query ad8afd5c2d0511ee88cf080027fd3332
--------------------------------------------------------------------------------
Start streaming disaster query.
Start check archive.
Start check recovery.
Start check RPO & RTO.
Successfully executed streaming disaster recovery query, result:
{'hadr_cluster_stat': 'restore', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '', 'RTO': ''}
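The last line of the query output is printed as a Python dict literal, so a monitoring script could read it back with ast.literal_eval. The sketch below does that and converts RPO and RTO to integers. It assumes the result line keeps exactly the form shown above, which is an observation from this run rather than a documented interface, and parse_query_result is a hypothetical helper.

```python
import ast


def parse_query_result(line):
    """Turn the printed result dict into a dict with numeric RPO/RTO."""
    result = ast.literal_eval(line.strip())
    if not isinstance(result, dict):
        raise ValueError("expected a dict literal")
    for key in ("RPO", "RTO"):
        raw = str(result.get(key, ""))
        # An empty string (as on the standby cluster) becomes None.
        result[key] = int(raw) if raw.isdigit() else None
    return result
```

Applied to the two outputs above, the primary cluster yields RPO and RTO of 0 with hadr_cluster_stat set to archive, while the standby cluster yields None for both values with hadr_cluster_stat set to restore.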