• 企业400电话
  • 微网小程序
  • AI电话机器人
  • 电商代运营
  • 全 部 栏 目

    企业400电话 网络优化推广 AI电话机器人 呼叫中心 网站建设 商标✡知产 微网小程序 电商运营 彩铃•短信 增值拓展业务
    PostgreSQL+Pgpool实现HA主备切换的操作

    PostgreSQL流复制实现HA主备切换

    环境说明和主机规划

    操作系统 主机名 主机 角色 端口
    CentOS 7 master 10.0.0.11 PG-Master 54321
    CentOS 7 slave 10.0.0.12 PG-Slave 54321
    CentOS 7 pool 10.0.0.13 pgpool 54321

    基础环境配置(所有主机操作)

    配置HOSTS

    echo -e "10.0.0.11 master\n10.0.0.12 slave\n10.0.0.13 pool" >> /etc/hosts # 执行一次即可

    配置统一的时间(若已配置,请忽略)

    yum install -y ntpdate  ntpdate ntp1.aliyun.com
    echo -e "# sync time from ntp1.aliyun.com\n5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com > /dev/null 2>1
    " >> /var/spool/cron/root # 写入定时任务,执行一次即可

    创建postgres用户

    useradd postgres echo "your_password" | passwd --stdin postgres

    配置免密钥登陆

    su - postgres
    ssh-keygen -t rsa -f /home/postgres/.ssh/id_rsa -P "" 
    cd ~/.ssh/
    ssh-copy-id postgres@master # 三台主机执行
    scp authorized_keys postgres@slave:~/.ssh # 只在master主机执行
    scp authorized_keys postgres@pool:~/.ssh # 只在master主机执行

    安装Postgresql数据库(PG9.6)

    yum install -y https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm

    yum install -y postgresql96-server postgresql96-contrib postgresql96 postgresql96-libs

    创建统一的目录结构

    mkdir /data1/pg_{data,bin,logs} -p

    chown -R postgres.postgres /data1/

    修改系统变量

    vi /etc/profile #增加以下内容
    export PGHOME=/usr/pgsql-9.6/
    export PGDATA=/data1/pg_data
    export PGPORT=54321
    export PATH=$PATH:$PGHOME/bin
    # 生效
    source /etc/profile

    PostgreSQL流复制结构(master和slave主机操作)

    master主机操作

    初始化系统

    /usr/pgsql-9.6/bin/postgresql96-setup initdb

    vi /usr/lib/systemd/system/postgresql-9.6.service

    修改postgresql-9.6.service

    内容如下:

    # Include the default config:
    .include /usr/lib/systemd/system/postgresql-9.6.service
    
    [Service]
    Environment=PGDATA=/data1/pg_data
    

    重启PG服务

    systemctl daemon-reload
    su - postgres -c '/usr/pgsql-9.6/bin/initdb -D /data1/pg_data'
    systemctl restart postgresql-9.6
    systemctl enable postgresql-9.6.service

    修改系统配置(以下用postgres用户操作)

    cp /data1/pg_data/pg_hba.conf{,.bak} 
    cat >/data1/pg_data/pg_hba.confEOF
    local all    all            trust
    host all    all      10.0.0.11/32   trust
    host all    all      10.0.0.12/32   trust
    host all    all      0.0.0.0/0    md5
    host all    all      ::1/128     trust
    host replication  stream_replication  0.0.0.0/0    md5
    EOF
    #host replication  stream_replication  0.0.0.0/0    md5 为流复制用户

    64G

    cp /data1/pg_data/postgresql.conf{,.bak}
    cat >/data1/pg_data/postgresql.confEOF
    listen_addresses = '*'
    port = 54321
    max_connections = 256
    shared_buffers = 16GB
    effective_cache_size = 48GB
    work_mem = 64MB
    maintenance_work_mem = 2GB
    min_wal_size = 2GB
    max_wal_size = 4GB
    checkpoint_completion_target = 0.9
    wal_buffers = 16MB
    default_statistics_target = 100
    wal_level = hot_standby
    wal_log_hints = on
    max_wal_senders = 1
    hot_standby = on
    logging_collector = on
    log_directory = 'pg_log'
    EOF
    #操作完记得重启 pg_ctl restart

    128G

    listen_addresses = '*'
    port = 54321
    max_connections = 256
    shared_buffers = 32GB
    effective_cache_size = 96GB
    work_mem = 128MB
    maintenance_work_mem = 2GB
    min_wal_size = 2GB
    max_wal_size = 4GB
    checkpoint_completion_target = 0.9
    wal_buffers = 16MB
    default_statistics_target = 100
    wal_level = hot_standby
    wal_log_hints = on
    max_wal_senders = 1
    hot_standby = on
    logging_collector = on
    log_directory = 'pg_log'

    在主库中创建流复制用户(stream_replication)和PGPool用户(srcheck)

    CREATE USER stream_replication replication LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'your_password';

    CREATE USER srcheck replication LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'your_password';

    修改主库pg_hba.conf文件(已操作见cat >/data1/pg_data/pg_hba.confEOF)

    host replication stream_replication 0.0.0.0/0 md5

    slave主机操作

    初始化系统

    /usr/pgsql-9.6/bin/postgresql96-setup initdb

    vi /usr/lib/systemd/system/postgresql-9.6.service

    修改postgresql-9.6.service

    内容如下:

    # Include the default config:
    .include /usr/lib/systemd/system/postgresql-9.6.service
    
    [Service]
    Environment=PGDATA=/data1/pg_data
    

    重启PG服务

    systemctl daemon-reload

    基础备份复制到备库服务器

    rm -rf /data1/pg_data # 如果没有重要数据可操作,主要为同步主库路径

    su - postgres -c 'pg_basebackup -D $PGDATA --format=p -h master -p 54321 -U stream_replication -W'

    修改备库配置信息

    cp $PGHOME/share/recovery.conf.sample $PGDATA/recovery.conf

    vi $PGDATA/recovery.conf

    增加以下内容

    standby_mode='on'
    primary_conninfo = 'host=master port=54321 user=stream_replication password=your_password'
    restore_command = ''
    recovery_target_timeline = 'latest'
    # 重启PG服务
    systemctl restart postgresql-9.6
    systemctl enable postgresql-9.6.service

    验证

    主节点执行

    create table test (id int4, create_time timestamp(0) without time zone);
    insert into test values (1, now());
    select * from test;
    
    

    备节点执行

    select * from test;

    其他查询

    进入测试数据库test,主库上执行如下命令返回f,备库上返回t。 select pg_is_in_recovery();

    执行如下命令查看快照,它返回主库记录点、备库记录点;主库每增加一条写入,记录点的值就会加1。

    select txid_current_snapshot();

    执行如下命令可以查看主备同步状态。

    select * from pg_stat_replication;

    字段state显示的同步状态有:startup(连接中)、catchup(同步中)、streaming(同步);字段sync_state显示的模式有:async(异步)、sync(同步)、potential(虽然现在是异步模式,但是有可能升级到同步模式)。

    主备切换

    假设主库崩溃了,备库如何从只读状态切换为读写状态呢?只要把备库的postgresql.conf中hot_standby修改为off,并且删除recovery.conf,然后重启库就可以提供服务了。

    PGPool2(pool主机操作)

    安装PGPool2

    yum install -y http://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm
    yum -y install pgpool-II-pg96 pgpool-II-pg96-debuginfo pgpool-II-pg96-devel pgpool-II-pg96-extensions
    systemctl enable pgpool.service #开启自动启动

    添加Pgpool-II运行用户

    useradd postgres # 环境准备时已操作
    chown -R postgres.postgres /etc/pgpool-II
    chown -R postgres.postgres /var/run/pgpool/

    配置pool_hba.conf

    cp /etc/pgpool-II/pool_hba.conf{,.bak}

    vi /etc/pgpool-II/pool_hba.conf

    增加内容

    host all all 0.0.0.0/0 md5

    配置pcp.conf

    主节点登陆后执行:

    postgres=# select rolname,rolpassword from pg_authid;
      rolname  |    rolpassword    
    --------------------+-------------------------------------
     pg_signal_backend | 
     srcheck   | md5662c10f61b27a9ab38ce69157186b25f
     postgres   | md5d3612d57ee8d4c147cf27b11e3a0974d
     stream_replication | md59279ef6b904bc483e4f85e6d44cfc0ed
    (4 rows)

    vi /etc/pgpool-II/pool_passwd

    增加SQL执行结果的内容,形式为$rolname:$rolpassword例如:

    srcheck:md5662c10f61b27a9ab38ce69157186b25f

    或者:

    pg_md5 -u postgres your_password

    vi /etc/pgpool-II/pcp.conf ## 加入 postgres:上一命令的输出

    配置pgpool.conf

    cp /etc/pgpool-II/pgpool.conf{,.bak}

    vi /etc/pgpool-II/pgpool.conf

    内容如下:

    # CONNECTIONS
    
    listen_addresses = '*'
    port = 54321
    socket_dir = '/var/run/pgpool'
    pcp_listen_addresses = '*'
    pcp_port = 9898
    pcp_socket_dir = '/var/run/pgpool'
    
    # - Backend Connection Settings -
    
    backend_hostname0 = 'master'
    backend_port0 = 54321
    backend_weight0 = 1
    backend_data_directory0 = '/data1/pg_data'
    backend_flag0 = 'ALLOW_TO_FAILOVER'
    
    backend_hostname1 = 'slave'
    backend_port1 = 54321
    backend_weight1 = 1
    backend_data_directory1 = '/data1/pg_data'
    backend_flag1 = 'ALLOW_TO_FAILOVER'
    
    # - Authentication -
    
    enable_pool_hba = on
    pool_passwd = 'pool_passwd'
    
    # FILE LOCATIONS
    
    pid_file_name = '/var/run/pgpool/pgpool.pid'
    logdir = '/data1/pg_logs'
    
    replication_mode = off
    load_balance_mode = on
    master_slave_mode = on
    master_slave_sub_mode = 'stream'
    
    sr_check_period = 5
    sr_check_user = 'srcheck'
    sr_check_password = '123456'
    sr_check_database = 'postgres'
    
    # HEALTH CHECK 健康检查
    
    health_check_period = 10
    health_check_timeout = 20
    health_check_user = 'srcheck'
    health_check_password = '123456'
    health_check_database = 'postgres'
    
    # FAILOVER AND FAILBACK
    
    failover_command = '/data1/pg_bin/failover_stream.sh %H'
    
    

    failover_stream.sh脚本

    vim /data1/pg_bin/failover_stream.sh
    chmod 777 /data1/pg_bin/failover_stream.sh
    chmod u+s /sbin/ifconfig 
    chmod u+s /usr/sbin
    pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>1  ## 启动
    pgpool -m fast stop ## 关闭

    failover_stream.sh内容:

    #! /bin/sh 
    # Failover command for streaming replication. 
    # Arguments: $1: new master hostname. 
    
    new_master=$1 
    trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA" 
    
    # Prompte standby database. 
    /usr/bin/ssh -T $new_master $trigger_command 
    
    exit 0;
    

    登陆设置

    当执行pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>1 后可查看集群状态:

    [postgres@pool pgpool-II]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
    
    postgres=# show pool_nodes;
     node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay 
    ---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------
     0  | master | 54321 | up  | 0.500000 | primary | 0   | false    | 0
     1  | slave | 54321 | up  | 0.500000 | standby | 0   | true    | 0
    (2 rows)
    

    如果未发现集群状态,请在master和slave主机分别执行以下操作:

    [postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 0
    [postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 1
    #详情查询命令pcp_attach_node

    HA切换

    模拟master主机宕机

    Master端:

    [postgres@master ~]$ pg_ctl stop
    waiting for server to shut down.... done
    server stopped

    当前集群状态

    [postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
    psql (9.6.1)
    Type "help" for help.
    
    postgres=# show pool_nodes;
     node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay 
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
     0  | master | 5432 | down| 0.500000 | standby | 0    | false | 0
     1  | slave  | 5432 | up  | 0.500000 | primary | 0    | true | 0
    (2 rows)
    

    发现master已经是standby了,且down机了

    修改master,启动

    当master主机宕机后,此时slave主机PG数据库成为主库,修改master成为slave的从库即可

    [postgres@master ~]$ vim recovery.conf
    standby_mode='on'
    primary_conninfo = 'host=slave port=54321 user=stream_replication password=your_password'
    restore_command = ''
    recovery_target_timeline = 'latest'

    同步时间线

    #如果报时间线冲突落后,先停掉pg服务,然后执行同步时间线,否知直接看状态
    [postgres@master ~]$ pg_rewind --target-pgdata=/data1/pg_data --source-server='host=slave port=54321 user=postgres dbname=postgres'
    servers diverged at WAL position 0/5000098 on timeline 1
    rewinding from last common checkpoint at 0/5000028 on timeline 1
    Done!
    # 重新启动数据库
    [postgres@master ~]$ pg_ctl start

    再次查看当前状态

    [postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
    postgres=# show pool_nodes;
     node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay 
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
     0  | master | 5432 | down| 0.500000 | standby | 0    | false | 0
     1  | slave  | 5432 | up  | 0.500000 | primary | 0    | true | 0
    (2 rows)
    
    #注意虽然master已经启动了,但是还是down,需要手动将master节点添加进pgpool,master的node_id是0,所以-n 0
    [postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 54321 -n 0
    #提示输入密码,输入pcp管理密码
    #查看当前状态
    [postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
    postgres=# show pool_nodes;
     node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay 
    ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
     0  | master | 5432 | up | 0.500000 | standby | 0    | false | 0
     1  | slave  | 5432 | up  | 0.500000 | primary | 0    | true | 0
    (2 rows)
    

    现在两个节点都是up了。

    主从两节点pgpool健康检查脚本(pgpool_check.sh)

    说明:此脚本是基于PGpool只安装到master和slave两个主机上的情况下使用,在master主机有了pgpool进程后,可在slave主机执行sh pgpool_check.sh 即可

    #! /bin/bash
    # Check Master host pgpool-process
    
    while true
    do
     pgcount=$(nmap 10.0.0.11|egrep '9898|9999'|wc -l)
    
     if [ $pgcount -eq 2 ] ; then
      echo 'Master host pgpool is GOOD!!!' > /dev/null 2>1
     else
      echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"
      echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"
      echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"
      echo -e "SYSTEM WILL DO THE SHELL : \033[34m su - postgres -c 'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>1 ' \033[0m"
      su - postgres -c 'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>1 '
      pgport=$(netstat -lntup|egrep '9898|9999'|wc -l)
      [ $pgport -gt 0 ]  echo -e "Slave host pgpool is \033[32m RUNNING!!! \033[0m"
      exit 0
     fi
    done
    

    以上为个人经验,希望能给大家一个参考,也希望大家多多支持脚本之家。如有错误或未考虑完全的地方,望不吝赐教。

    您可能感兴趣的文章:
    • PostgreSQL 主备数据宕机恢复测试方案
    • PostgreSQL 数据同步到ES 搭建操作
    • postgresql 如何查看pg_wal目录下xlog文件总大小
    • postgresql之使用lsn 获取 wal文件名的实例
    • 修改postgresql存储目录的操作方式
    • postgresql运维之远程迁移操作
    • postgresql 12版本搭建及主备部署操作
    上一篇:PostgreSQL时间线(timeline)和History File的用法
    下一篇:Postgresql创建新增、删除与修改触发器的方法
  • 相关文章
  • 

    © 2016-2020 巨人网络通讯 版权所有

    《增值电信业务经营许可证》 苏ICP备15040257号-8

    PostgreSQL+Pgpool实现HA主备切换的操作 PostgreSQL+Pgpool,实现,主备,