• 企业400电话
  • 微网小程序
  • AI电话机器人
  • 电商代运营
  • 全 部 栏 目

    企业400电话 网络优化推广 AI电话机器人 呼叫中心 网站建设 商标✡知产 微网小程序 电商运营 彩铃•短信 增值拓展业务
    MySQL手动注册binlog文件造成主从异常的原因

    一、问题来源

    有一个朋友@水米田 问我,基于POSITION的主从。他做了如下的操作

    将备份的一些binlog文件加入到了目录中
    修改index文件,加入了这些binlog文件
    flush binary logs
    然后整个主从环境大量延迟。

    二、朋友的测试

    下面是另外一个朋友@徐晨亮的测试:

    master上:
    (root:db1@xucl:10:30:22)[(none)]> show binary logs;
    +---------------------+-----------+
    | Log_name      | File_size |
    +---------------------+-----------+
    | mysql-binlog.000031 |   1019 |
    | mysql-binlog.000032 |    424 |
    | mysql-binlog.000033 |    244 |
    | mysql-binlog.000034 |   2332 |
    | mysql-binlog.000035 |   2134 |
    | mysql-binlog.000036 |  845915 |
    | mysql-binlog.000037 |   11735 |
    | mysql-binlog.000038 |    284 |
    | mysql-binlog.000039 |    284 |
    | mysql-binlog.000040 |    284 |
    | mysql-binlog.000041 |    284 |
    | mysql-binlog.000042 |    234 |
    +---------------------+-----------+
    12 rows in set (0.00 sec)
    (root:db1@xucl:10:30:34)[(none)]> purge binary logs to 'mysql-binlog.000039';
    Query OK, 0 rows affected (0.00 sec)
    (root:db1@xucl:10:30:49)[(none)]> show binary logs;
    +---------------------+-----------+
    | Log_name      | File_size |
    +---------------------+-----------+
    | mysql-binlog.000039 |    284 |
    | mysql-binlog.000040 |    284 |
    | mysql-binlog.000041 |    284 |
    | mysql-binlog.000042 |    234 |
    +---------------------+-----------+
    4 rows in set (0.00 sec)
    执行插入数据
    (root:db1@xucl:10:31:23)[xucl]> insert into t values(9,9);
    将备份的binlog拷贝回原先的目录并修改index文件进行注册
    [root@izbp12nspj47ypto9t6vyez logs]# ll
    总用量 884
    -rw-r----- 1 mysql mysql  1019 5月 20 22:03 mysql-binlog.000031
    -rw-r----- 1 mysql mysql  424 5月 20 22:03 mysql-binlog.000032
    -rw-r----- 1 mysql mysql  244 5月 20 22:03 mysql-binlog.000033
    -rw-r----- 1 mysql mysql  2332 5月 20 22:03 mysql-binlog.000034
    -rw-r----- 1 mysql mysql  2134 5月 20 22:03 mysql-binlog.000035
    -rw-r----- 1 mysql mysql 845915 5月 20 22:03 mysql-binlog.000036
    -rw-r----- 1 mysql mysql 11735 5月 20 22:05 mysql-binlog.000037
    -rw-r----- 1 mysql mysql  284 5月 20 22:06 mysql-binlog.000038
    -rw-r----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000039
    -rw-r----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000040
    -rw-r----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000041
    -rw-r----- 1 mysql mysql  491 5月 21 10:31 mysql-binlog.000042
    -rw-r----- 1 mysql mysql  204 5月 21 10:30 mysql-binlog.index
    主库flush binary logs
    (root:db1@xucl:10:32:51)[(none)]> flush binary logs;
    Query OK, 0 rows affected (0.01 sec)
    (root:db1@xucl:10:32:57)[(none)]> show binary logs;
    +---------------------+-----------+
    | Log_name      | File_size |
    +---------------------+-----------+
    | mysql-binlog.000031 |   1019 |
    | mysql-binlog.000032 |    424 |
    | mysql-binlog.000033 |    244 |
    | mysql-binlog.000034 |   2332 |
    | mysql-binlog.000035 |   2134 |
    | mysql-binlog.000036 |  845915 |
    | mysql-binlog.000037 |   11735 |
    | mysql-binlog.000038 |    284 |
    | mysql-binlog.000039 |    284 |
    | mysql-binlog.000040 |    284 |
    | mysql-binlog.000041 |    284 |
    | mysql-binlog.000042 |    541 |
    | mysql-binlog.000043 |    234 |
    +---------------------+-----------+
    13 rows in set (0.00 sec)
    此时,slave报错如下:
    (root:db1@xucl:10:31:05)[(none)]> show slave status\G
    *************************** 1. row ***************************
            Slave_IO_State:
             Master_Host: 127.0.0.1
             Master_User: repl
             Master_Port: 3306
            Connect_Retry: 60
           Master_Log_File: mysql-binlog.000035
         Read_Master_Log_Pos: 194
            Relay_Log_File: izbp12nspj47ypto9t6vyez-relay-bin.000011
            Relay_Log_Pos: 373
        Relay_Master_Log_File: mysql-binlog.000035
           Slave_IO_Running: No
          Slave_SQL_Running: Yes
           Replicate_Do_DB:
         Replicate_Ignore_DB:
          Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
     Replicate_Wild_Ignore_Table:
              Last_Errno: 0
              Last_Error:
             Skip_Counter: 0
         Exec_Master_Log_Pos: 194
           Relay_Log_Space: 648
           Until_Condition: None
            Until_Log_File:
            Until_Log_Pos: 0
          Master_SSL_Allowed: No
          Master_SSL_CA_File:
          Master_SSL_CA_Path:
           Master_SSL_Cert:
          Master_SSL_Cipher:
            Master_SSL_Key:
        Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
            Last_IO_Errno: 1236
            Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate GTID-transaction when @@GLOBAL.GTID_MODE = OFF, at file /storage/single/mysql3306/logs/mysql-binlog.000035, position 194.; the first event 'mysql-binlog.000039' at 234, the last event read from '/storage/single/mysql3306/logs/mysql-binlog.000035' at 259, the last byte read from '/storage/single/mysql33'
            Last_SQL_Errno: 0
            Last_SQL_Error:
     Replicate_Ignore_Server_Ids:
           Master_Server_Id: 3306
             Master_UUID: e8bdf01a-c79b-11e8-82b3-00163e088352
           Master_Info_File: mysql.slave_master_info
              SQL_Delay: 0
         SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
          Master_Retry_Count: 86400
             Master_Bind:
       Last_IO_Error_Timestamp: 190521 10:32:57
       Last_SQL_Error_Timestamp:
            Master_SSL_Crl:
          Master_SSL_Crlpath:
          Retrieved_Gtid_Set:
          Executed_Gtid_Set: 4c423515-6661-11e9-b767-00163e088352:1-7,
    e8bdf01a-c79b-11e8-82b3-00163e088352:1-57192
            Auto_Position: 0
         Replicate_Rewrite_DB:
             Channel_Name:
          Master_TLS_Version:
    1 row in set (0.00 sec)
    从slave上的报错来看,在主库flush binary logs后,从库又读取注册的binlog并且又apply了

    三、现象说明

    从整个测试来看,MySQL视乎将手动注册的文件进行了传输和应用。报错是因为这个库以前是GITD_MODE=ON的,但是测试的时候已经关闭了GTID_MODE,改为了POSITION的模式,这个报错是因为DUMP线程会进行检测:

    这个图来自我新写的一个系列(暂时还没发布,大概年底才能写好)。不管怎么说,DUMP线程已经在开始传输老的binlog文件了。那么原因是什么呢?下面我们将进行解释。

    四、flush binary logs对binlog的操作

    flush binary logs 将包含如下操作:

    具体可以参考函数MYSQL_BIN_LOG::new_file_impl。其中要说明一下获取新的binlog文件名字是通过函数find_uniq_filename实现的,其中包含如下代码:

     file_info= dir_info->dir_entry;
     for (i= dir_info->number_off_files ; i-- ; file_info++)
     {
      if (strncmp(file_info->name, start, length) == 0 
      is_number(file_info->name+length, number,0))
      {
       set_if_bigger(max_found, number);
      }
     }
    ...
     *next= (need_next || max_found == 0) ? max_found + 1 : max_found;

    大概意思就是扫描index file文件里面的binlog文件,获取其序号最高的一个,然后加1。栈帧如下:

    #0 find_uniq_filename (name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", next=0x7fffec5ec678, need_next=true)
      at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3679
    #1 0x000000000187d208 in generate_new_log_name (new_name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", new_ext=0x0, 
      log_name=0x7ffedc011520 "/mysqldata/mysql3340/log/binlog", is_binlog=true) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3767
    #2 0x0000000001884fdb in MYSQL_BIN_LOG::new_file_impl (this=0x2e83640, need_lock_log=false, extra_description_event=0x0)
      at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7175
    #3 0x0000000001884cbb in MYSQL_BIN_LOG::new_file_without_locking (this=0x2e83640, extra_description_event=0x0)
      at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7099
    #4 0x0000000001886b6b in MYSQL_BIN_LOG::rotate (this=0x2e83640, force_rotate=true, check_purge=0x7fffec5ecbfb)
      at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7775
    #5 0x0000000001886d53 in MYSQL_BIN_LOG::rotate_and_purge (this=0x2e83640, thd=0x7ffedc000b90, force_rotate=true)
      at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:7846

    因此即便我们手动修改了index file,flush binary logs却不会有问题,因为它实际扫描了index file文件。
    同时我们也看到flush binary logs重新加载了index file,这个时候手动修改的index file就生效了,使用show binary logs就能查看到你加入的文件了。

    五、主从问题的产生

    binlog切换后,DUMP线程也需要读取下一个binlog文件。我们来看看它是怎么确认读取哪一个文件的。

    在函数sender.run()中可以找到循环每个binlog文件的代码。下面一句是寻找下一个binlog文件:

    int error= mysql_bin_log.find_next_log(m_linfo, 0);

    mysql_bin_log.find_next_log 包含的代码:

     my_b_seek(index_file, linfo->index_file_offset);//进行偏移量偏移
    
     linfo->index_file_start_offset= linfo->index_file_offset;
     length=my_b_gets(index_file, fname, FN_REFLEN));//读取文件名字
    ...
      if(normalize_binlog_name(full_fname, fname, is_relay_log))
    ...
    
     linfo->index_file_offset= my_b_tell(index_file);//偏移量从新记录以备下一次使用

    我们能够看到DUMP线程并没有实际扫描整个index文件,而是通过一个index文件的偏移量进行读取。如果手动修改index文件那么偏移量就出现了错乱。因此DUMP发送的下一个文件将是不确定的。因此出现了发送手动注册的binlog文件给从库的现象,这种情况下可能出现下面情况:

    1. 如果是GTID_MODE 关闭,使用POSITION那么这种情况一定报错,比如重复的行。
    2. 如果是GTID_MODE 和AUTO_POSITION=1,那么DUMP线程会进行GTID的过滤不发送,因为Event不发送所以延迟会持续一段时间为0。
    3. 如果是GTID_MODE 和AUTO_POSITION=0,那么SQL线程应用GITD_EVENT的时候会进行过滤,延迟可能出现很大的情况。

    尽管GTID可能可以屏蔽这种问题,但是DUMP线程已经在考虑发送老的binlog文件了,这是不合适的。

    六、purge binary logs能够维护这个偏移量

    为什么purge binary logs不会出现问题呢,因为在purge binary logs的语句下,会维护这个偏移量如下:

     virtual void operator()(THD *thd)
     {
      LOG_INFO* linfo;
      mysql_mutex_lock(thd->LOCK_thd_data);
      if ((linfo= thd->current_linfo))//b binlog.cc:2829
      {
       /*
        Index file offset can be less that purge offset only if
        we just started reading the index file. In that case
        we have nothing to adjust.
       */
       if (linfo->index_file_offset  m_purge_offset)
        linfo->fatal = (linfo->index_file_offset != 0);
       else
        linfo->index_file_offset -= m_purge_offset;
      }
      mysql_mutex_unlock(thd->LOCK_thd_data);

    我们可以看到linfo->index_file_offset -= m_purge_offset;这样一个语句。下面是栈帧:

    #0 Adjust_offset::operator() (this=0x7fffec5ec720, thd=0x7ffedc000be0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2831
    #1 0x0000000000eef320 in Do_THD::operator() (this=0x7fffec5ec6a0, thd=0x7ffedc000be0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:46
    #2 0x0000000000eefa0f in std::for_eachTHD**, Do_THD> (__first=0x3003358, __last=0x3003368, __f=...)
      at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/bits/stl_algo.h:4200
    #3 0x0000000000eeefc0 in Global_THD_manager::do_for_all_thd (this=0x3003340, func=0x7fffec5ec720)
      at /mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:273
    #4 0x000000000187ae0a in adjust_linfo_offsets (purge_offset=0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2873
    #5 0x0000000001883239 in MYSQL_BIN_LOG::remove_logs_from_index (this=0x2e83640, log_info=0x7fffec5ec7d0, need_update_threads=true)
      at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:6352
    #6 0x0000000001883676 in MYSQL_BIN_LOG::purge_logs (this=0x2e83640, to_log=0x7fffec5eca80 "/mysqldata/mysql3340/log/binlog.000001", included=false, 
      need_lock_index=true, need_update_threads=true, decrease_log_space=0x0, auto_purge=false) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:6469
    #7 0x000000000187b4b5 in purge_master_logs (thd=0x7ffee0000c00, to_log=0x7ffee0006600 "binlog.000001")
      at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:3127

    七、测试POSITION模式下的报错

    1、 环境

    mysql> show binary logs;
    +---------------+-----------+
    | Log_name   | File_size |
    +---------------+-----------+
    | binlog.000001 |    198 |
    | binlog.000002 |    154 |
    +---------------+-----------+
    2 rows in set (0.00 sec)
    
    mysql> show create table testcp \G;
    *************************** 1. row ***************************
        Table: testcp
    Create Table: CREATE TABLE `testcp` (
     `id` int(11) NOT NULL,
     PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
    
    ERROR: 
    No query specified

    2、执行一个语句

    主库:

    mysql> insert into testcp values(20);
    Query OK, 1 row affected (0.43 sec)
    
    mysql> select * from testcp;
    +----+
    | id |
    +----+
    | 10 |
    | 20 |
    +----+
    2 rows in set (0.01 sec)

    从库:

    mysql> show slave status \G;
    *************************** 1. row ***************************
            Slave_IO_State: Waiting for master to send event
             Master_Host: 192.168.99.41
             Master_User: repl
             Master_Port: 3340
            Connect_Retry: 60
           Master_Log_File: binlog.000002
         Read_Master_Log_Pos: 417
            Relay_Log_File: relay.000004
            Relay_Log_Pos: 624
        Relay_Master_Log_File: binlog.000002
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
    ...
    mysql> select * from testcp;
    +----+
    | id |
    +----+
    | 10 |
    | 20 |
    +----+  

    这个时候DUMP线程index file偏移指针如下:

    3、主库执行purge binary logs

    做之前拷贝原有binlog.000001为binlog.000001bak因为等会要拷贝回去

    mysql> purge binary logs to 'binlog.000002';
    Query OK, 0 rows affected (3.14 sec)
    
    mysql> show binary logs;
    +---------------+-----------+
    | Log_name   | File_size |
    +---------------+-----------+
    | binlog.000002 |    417 |
    +---------------+-----------+
    1 row in set (0.00 sec)
    

    因为purge binary logs命令会维护偏移指针,这个时候DUMP线程的index file偏移指针如下:

    4、手动更改

    将binlog.000001bak拷贝为binlog.000001,然后修改index file将binlog.000001加入回去,然后flush binary logs如下:

    mysql> flush binary logs;
    Query OK, 0 rows affected (0.15 sec)
    
    mysql> show binary logs;
    +---------------+-----------+
    | Log_name   | File_size |
    +---------------+-----------+
    | binlog.000001 |     198 |
    | binlog.000002 |    461 |
    +---------------+-----------+
    2 rows in set (0.00 sec)
    

    手动完成这个工作并不会维护DUMP线程的index file偏移指针,因此如下:

    就这样DUMP线程重新发送了一次binlog.000002的内容,POSITION的从库只能报错了如下:

    mysql> select * from replication_applier_status_by_worker \G
    *************************** 1. row ***************************
         CHANNEL_NAME: 
          WORKER_ID: 1
          THREAD_ID: NULL
        SERVICE_STATE: OFF
    LAST_SEEN_TRANSACTION: ANONYMOUS
      LAST_ERROR_NUMBER: 1062
      LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'ANONYMOUS' at master log binlog.000002, end_log_pos 386; Could not execute Write_rows event on table testmts.testcp; Duplicate entry '20' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000002, end_log_pos 386
     LAST_ERROR_TIMESTAMP: 2019-05-21 14:46:26

    八、总结

    这种操作非常不规范,如果实在要这么做考虑如下步骤:

    这种情况下将会重新初始化DUMP的偏移量指针,应该是没有问题的。还是尽量不要考虑这么做。

    以上就是MySQL手动注册binlog文件造成主从异常的原因的详细内容,更多关于MySQL 主从异常的资料请关注脚本之家其它相关文章!

    您可能感兴趣的文章:
    • MySQL 查看链接及杀掉异常链接的方法
    • MySQL数据库连接异常汇总(值得收藏)
    • mysql5.7.21启动异常的修复方法
    • mysql innodb 异常修复经验分享
    • MySQL定义异常和异常处理详解
    • MySQL存储过程中一些基本的异常处理教程
    • 分析一个MySQL的异常查询的案例
    • MySQL异常处理浅析
    • 分析MySQL抛出异常的几种常见解决方式
    上一篇:MySQL取消了Query Cache的原因
    下一篇:MySQL Like模糊查询速度太慢如何解决
  • 相关文章
  • 

    © 2016-2020 巨人网络通讯 版权所有

    《增值电信业务经营许可证》 苏ICP备15040257号-8

    MySQL手动注册binlog文件造成主从异常的原因 MySQL,手动,注册,binlog,文件,