mysql大事物commit慢造成全庫堵塞問題

原創(chuàng)轉(zhuǎn)載請(qǐng)注明出處

本文使用引擎INNODB版本MySQL5.7.13
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
也就是雙1設(shè)置,

本文使用通過對(duì)mysql 進(jìn)行trace,gdb 自制工具infobin,以及
select * from sys.session where command<>'SLEEP' \G完成。 
這里只是涉及binlog的寫入問題和innodb層無關(guān)

1、問題由來
   在線上環(huán)境一次同事刪除了1000W的數(shù)據(jù),在commit的時(shí)候,整個(gè)數(shù)據(jù)庫
大約有幾十秒的全部hang住的情況,本文就是為了深入剖析這個(gè)問題。

2、可能需要的知識(shí)
http://blog.itpub.net/7728585/viewspace-2133589/ MYSQL 什么時(shí)候同步binlog文件以及它的臨時(shí)文件在哪里?
http://blog.itpub.net/7728585/viewspace-2133188/ 解析MYSQL BINLOG 二進(jìn)制格式(1)--準(zhǔn)備工作 
http://blog.itpub.net/7728585/viewspace-2133189/ 解析MYSQL BINLOG 二進(jìn)制格式(2)--FORMAT_DESCRIPTION_EVENT 
http://blog.itpub.net/7728585/viewspace-2133321/ 解析MYSQL BINLOG 二進(jìn)制格式(3)--QUERY_EVENT 
http://blog.itpub.net/7728585/viewspace-2133429/ 解析MYSQL BINLOG 二進(jìn)制格式(4)--TABLE_MAP_EVENT 
http://blog.itpub.net/7728585/viewspace-2133463/ 解析MYSQL BINLOG 二進(jìn)制格式(5)--WRITE_ROW_EVENT 
http://blog.itpub.net/7728585/viewspace-2133469/ 解析MYSQL BINLOG 二進(jìn)制格式(6)--UPDATE_ROW_EVENT/DELETE_ROW_EVENT  
http://blog.itpub.net/7728585/viewspace-2133502/ 解析MYSQL BINLOG 二進(jìn)制格式(7)--Xid_log_event/XID_EVENT 
http://blog.itpub.net/7728585/viewspace-2133506/ 解析MYSQL BINLOG二進(jìn)制格式(8)--GTID_LOG_EVENT/ANONYMOUS_GTID_LOG_EVENT及其他 
http://blog.itpub.net/7728585/viewspace-2133534/ 解析MYSQL BINLOG二進(jìn)制格式(9)--infobin解析binlog幫助文檔
http://blog.itpub.net/7728585/viewspace-2133537/ 解析MYSQL BINLOG二進(jìn)制格式(10)--問題解答 

3、簡單說明commit的時(shí)候做了什么以及為什么要上鎖
在<>一文中有如下的圖
mysql 大事物commit慢造成全庫堵塞問題



其實(shí)這幅圖中我已經(jīng)寫了需要一把鎖,但是沒仔細(xì)說,這把鎖影響的是全部當(dāng)前數(shù)據(jù)庫的需要寫binlog的全部語句,我們已經(jīng)知道對(duì)于大事物(超過)的commit期間需要將臨時(shí)
文件的中的binlog event 全部拷貝到binlog中,并且是一個(gè)事物的event必須是連續(xù)的,也就是一個(gè)事物的event中不能包含其他事物的event那么這期間需要一把mutex鎖,為什
么要這把鎖,這是在多線程/進(jìn)程并發(fā)對(duì)同一個(gè)文件進(jìn)行修改,也就是這里的binlog文件修改的,一種保護(hù)臨界資源的方法,如果沒有這把鎖就會(huì)出現(xiàn)binlog中的文件出現(xiàn)錯(cuò)亂的情況。
注意這個(gè)鎖是mysql層的而非innodb層。

4、模擬方法

對(duì)表test 進(jìn)行刪除不提交,觀察臨時(shí)文件大小。
關(guān)于生成臨時(shí)文件的原理參考:
(http://blog.itpub.net/7728585/viewspace-2133589/ MYSQL 什么時(shí)候同步binlog文件以及它的臨時(shí)文件在哪里?)

對(duì)刪除delete test表commit期間馬上對(duì)testsort和testsort2表插入一條數(shù)據(jù),
這個(gè)時(shí)候可以觀察到這兩個(gè)插入,會(huì)被堵塞知道delete test的事物提交完成
才完成。
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  2621440 |
+----------+
1 row in set (3.14 sec)


及如下:

mysql 大事物commit慢造成全庫堵塞問題
可以看到一個(gè)簡單的insert居然花了10多秒,commit也是做了14秒。原因就在于需要將MLq9INFu這個(gè)臨時(shí)文件同步在binlog這個(gè)過程
其他需要寫入binlog的會(huì)話是堵塞的。


刪除期間會(huì)看到臨時(shí)文件不斷增大如下:
[root@testmy ~]# lsof |grep /tmp/ML
mysqld    2095     mysql   84u      REG                8,3  328560640   14680133 /tmp/MLq9INFu (deleted)
[root@testmy ~]# lsof |grep /tmp/ML
mysqld    2095     mysql   84u      REG                8,3  346337280   14680133 /tmp/MLq9INFu (deleted)
[root@testmy ~]# lsof |grep /tmp/ML
mysqld    2095     mysql   84u      REG                8,3  367812608   14680133 /tmp/MLq9INFu (deleted)

到了commit后這個(gè)文件變?yōu)?,但是任然存在因?yàn)檫@個(gè)會(huì)話線程還在:
[root@testmy ~]# lsof |grep /tmp/ML
mysqld    2095     mysql   84u      REG                8,3          0   14680133 /tmp/MLq9INFu (deleted)


5、使用infobin 對(duì)binlog文件中進(jìn)行驗(yàn)證

使用./infobin test.000211>log.log 進(jìn)行分析
(可以使用mysqlbinlog但是不直觀
工具可以參考
http://blog.itpub.net/7728585/viewspace-2133534/ 解析MYSQL BINLOG二進(jìn)制格式(9)--infobin解析binlog幫助文檔
獲得)

-----delete from test; 的binlog event

>Gtid Event:Pos:368030739(0X15efb413) N_pos:368030804(0X15efb454) Time:1487197597 Event_size:65(bytes)
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158184
-->Query Event:Pos:368030804(0X15efb454) N_Pos:368030876(0X15efb49c) Time:1487197355 Event_size:72(bytes)
Exe_time:0  Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158184
---->Map Event:Pos368030876(0X15efb49c) N_pos:368031047(0X15efb547) Time:1487197355 Event_size:171(bytes)
TABLE_ID:129 DB_NAME:test TABLE_NAME:test Gno:1158184
------>Delete Event:Pos:368031047(0X15efb547) N_pos:368039196(0X15efd51c) Time:1487197355 Event_size:8149(bytes)
Dml on table: test.test  table_id:129 Gno:1158184
....(中間省略)
------>Delete Event:Pos:736052116(0X2bdf4394) N_pos:736054135(0X2bdf4b77) Time:1487197355 Event_size:2019(bytes) 
Dml on table: test.test  table_id:129 Gno:1158184 
>Xid Event:Pos:736054135(0X2bdf4b77) N_Pos:736054166(0X2bdf4b96) Time:1487197597 Event_size:31(bytes) 
COMMIT; /*!Trx end*/ Gno:1158184


-----insert into testsort values(1);的binlog

>Gtid Event:Pos:736054166(0X2bdf4b96) N_pos:736054231(0X2bdf4bd7) Time:1487197599 Event_size:65(bytes) 
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158185
-->Query Event:Pos:736054231(0X2bdf4bd7) N_Pos:736054303(0X2bdf4c1f) Time:1487197599 Event_size:72(bytes) 
Exe_time:0  Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158185
---->Map Event:Pos736054303(0X2bdf4c1f) N_pos:736054358(0X2bdf4c56) Time:1487197599 Event_size:55(bytes) 
TABLE_ID:187 DB_NAME:test TABLE_NAME:testsort Gno:1158185
------>Insert Event:Pos:736054358(0X2bdf4c56) N_pos:736054410(0X2bdf4c8a) Time:1487197599 Event_size:52(bytes) 
Dml on table: test.testsort  table_id:187 Gno:1158185 
>Xid Event:Pos:736054410(0X2bdf4c8a) N_Pos:736054441(0X2bdf4ca9) Time:1487197599 Event_size:31(bytes) 
COMMIT; /*!Trx end*/ Gno:1158185


------insert into testsort2 values(1);的binlog

>Gtid Event:Pos:736054441(0X2bdf4ca9) N_pos:736054506(0X2bdf4cea) Time:1487197600 Event_size:65(bytes) 
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158186
-->Query Event:Pos:736054506(0X2bdf4cea) N_Pos:736054578(0X2bdf4d32) Time:1487197600 Event_size:72(bytes) 
Exe_time:0  Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158186
---->Map Event:Pos736054578(0X2bdf4d32) N_pos:736054634(0X2bdf4d6a) Time:1487197600 Event_size:56(bytes) 
TABLE_ID:188 DB_NAME:test TABLE_NAME:testsort2 Gno:1158186
------>Insert Event:Pos:736054634(0X2bdf4d6a) N_pos:736054686(0X2bdf4d9e) Time:1487197600 Event_size:52(bytes) 
Dml on table: test.testsort2  table_id:188 Gno:1158186 
>Xid Event:Pos:736054686(0X2bdf4d9e) N_Pos:736054717(0X2bdf4dbd) Time:1487197600 Event_size:31(bytes) 
COMMIT; /*!Trx end*/ Gno:1158186

可以看到確實(shí)insert into testsort values(1)和insert into testsort2 values(1);的event確實(shí)在delete后才進(jìn)入
binlog file

6、分析等待點(diǎn)。
那么到底在等待什么呢?這是另外一次測(cè)試的sys.session輸出

mysql> select * from sys.session where command<>'SLEEP' \G
*************************** 1. row ***************************
                thd_id: 28
               conn_id: 3
                  user: root@localhost
                    db: test
               command: Query
                 state: starting
                  time: 13
     current_statement: commit
     statement_latency: 12.55 s
              progress: NULL
          lock_latency: 0 ps
         rows_examined: 0
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 0
       tmp_disk_tables: 0
             full_scan: NO
        last_statement: NULL
last_statement_latency: NULL
        current_memory: 179.02 KiB
             last_wait: wait/io/file/sql/binlog
     last_wait_latency: Still Waiting
                source: binlog.cc:8469
           trx_latency: 3.22 m
             trx_state: ACTIVE
        trx_autocommit: NO
                   pid: 5821
          program_name: mysql
*************************** 2. row ***************************
                thd_id: 3031
               conn_id: 3006
                  user: root@localhost
                    db: test
               command: Query
                 state: query end
                  time: 12
     current_statement: insert into testsort3(id1,id2, ... OR((RAND()*100000)),'gaopeng')
     statement_latency: 11.82 s
              progress: NULL
          lock_latency: 29.18 ms
         rows_examined: 0
             rows_sent: 0
         rows_affected: 0
            tmp_tables: 0
       tmp_disk_tables: 0
             full_scan: NO
        last_statement: NULL
last_statement_latency: NULL
        current_memory: 556.53 KiB
             last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done
     last_wait_latency: Still Waiting
                source: binlog.cc:1894
           trx_latency: 12.97 s
             trx_state: ACTIVE
        trx_autocommit: YES
                   pid: 15312
          program_name: mysql
          
不難看出current_statement: commit 等待是
last_wait: wait/io/file/sql/binlog
last_wait_latency: Still Waiting
source: binlog.cc:8469
而被堵塞的insert等待是
last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done
last_wait_latency: Still Waiting
source: binlog.cc:1894

這是commit 等待的binlog.cc:8469的內(nèi)容
    /**
      On *pure non-transactional* workloads there is a small window
      in time where a concurrent rotate might be able to close
      the file before the sync is actually done. In that case,
      ignore the bad file descriptor errors.


      Transactional workloads (InnoDB) are not affected since the
      the rotation will not happen until all transactions have
      committed to the storage engine, thence decreased the XID
      counters.(重點(diǎn))

      TODO: fix this properly even for non-transactional storage
            engines.
     */
    if (DBUG_EVALUATE_IF("simulate_error_during_sync_binlog_file", 1,
        mysql_file_sync(log_file.file,
        MYF(MY_WME | MY_IGNORE_BADFD))))
從源碼注釋上可以看出原因再來看看他掉用int my_sync(File fd, myf my_flags)
既然加入文件描述符fd,應(yīng)該就是在寫binlog file


下面是insert等待的source: binlog.cc:1894的內(nèi)容
      mysql_cond_wait(&m_cond_done, &m_lock_done);
這個(gè)地方就是拿不到這把鎖,他是Stage_manager::enroll_for調(diào)用的函數(shù)
而 Stage_manager源碼注釋是:
/**
  Class for maintaining the commit stages for binary log group commit.
 */
而enroll_for方法的描述為:
  /**
    Enroll a set of sessions for a stage.
    This will queue the session thread for writing and flushing.
    If the thread being queued is assigned as stage leader, it will
    return immediately.
    If wait_if_follower is true the thread is not the stage leader,
    the thread will be wait for the queue to be processed by the
    leader before it returns.(重點(diǎn))
    In DBUG-ON version the follower marks is preempt status as ready.
    @param stage Stage identifier for the queue to append to.
    @param first Queue to append.
    @param stage_mutex
                 Pointer to the currently held stage mutex, or NULL if
                 we're not in a stage.
    @retval true  Thread is stage leader.
    @retval false Thread was not stage leader and processing has been done.
   */
 
那么確實(shí)commit后binary log確實(shí)有一個(gè)寫入的策略,會(huì)根據(jù)是否為leader進(jìn)行寫入,
如果不是leader則等到leader全部寫完才進(jìn)行寫入,多個(gè)等待寫入binlog的會(huì)話計(jì)入
一個(gè)隊(duì)列中,這貌似就是并行了。

我使用GDB將斷點(diǎn)打到
Stage_manager::enroll_for 
MYSQL_BIN_LOG::sync_binlog_file
發(fā)現(xiàn)整個(gè)流程如下:
              
A會(huì)話COMMIT-->拿到鎖-->進(jìn)行binlog寫-->commit完成
B會(huì)話COMMIT-->等待鎖--------------------------->拿到鎖-->進(jìn)行binlog寫-->commit完成                                               


7、如何避免

沒有什么好的辦法如果使用innodb雙1的情況下,只能每次修改少量的數(shù)據(jù),增加修改次數(shù)
前面也說了binlog生成量大約為你修改數(shù)據(jù)量的2/3如果update是4/3那么就要評(píng)估你的I/O
性能,如果順序?qū)懭朊棵雽懭霝閄需要修改的數(shù)據(jù)量大約為Y
那么
S(秒)=(Y*(2/3))/(X/2)
控制S到一個(gè)你認(rèn)為可以接受的范圍內(nèi)。其實(shí)可以通過copy來試試你的順序的I/O性能。

比如我們寫這樣一個(gè)存儲(chǔ)過程:
DELIMITER //  
CREATE PROCEDURE d_move(in  beid int,in cunt int,in  rws int,in maxid int)
begin 
  declare num int;
  declare nowid int; 
set num = 1; 
set nowid = beid;
while num <= cunt do
  insert into test_vest_hist select * from test_vest where id>=nowid and id<nowid+rws and="" id<nowid+rws="" id<maxid;=""  ="" select="" concat(now(),'="" ',row_count(),'="" insert!');    delete  from test_vest where id>=nowid and id<nowid+rws and="" id<maxid;=""  ="" select="" concat(now(),'="" ',row_count(),'="" delete!');    select concat(now(),' ',num,' times delete finsh!');
  set nowid =  nowid+rws;
  set num = num+1;
  select sleep(2);
end while;
end //
將rws控制到一個(gè)可以接受的范圍比如5W,當(dāng)然最好進(jìn)行sleep(),這樣給innodb引擎一個(gè)刷入臟數(shù)據(jù)的時(shí)間減少壓力。

如果sync_log不設(shè)置為1,這個(gè)我做了trace確實(shí)就不會(huì)在commit的進(jìn)行binlog寫入了,但是這是不安全的,commit結(jié)束
后日志不落地是不安全的。

 作者微信:

               mysql 大事物commit慢造成全庫堵塞問題

分享標(biāo)題:mysql大事物commit慢造成全庫堵塞問題
分享路徑:http://bm7419.com/article32/pcddpc.html

成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供網(wǎng)站設(shè)計(jì)公司、全網(wǎng)營銷推廣、微信小程序云服務(wù)器、網(wǎng)站內(nèi)鏈、移動(dòng)網(wǎng)站建設(shè)

廣告

聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如需處理請(qǐng)聯(lián)系客服。電話:028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來源: 創(chuàng)新互聯(lián)

營銷型網(wǎng)站建設(shè)