MySQL Galera cluster crash
Привет,
Днес сутринта след рутинен update на пакетите на DB сървърите попаднах на MySQL Galera cluster crash.
За няколко минути cluster-а ми се счупи тотално и част от апликациите на root.bg не работиха.
Ето как започна всичко :
След
apt-get update && apt-get upgrade -y
нещата се объркаха :
Setting up mariadb-server-10.1 (10.1.22+maria-1~xenial) ... Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details. invoke-rc.d: initscript mysql, action "start" failed. ● mariadb.service - MariaDB database server Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled) Drop-In: /etc/systemd/system/mariadb.service.d └─migrated-from-my.cnf-settings.conf Active: failed (Result: exit-code) since Thu 2017-03-16 10:35:22 EET; 4ms ago Process: 13607 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=1/FAILURE) Process: 13602 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS) Process: 13597 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS) Main PID: 19373 (code=exited, status=0/SUCCESS) Mar 16 10:35:22 border mysqld[13607]: Segmentation fault (core dumped)
Веднага предприех опит да възтановя базите чрез вградения скрипт на galera за recovery :
galera_recovery
обаче резултата не беше такъв какъвто се надявах да бъде :
root@border:/var/lib/mysql# galera_recovery 2017-03-16 10:40:53 139680740075776 [Warning] options --log-slow-admin-statements, --log-queries-not-using-indexes and --log-slow-slave-statements have no effect if --log_slow_queries is not set 2017-03-16 10:40:53 139680740075776 [Note] /usr/sbin/mysqld (mysqld 10.1.22-MariaDB-1~xenial) starting as process 19000 ... Segmentation fault (core dumped) WSREP: Failed to start mysqld for wsrep recovery: '2017-03-16 10:40:53 139680740075776 [Note] Using unique option prefix 'myisam-recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead. 2017-03-16 10:40:53 139680740075776 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2017-03-16 10:40:53 139680740075776 [Note] InnoDB: The InnoDB memory heap is disabled 2017-03-16 10:40:53 139680740075776 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2017-03-16 10:40:53 139680740075776 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2017-03-16 10:40:53 139680740075776 [Note] InnoDB: Compressed tables use zlib 1.2.8 2017-03-16 10:40:53 139680740075776 [Note] InnoDB: Using Linux native AIO 2017-03-16 10:40:53 139680740075776 [Note] InnoDB: Using SSE crc32 instructions 2017-03-16 10:40:53 139680740075776 [Note] InnoDB: Initializing buffer pool, size = 256.0M 2017-03-16 10:40:54 139680740075776 [Note] InnoDB: Completed initialization of buffer pool 2017-03-16 10:40:54 139680740075776 [Note] InnoDB: Highest supported file format is Barracuda. InnoDB: Transaction 7490983 was in the XA prepared state. InnoDB: 1 transaction(s) which must be rolled back or cleaned up InnoDB: in total 0 row operations to undo InnoDB: Trx id counter is 7503616 InnoDB: Starting in background the rollback of uncommitted transactions 2017-03-16 10:40:54 7f062bffe700 InnoDB: Rollback of non-prepared transactions completed 2017-03-16 10:40:54 139680740075776 [Note] InnoDB: 128 rollback segment(s) are active. 2017-03-16 10:40:54 139680740075776 [Note] InnoDB: Waiting for purge to start 2017-03-16 10:40:54 139680740075776 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.35-80.0 started; log sequence number 9123984658 2017-03-16 10:40:54 139680740075776 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery. 2017-03-16 10:40:54 139680740075776 [Note] Plugin 'FEEDBACK' is disabled. 2017-03-16 10:40:54 139680740075776 [Note] Recovering after a crash using tc.log 2017-03-16 10:40:54 139680740075776 [Note] Starting crash recovery... 2017-03-16 10:40:54 7f09f4e4e900 InnoDB: Starting recovery for XA transactions... 2017-03-16 10:40:54 7f09f4e4e900 InnoDB: Transaction 7490983 in prepared state after recovery 2017-03-16 10:40:54 7f09f4e4e900 InnoDB: Transaction contains changes to 1 rows 2017-03-16 10:40:54 7f09f4e4e900 InnoDB: 1 transactions in prepared state after recovery 2017-03-16 10:40:54 139680740075776 [Note] Found 1 prepared transaction(s) in InnoDB 170316 10:40:54 [ERROR] mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.22-MariaDB-1~xenial key_buffer_size=20971520000 read_buffer_size=1048576 max_used_connections=0 max_threads=602 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 21725239 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x30000 (my_addr_resolve failure: fork) /usr/sbin/mysqld(my_print_stacktrace+0x2e) [0x562db325cd9e] /usr/sbin/mysqld(handle_fatal_signal+0x2f5) [0x562db2daa2d5] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f09f34fa390] /usr/sbin/mysqld(wsrep_on+0x18) [0x562db2d44458] /usr/sbin/mysqld(+0x803842) [0x562db3016842] /usr/sbin/mysqld(+0x803f11) [0x562db3016f11] /usr/sbin/mysqld(+0x59897f) [0x562db2dab97f] /usr/sbin/mysqld(plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int*, void*), int, unsigned int, void*)+0x2cf) [0x562db2c2a6af] /usr/sbin/mysqld(ha_recover(st_hash*)+0xc6) [0x562db2dad886] /usr/sbin/mysqld(TC_LOG_MMAP::recover()+0x148) [0x562db2e5e9a8] /usr/sbin/mysqld(TC_LOG_MMAP::open(char const*)+0x620) [0x562db2e5f070] /usr/sbin/mysqld(+0x3766d5) [0x562db2b896d5] /usr/sbin/mysqld(mysqld_main(int, char**)+0xa52) [0x562db2b8d302] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f09f2ab5830] /usr/sbin/mysqld(_start+0x29) [0x562db2b81079] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.
Нещата започнаха да стават все по-лоши и по-лоши.. но накрая успях да овладея ситуацията.
Решението на проблема:
Благодарение на този пост в bugzilla.redhat.com успях да възтановя счупения DB сървър чрез:
mysqld_safe --user=mysql --datadir=/var/lib/mysql --wsrep-recover --tc-heuristic-recover rollback
Ето и резултата :
root@border:/var/lib/mysql# mysqld_safe --user=mysql --datadir=/var/lib/mysql --wsrep-recover --tc-heuristic-recover rollback 170316 10:55:06 mysqld_safe Logging to syslog. 170316 10:55:06 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Така успях да стартирам cluster-a наново чрез
galera_new_cluster
и всичко си тръгна по старо му!
Изводите
Първи извод : Както не веднъж съм споменавал в предишни мои постове, винаги ама винаги трябва да имаме бекъп!! При положение, че всичките ми опити за recovery на galera не бяха довели до възтановяване на базата, щях да бъда принуден да стартирам нова mysql инстанция другаде и да налея всички бази в нея. Добре че не се стига до там! 🙂
Втори извод : Нещата с редовното обновяване на пакети не винаги носят облаги. Аз обичам да update-вам и често си патя от това – дали не е време да спра да го правя?