本人在工作中一般喜欢把MySQL、Redis、Memcached、MongoDB等数据库按照实例的方式对外提供服务。一般都是一台高配的服务器上开启多个实例给每个业务使用。而监控是重中之重,我自己也尝试了多种监控方式,但对我来说感觉最简单最快的就是使用zabbix了,灵活定义key。
由于我是多实例,所以就需要用到zabbix的自动发现功能(LLD)。基本处理方式就是:
1、写自动发现脚本。
2、写状态取值脚本。
3、添加配置文件。
4、添加权限。
5、配置zabbix web。
一、写自动发现脚本
1 |
$ cat /etc/zabbix/zabbix_agentd.d/scripts/mysql_discovery.py |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
#!/usr/bin/env python import os import json t=os.popen("""sudo netstat -nltp|grep -w "mysqld"|grep -w "LISTEN"|grep -v grep|grep -v '^$'|awk -F: '{print $4}'""") s=os.popen("""sudo netstat -nltp|grep -w "mysqld"|grep -w "LISTEN"|grep -v grep|grep -v '^$'|awk -F: '{print $2}'|awk '{print $1}'""") port_info = [] ports = [] for port in t.readlines(): r = os.path.basename(port.strip()) if r: port_info.append(r) for port in s.readlines(): r = os.path.basename(port.strip()) if r: port_info.append(r) port_info = list(set(port_info)) for port in port_info: ports += [{'{#MYSQLPORT}':port}] print(json.dumps({'data':ports},sort_keys=True,indent=4,separators=(',',':'))) |
执行脚本看输出结果(最好使用zabbix用户执行,才能看出效果):
1 2 3 4 5 6 7 8 9 10 |
$ sudo -u zabbix /usr/bin/python /etc/zabbix/zabbix_agentd.d/scripts/mysql_discovery.py { "data":[ { "{#MYSQLPORT}":"3306" }, { "{#MYSQLPORT}":"3307" } } |
我这个脚本中使用了sudo权限,zabbix用户在执行netstat时需要sudo权限。
另外就是,不同环境可能netstat看到的形式不同,根据自己的环境做awk切割即可。我这里写了两种,如果你有其他方式追加就行了,然后做一个聚合操作。
二、写状态取值脚本
MASTER
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 |
#!/bin/bash # #Auth: Pengdongwen #Blog: www.ywnds.com #Desc: mysql status monitoring #dependent: # 1)python mysql_discovery.py # 2)grant select, replication slave, replication client on *.* to 'monitoruser'@'%' identified by '123456'; ######################### source /etc/profile MYSQL_HOST="localhost" MYSQL_USER="monitoruser" MYSQL_PWD="123456" if [ $# -lt 2 ];then echo "please set argument" exit 1 fi CMD="mysql -h${MYSQL_HOST} -P$1 -u${MYSQL_USER} -p${MYSQL_PWD}" result=`$CMD -e "show global status;" 2> /dev/null | grep -w "$2" | awk '{print $2}'` case $2 in Slaves_connected) if [ -z $result ];then echo 0; else echo $result; fi ;; Uptime) if [ -z $result ];then echo 0; else echo $result; fi ;; Slow_queries) if [ -z $result ];then echo 0; else echo $result; fi ;; Queries) if [ -z $result ];then echo 0; else echo $result; fi ;; Questions) if [ -z $result ];then echo 0; else echo $result; fi ;; Flush_commands) if [ -z $result ];then echo 0; else echo $result; fi ;; Open_tables) if [ -z $result ];then echo 0; else echo $result; fi ;; Threads_connected) if [ -z $result ];then echo 0; else echo $result; fi ;; Threads_cached) if [ -z $result ];then echo 0; else echo $result; fi ;; Threads_running) if [ -z $result ];then echo 0; else echo $result; fi ;; Threads_created) if [ -z $result ];then echo 0; else echo $result; fi ;; Max_used_connections) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_update) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_select) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_rollback) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_insert) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_delete) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_commit) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_begin) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_lock_tables) if [ -z $result ];then echo 0; else echo $result; fi ;; Table_locks_immediate) if [ -z $result ];then echo 0; else echo $result; fi ;; Bytes_sent) if [ -z $result ];then echo 0; else echo $result; fi ;; Bytes_received) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_data) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_bytes_data) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_dirty) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_flushed) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_free) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_latched) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_misc) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_total) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_page_size) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_pages_created) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_pages_read) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_pages_written) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_row_lock_current_waits) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_row_lock_time) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_row_lock_time_avg) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_row_lock_time_max) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_rows_deleted) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_rows_inserted) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_rows_read) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_rows_updated) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_write_requests) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_read_requests) if [ -z $result ];then echo 0; else echo $result; fi ;; Waiting_lock) result=`$CMD -e "select state from information_schema.processlist;" 2> /dev/null | grep "Waiting.*lock" | wc -l` if [ -z $result ];then echo 0; else echo $result; fi ;; Transaction) result=`$CMD -e "select count(time) as count from information_schema.processlist where COMMAND!='Binlog Dump' and COMMAND!='Sleep' and COMMAND!='Connect' and Time>=30;" 2> /dev/null -BN` if [ -z $result ];then echo 0; else echo $result; fi ;; *) echo "Usage:$0 arguments" ;; esac |
SLAVE
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 |
#!/bin/bash # #Auth: Pengdongwen #Blog: www.ywnds.com #Desc: mysql slave status monitoring #dependent: # 1)python mysql_discovery.py # 2)grant select, super, replication slave, replication client on *.* to 'monitoruser'@'localhost' identified by '123456'; ######################### MYSQL_HOST="localhost" MYSQL_USER="monitoruser" MYSQL_PWD="123456" if [ $# -lt 2 ];then echo "please set argument" exit 1 fi CMD="mysql -h$MYSQL_HOST -P$1 -u$MYSQL_USER -p$MYSQL_PWD" result=`$CMD -e "show global status;" 2> /dev/null | grep -w "$2" | awk '{print $2}'` case $2 in Slaves_connected) if [ -z $result ];then echo 0; else echo $result; fi ;; Uptime) if [ -z $result ];then echo 0; else echo $result; fi ;; Slow_queries) if [ -z $result ];then echo 0; else echo $result; fi ;; Queries) if [ -z $result ];then echo 0; else echo $result; fi ;; Questions) if [ -z $result ];then echo 0; else echo $result; fi ;; Flush_commands) if [ -z $result ];then echo 0; else echo $result; fi ;; Open_tables) if [ -z $result ];then echo 0; else echo $result; fi ;; Threads_connected) if [ -z $result ];then echo 0; else echo $result; fi ;; Threads_cached) if [ -z $result ];then echo 0; else echo $result; fi ;; Threads_running) if [ -z $result ];then echo 0; else echo $result; fi ;; Threads_created) if [ -z $result ];then echo 0; else echo $result; fi ;; Max_used_connections) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_update) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_select) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_rollback) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_insert) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_delete) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_commit) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_begin) if [ -z $result ];then echo 0; else echo $result; fi ;; Com_lock_tables) if [ -z $result ];then echo 0; else echo $result; fi ;; Max_used_connections) if [ -z $result ];then echo 0; else echo $result; fi ;; Bytes_sent) if [ -z $result ];then echo 0; else echo $result; fi ;; Bytes_received) if [ -z $result ];then echo 0; else echo $result; fi ;; Table_locks_immediate) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_data) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_dirty) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_flushed) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_free) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_latched) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_misc) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_pages_total) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_page_size) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_pages_created) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_pages_read) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_pages_written) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_row_lock_current_waits) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_row_lock_time) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_row_lock_time_avg) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_row_lock_time_max) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_rows_deleted) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_rows_inserted) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_rows_read) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_rows_updated) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_write_requests) if [ -z $result ];then echo 0; else echo $result; fi ;; Innodb_buffer_pool_read_requests) if [ -z $result ];then echo 0; else echo $result; fi ;; Slave_IO_Running) result=`$CMD -e "show slave status\G" 2> /dev/null | grep -w "Slave_IO_Running" | awk '{print $2}'` if [ -z "${result}" ] || [ "${result}" == "No" ];then echo 0; else echo 1; fi ;; Slave_SQL_Running) result=`$CMD -e "show slave status\G" 2> /dev/null | grep -w "Slave_SQL_Running" | awk '{print $2}'` if [ -z "${result}" ] || [ "${result}" == "No" ];then echo 0; else echo 1; fi ;; Seconds_Behind_Master) result=`$CMD -e "show slave status\G" 2> /dev/null | grep -w "Seconds_Behind_Master" | awk '{print $2}'` if [ -z $result ];then echo 0; else echo $result; fi ;; Auto_Position) result=`$CMD -e "show slave status\G" 2> /dev/null | grep -w "Auto_Position" | awk '{print $2}'` if [ -z $result ];then echo 0; else echo $result; fi ;; read_only) result=`$CMD -e "show global variables like 'read_only'\G" 2> /dev/null | grep -w "read_only" | awk '{print $2}'` if [ $result == "ON" ];then echo 1; else echo 0; fi ;; Waiting_lock) result=`$CMD -e "select state from information_schema.processlist;" 2> /dev/null | grep "Waiting.*lock" | wc -l` if [ -z $result ];then echo 0; else echo $result; fi ;; *) echo "Usage:$0 arguments" ;; esac |
脚本很简单,需要传给脚本两个参数,一个是端口号,另一个是监控值。
三、添加配置文件
1 2 3 4 |
$ cat /etc/zabbix/zabbix_agentd.d/userparameter_mysql.conf UserParameter=mysql.discovery[*],python /etc/zabbix/zabbix_agentd.d/scripts/mysql_discovery.py UserParameter=mysql[*],/bin/bash /etc/zabbix/zabbix_agentd.d/scripts/mysql_master_status.sh $1 $2 UserParameter=mysql.slave[*],/bin/bash /etc/zabbix/zabbix_agentd.d/scripts/mysql_slave_status.sh $1 $2 |
这里定义三个key,第一个key是用于自动发现的。第二个key是用于取不同实例(master)的状态值的,第三个key是用于取不同实例(slave)的状态值,传了两个参数,$1是端口号(从自动发现中获取的),第二个是传的参数。端口号和参数我会在zabbix页面配置传给mysql[*]这个key。
都配置完后就可以添加重启一下zabbix-agent了。
四、添加权限
上面说了,需要添加一个mysql_zabbix用户。
1 2 |
mysql> grant select, process, super, replication slave, replication client on *.* to 'mysql_zabbix'@'localhost' identified by '123456'; mysql> flush privileges; |
这几个权限是最低权限了,super是用来在从库执行”show slave stauts;”命令的。
需要给zabbix用户添加sudo权限。
1 2 3 4 |
$ cat /etc/sudoers.d/zabbix Defaults:zabbix !requiretty zabbix ALL=(ALL) NOPASSWD: SUPERVISORCTLZB Cmnd_Alias SUPERVISORCTLZB = /sbin/ss,/usr/sbin/ss,/sbin/dmidecode,/usr/sbin/dmidecode,/sbin/service,/usr/sbin/service,/bin/netstat |
另外需要注意的是,普通用户zabbix默认环境变量有如下这些:
1 2 |
$ echo $PATH /usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin |
所以你要确认你所有的执行程序都在这些路径下,不然zabbix是获取不到值的。看我的mysql客户端路径:
1 2 |
$ which mysql /usr/bin/mysql |
很多人喜欢把MySQL安装在其他路径,比如/usr/local/mysql下,然后使用export把执行路径追加到$PATH变量后,那么如果这样的话,zabbix就无法获取到值的,所以注意不要栽在这个上面。做一个软连接就可以解决了。
1 |
$ ln -sv /usr/local/mysql/bin/* /usr/local/bin/ |
使用zabbix用户执行看是否正常。
1 2 3 4 5 6 7 8 9 10 |
$ sudo -u zabbix `which zabbix_agentd` -t mysql.discovery[*] { "data":[ { "{#MYSQLPORT}":"3306" }, { "{#MYSQLPORT}":"3307" } } |
然后获取一个值,测试是否运行正常:
1 2 |
$ sudo -u zabbix `which zabbix_agentd` -t mysql[3306,Queries] mysql[3306,Queries] [t|79] |
如果自动发现没有问题,但是获取不到值,就要判断是否是相关命令的环境变量有问题,有一些环境变量路径zabbix。
五、配置zabbix web
前期工作都做完了,下面就可以配置zabbix web了。
首先创建一个模板(Template Linux MySQL Master Discovery),然后在模板中创建一个自动发现规则(Linux MySQL Discovery)。
在这个自动发现规则内创建多个item,我这里就给一个样例。
这个item包括Name定义、Key怎么定义、Type选项,Application定义,以及保存时间;最后一个比较重要的就是Store value,常用的有两种,一种是取出来的值是什么就存储为什么,第二种就是取每秒钟的差值(本次取值减去上次取值的差除以60,得到每秒钟的速率),这是因为我们从MySQL状态变量取来的值有些是累积值,利用zabbix这个功能就可以得到比如每秒钟的QPS/TPS等。也不是说每个值都需要这么取每秒速率,比如MySQL连接相关的状态变量就需要当前值。
创建完item后,接着就可以创建trigger了,比如下面我创建一个检查MySQL是否宕机的trigger。
更多关于zabbix的配置还是需要自己慢慢摸索。