
ELK in Practice: Large Log Entries Causing a Redis Queue Backlog

Category: ELK · Author: 彭东稳 · Published: 2016-06-28

A note up front: this post does not explain what ELK or a Redis queue is. Please read up on those yourself.

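In an ELK deployment, Redis or Kafka is usually used as a queue to buffer log data. The basic architecture is shown in the diagram below (diagram not preserved).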

We run several Redis instances, each holding different keys, because the log volume is too large for a single one.

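Suddenly an alert fired: the queue on the Redis instance on port 6382 was backing up. I logged in to the machine and checked the platform key, which showed the following backlog: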

root@shd:~# redis-cli -h 127.0.0.1 -p 6382 llen platform
(integer) 447241
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 llen platform
(integer) 451739
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 llen platform
(integer) 435235
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 llen platform
(integer) 408159
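Four manual LLEN readings hint at the trend, but a steadier signal comes from sampling at a fixed interval and computing the net rate of change. The Python sketch below is an illustration, not the author's tooling: `sample_lengths` and `trend` are hypothetical helpers, the one-second interval in the demo is an assumption (the original readings carry no timestamps), and the length getter is injected as a callable so the logic runs without a Redis server.

```python
import time
from typing import Callable, List, Tuple


def sample_lengths(get_len: Callable[[], int], samples: int, interval: float,
                   sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Call get_len() `samples` times, sleeping `interval` seconds between calls."""
    lengths = []
    for i in range(samples):
        lengths.append(get_len())
        if i < samples - 1:  # no need to sleep after the last sample
            sleep(interval)
    return lengths


def trend(lengths: List[int], interval: float) -> Tuple[int, float]:
    """Return (net change, average change per second) across evenly spaced samples."""
    if len(lengths) < 2:
        raise ValueError("need at least two samples")
    net = lengths[-1] - lengths[0]
    elapsed = interval * (len(lengths) - 1)
    return net, net / elapsed


if __name__ == "__main__":
    # Replay the four LLEN readings above; the 1 s spacing is an assumption.
    readings = iter([447241, 451739, 435235, 408159])
    lengths = sample_lengths(lambda: next(readings), samples=4, interval=1.0,
                             sleep=lambda s: None)
    print(trend(lengths, interval=1.0))
```

Against the live instance, `get_len` could be `lambda: r.llen("platform")` with `r = redis.Redis(host="127.0.0.1", port=6382)` from the redis-py client. A rate that stays positive over several minutes means producers are outpacing the Logstash consumers.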


Next, I checked this instance's QPS:

root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:3327
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:5949
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:5882
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:3378
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:3368
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:19677
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:28329
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:20168
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:5181
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:4050
root@shd:~# redis-cli -h 127.0.0.1 -p 6382 info | grep ops
instantaneous_ops_per_sec:3311
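Single `info | grep ops` readings jump around, so it helps to collect several and look at their spread. Below is a minimal Python sketch, assuming you have captured raw `redis-cli info` text; `parse_info`, `ops_per_sec` and `summarize_ops` are hypothetical helper names. (redis-py's `Redis.info()` already returns a parsed dict, so the parser is only needed for raw CLI output.)

```python
import statistics
from typing import Dict, List


def parse_info(raw: str) -> Dict[str, str]:
    """Parse Redis INFO text ("key:value" lines, "# Section" headers) into a dict."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key] = value
    return fields


def ops_per_sec(raw: str) -> int:
    """Extract instantaneous_ops_per_sec from raw INFO output."""
    return int(parse_info(raw)["instantaneous_ops_per_sec"])


def summarize_ops(samples: List[int]) -> Dict[str, float]:
    """Min / median / max of a series of ops-per-second readings."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # The eleven readings from instance 6382 above.
    ops_6382 = [3327, 5949, 5882, 3378, 3368, 19677, 28329, 20168, 5181, 4050, 3311]
    print(summarize_ops(ops_6382))
    # → {'min': 3311, 'median': 5181, 'max': 28329}
```

For the 6382 readings the median is 5181 ops/s, with occasional bursts far above it.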


QPS was low, mostly between roughly 3,000 and 6,000, with occasional spikes to 20,000 to 30,000. Next, I checked the instance's resource usage:

root@shd:~# top -n1 | grep redis
32393 root      20   0 1965m 1.9g 1408 S   6.5 12.0   4:43.13 redis-server


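Memory usage is about 2 GB (1.9g resident) and CPU usage is 6.5%; neither is high. I then checked QPS on the other Redis instances, which held steady at over 20,000 (the Redis machines have 16 GB of RAM and 8 cores). For example, port 6381: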

root@shd:~# redis-cli -p 6381 info | grep ops
instantaneous_ops_per_sec:28005
root@shd:~# redis-cli -p 6381 info | grep ops
instantaneous_ops_per_sec:26667
root@shd:~# redis-cli -p 6381 info | grep ops
instantaneous_ops_per_sec:24701
root@shd:~# redis-cli -p 6381 info | grep ops
instantaneous_ops_per_sec:24585
root@shd:~# redis-cli -p 6381 info | grep ops
instantaneous_ops_per_sec:24317


Then I looked at this machine's network bandwidth:

[Figure: incoming and outgoing bandwidth of the Redis host]

Oddly, incoming bandwidth was very stable at about 16 Mbit/s, while outgoing bandwidth fluctuated wildly: it peaked at 65 Mbit/s but usually sat around 2 Mbit/s.
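A back-of-the-envelope calculation, added here as an illustration rather than taken from the original analysis, shows why entry size matters: on a fixed link, the number of entries a consumer can pull per second is bounded by bandwidth divided by entry size. The function name is hypothetical, the entry sizes in the demo are assumptions, and protocol and TCP overhead are ignored, so real figures would be lower.

```python
def max_entries_per_sec(bandwidth_mbit: float, entry_bytes: int) -> float:
    """Upper bound on entries/s a link can carry, ignoring protocol overhead.

    bytes/s = Mbit/s * 1,000,000 / 8; entries/s = bytes/s / bytes per entry.
    """
    if entry_bytes <= 0:
        raise ValueError("entry_bytes must be positive")
    bytes_per_sec = bandwidth_mbit * 1_000_000 / 8
    return bytes_per_sec / entry_bytes


if __name__ == "__main__":
    # 65 Mbit/s is the outgoing peak seen above; both entry sizes are assumptions.
    for size in (500, 45_056):  # a small event vs. a ~44 KB entry
        print(size, round(max_entries_per_sec(65, size)))
```

At 65 Mbit/s this gives about 16,250 entries/s for 500-byte entries but only about 180 entries/s for 44 KB entries, which is consistent with outbound traffic bursting while the operation count stays low.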

Based on these observations, I considered three possible causes of the Redis queue backlog:

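1) A bottleneck on the Redis machine itself, but its resource usage was low;

2) Too few Logstash indexers pulling data from Redis, but we run 15 of them, so that was not the problem;

3) Very large log entries being pushed into the Redis queue, as a colleague suggested.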

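I then went to ELK, checked every index fed by the platform key, and found one index containing very large log entries. I picked one entry and checked its size: about 44 KB (du -sh rounds up to allocated disk blocks, so it reports 48K):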

root@shd:~ # du -sh 1.txt 
48K	1.txt
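The author tracked the oversized entries down through the ELK indices. A similar check can also be run on the queue itself by sampling the head of the list with LRANGE and measuring each element. Below is a minimal Python sketch; `large_entries` and the 10,000-byte threshold are hypothetical choices, and the sample data is synthetic.

```python
from typing import Iterable, List, Tuple, Union

Entry = Union[bytes, str]


def large_entries(entries: Iterable[Entry], threshold_bytes: int) -> List[Tuple[int, int]]:
    """Return (position, size in bytes) for every entry larger than threshold_bytes."""
    found = []
    for pos, entry in enumerate(entries):
        # str entries are measured by their UTF-8 encoded length.
        size = len(entry.encode("utf-8")) if isinstance(entry, str) else len(entry)
        if size > threshold_bytes:
            found.append((pos, size))
    return found


if __name__ == "__main__":
    # Synthetic sample: two small JSON events and one ~44 KB message.
    sample = [b'{"message":"ok"}', b"x" * 45_000, '{"message":"日志"}']
    print(large_entries(sample, threshold_bytes=10_000))
    # → [(1, 45000)]
```

With redis-py, `entries` could be `r.lrange("platform", 0, 999)`, which returns bytes unless the client was created with `decode_responses=True`. Keep the range small: LRANGE costs O(S+N), and pulling large elements adds to the very bandwidth pressure under investigation.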


With the cause identified, we took it to the developers, who confirmed that the logging was wrong and fixed it.
