1. DingTalk alerting with Prometheus
1.1 Prometheus environment
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.204.195:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "rule/*.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  # Collect JVM monitoring data
  - job_name: pushgateway
    static_configs:
      - targets: ['192.168.204.195:9091']
        labels:
          instance: pushgateway
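Alert rule file, placed under rule/ so that it matches the "rule/*.yml" pattern:
groups:
  - name: node_rule
    rules:
      - alert: node memory usages
        expr: node_memory_usages > 20
        for: 10s
        labels:
          severity: high
        annotations:
          summary: "[Monitoring alert] {{ $labels.exported_instance }}: abnormal space usage"
          description: "[Monitoring alert] {{ $labels.exported_instance }}: abnormal space usage, please handle it promptly."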
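The interaction between `expr`, `for: 10s`, and the 15-second `evaluation_interval` can be sketched as a small simulation. This is only an illustration of the pending/firing behaviour under stated assumptions, not Prometheus code: the function name `alert_states` and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple

THRESHOLD = 20.0    # from `expr: node_memory_usages > 20`
FOR_SECONDS = 10.0  # from `for: 10s`


def alert_states(samples: List[Tuple[float, float]]) -> List[Tuple[float, str]]:
    """Return (timestamp, state) for each rule evaluation.

    `samples` holds (timestamp_seconds, value) pairs, one per evaluation cycle.
    """
    active_since: Optional[float] = None
    states = []
    for ts, value in samples:
        if value > THRESHOLD:
            if active_since is None:
                active_since = ts  # the condition became true at this evaluation
            held = ts - active_since
            # The alert stays pending until the condition has held for `for`.
            state = "firing" if held >= FOR_SECONDS else "pending"
        else:
            active_since = None  # condition cleared, so the timer resets
            state = "inactive"
        states.append((ts, state))
    return states


# Hypothetical samples, one per 15 s evaluation cycle.
example = [(0.0, 12.0), (15.0, 36.0), (30.0, 36.0), (45.0, 8.0)]
for ts, state in alert_states(example):
    print(f"t={ts:>3.0f}s {state}")
```

With a 15-second evaluation interval, the alert in this sketch is pending at the first evaluation that exceeds the threshold and firing at the next one, because 15 seconds already exceeds the 10-second `for` window.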
Startup status:
1.2 Pushgateway environment
Startup status:
1.3 Create a custom robot and obtain its Webhook address
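1. First, create a group chat.
On the DingTalk main screen, click the plus button in the upper-right corner.
In the menu that opens, click Start Group Chat.
On the Start Group Chat screen, choose an internal project group, set its ownership to personal, then click Select Contacts at the top.
On the contacts screen, select the contacts to add to the group, then click OK in the lower-right corner.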
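2. Open the group chat that should receive the robot, then click Group Settings > Smart Group Assistant > Add Robot.
3. Click Add Robot.
4. Select Custom.
5. Click Add.
6. Enter the required information and click Finish.
Save the secret generated by the sign (加签) security option; it is needed later.
7. Click Finish.
The custom DingTalk robot has now been added and its Webhook address is available.
The Webhook address looks like this: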
https://oapi.dingtalk.com/robot/send?access_token=57af98ce4cea66cb829df72c531efe093c6a254134ecf555f1
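For context on why the sign secret has to be saved: when the sign option is enabled, requests to the Webhook must carry a `timestamp` and a `sign` parameter derived from that secret. The sketch below follows the signing scheme as I understand it from DingTalk's custom robot documentation (HMAC-SHA256 over `timestamp + "\n" + secret`, then Base64 and URL encoding). prometheus-webhook-dingtalk is expected to do this itself once `secret` is configured, so this is only an illustration; the function name `signed_webhook_url` and the placeholder token and secret are assumptions.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def signed_webhook_url(webhook_url: str, secret: str,
                       timestamp_ms: Optional[int] = None) -> str:
    """Append `timestamp` and `sign` query parameters to a robot Webhook URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # current time in milliseconds
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"


# Placeholder values, not real credentials.
print(signed_webhook_url(
    "https://oapi.dingtalk.com/robot/send?access_token=EXAMPLE_TOKEN",
    "SEC_EXAMPLE_SECRET",
    timestamp_ms=1700000000000,
))
```

Passing a fixed `timestamp_ms` makes the output reproducible for testing; in real use the timestamp should be the current time.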
1.4 DingTalk alerting plugin
Download the latest release of the plugin (prometheus-webhook-dingtalk) from GitHub:
https://github.com/timonwong/prometheus-webhook-dingtalk/
This guide uses prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz:
https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.1.0/prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
Upload the archive to the server and extract it:
$ tar -xvf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
Edit the configuration file:
$ vim config.example.yml
# Change the contents to
# Targets, previously was known as "profiles"
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=57af98ce4cea66cb829df72c531efe093c6a254134ecf555f1
    # secret for signature
    secret: SEC5d2ad4bd4cea26830145472cdd7c8dda5b8bea57a029f4f7db7524
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=57af98ce4cea66cb829df72c531efe093c6a254134ecf555f1
    mention:
      mobiles: ['18210820213']
Start the service:
$ nohup ./prometheus-webhook-dingtalk --config.file="config.example.yml" >> nohup.out 2>&1 &
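Each entry under `targets` is exposed by the plugin as its own HTTP endpoint, which is what Alertmanager later posts to. The sketch below derives those endpoint URLs from the target names, following the URL pattern used in this guide (`http://<host>:8060/dingtalk/<target>/send`). The helper name `target_endpoints` is hypothetical, and port 8060 is assumed to be the listen address in use.

```python
from typing import Dict, Iterable


def target_endpoints(host: str, targets: Iterable[str], port: int = 8060) -> Dict[str, str]:
    """Map each configured target name to the URL Alertmanager should call."""
    return {name: f"http://{host}:{port}/dingtalk/{name}/send" for name in targets}


# Target names taken from the configuration file in this section.
for name, url in target_endpoints(
    "192.168.204.195", ["webhook1", "webhook_mention_users"]
).items():
    print(f"{name}: {url}")
```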
1.5 Alertmanager environment
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 15s
  group_interval: 30s
  repeat_interval: 2m
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    webhook_configs:
      # Address of prometheus-webhook-dingtalk
      - url: 'http://192.168.204.195:8060/dingtalk/webhook1/send'
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
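The `inhibit_rules` entry can be read as: a firing `critical` alert mutes a `warning` alert when both carry the same values for `alertname`, `dev`, and `instance`. The sketch below models that matching logic in simplified form; it is not Alertmanager's implementation, and the helper names and sample alerts are hypothetical. Labels missing on both sides are treated as equal (empty) values here, which is my understanding of how Alertmanager behaves.

```python
from typing import Dict, List

Labels = Dict[str, str]

SOURCE_MATCH = {"severity": "critical"}   # from `source_match`
TARGET_MATCH = {"severity": "warning"}    # from `target_match`
EQUAL = ["alertname", "dev", "instance"]  # from `equal`


def matches(labels: Labels, matcher: Labels) -> bool:
    """True when every matcher label is present with the same value."""
    return all(labels.get(key) == value for key, value in matcher.items())


def is_inhibited(target: Labels, firing: List[Labels]) -> bool:
    """True when some firing source alert mutes `target` under the rule."""
    if not matches(target, TARGET_MATCH):
        return False
    for source in firing:
        if not matches(source, SOURCE_MATCH):
            continue
        # Missing labels count as empty values on both sides.
        if all(source.get(name, "") == target.get(name, "") for name in EQUAL):
            return True
    return False


critical = {"alertname": "disk", "instance": "a", "severity": "critical"}
warning = {"alertname": "disk", "instance": "a", "severity": "warning"}
high = {"alertname": "node memory usages", "severity": "high"}
print(is_inhibited(warning, [critical]))  # muted by the critical alert
print(is_inhibited(high, [critical]))     # severity "high" is outside the rule
```

Since the rule in section 1.1 labels its alert `severity: high`, this inhibit rule should never mute it.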
Startup status:
1.6 Trigger an alert
Before the alert is triggered:
# Run this script to trigger the alert
cat <<EOF | curl --data-binary @- http://192.168.204.195:9091/metrics/job/test_job/instance/test_instance
node_memory_usages 36
node_memory_total 36000
EOF
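The same push can be sketched in Python with only the standard library, which is convenient when the metric values come from a program rather than a shell script. The Pushgateway address, job, instance, and metric names are taken from the curl command above; the function names `build_push_request` and `push` are hypothetical.

```python
import urllib.request
from typing import Dict

PUSHGATEWAY = "http://192.168.204.195:9091"  # address used by the curl command


def build_push_request(metrics: Dict[str, float],
                       job: str = "test_job",
                       instance: str = "test_instance") -> urllib.request.Request:
    """Build the POST request that pushes `metrics` in text exposition format."""
    # One "name value" pair per line; the body must end with a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    url = f"{PUSHGATEWAY}/metrics/job/{job}/instance/{instance}"
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


def push(metrics: Dict[str, float]) -> int:
    """Send the metrics and return the HTTP status code."""
    with urllib.request.urlopen(build_push_request(metrics), timeout=5) as response:
        return response.status


# Example call (requires a reachable Pushgateway):
# push({"node_memory_usages": 36, "node_memory_total": 36000})
```

Pushing a `node_memory_usages` value of 20 or less should make the rule expression false again and let the alert resolve. Because the scrape job sets `instance: pushgateway` and `honor_labels` is not enabled, Prometheus should store the pushed `instance` label as `exported_instance`, which is presumably why the rule annotations reference `$labels.exported_instance`.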
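After the alert is triggered:
Message received in DingTalk:
A message is also sent when the alert recovers:
The DingTalk alerting setup is now complete.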