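Requirements
Categraf is the data collection agent of the Nightingale (夜莺) monitoring platform. To keep Linux hosts secure, we need to monitor the password validity period of system users and raise an alert shortly before a password expires, so that operations staff are reminded to change it. This chapter describes in detail how to implement this with Categraf's exec plugin, and how to make sure the alert reaches the responsible operations staff through channels such as WeCom (企业微信) and Feishu (飞书).
exec plugin configuration file exec.toml
This configuration file makes the exec plugin run the script /opt/categraf/scripts/check_password_expiry.sh periodically and declares that the script's output is in influx format.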
# # collect interval
# interval = 15

[[instances]]
# # commands, support glob
commands = ["/opt/categraf/scripts/check_password_expiry.sh"]

# # timeout for each command to complete
# timeout = 5

# # interval = global.interval * interval_times
# interval_times = 1

# # choices: influx prometheus falcon
# # influx stdout example: measurement,labelkey1=labelval1,labelkey2=labelval2 field1=1.2,field2=2.3
data_format = "influx"
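The influx format and its rules:
measurement,labelkey1=labelval1,labelkey2=labelval2 field1=1.2,field2=2.3
The measurement defines the metric name (or its prefix), for example connections.
The measurement is followed by a comma and then the labels. If there are no labels, no comma is needed after the measurement.
Labels use the k=v form, and multiple labels are separated by commas, for example region=beijing,env=test.
The labels are followed by a space.
The space is followed by the fields, and multiple fields are separated by commas.
Each field uses the form name=value. In Categraf the value must be a number.
Finally, the measurement and each field name are joined to form the metric name.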
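To make these rules concrete, here is a minimal shell sketch that assembles one influx line and derives the resulting metric name. The helper names influx_line and metric_name are hypothetical, chosen only for illustration, and the underscore join is inferred from the metric name password_expiry_days_until_expiry that appears in the test output later in this article.

```shell
# Hypothetical helpers, for illustration only.

# influx_line MEASUREMENT TAGS FIELD VALUE   (TAGS may be empty)
# Prints: measurement[,tags] field=value
influx_line() {
    if [ -n "$2" ]; then
        printf '%s,%s %s=%s\n' "$1" "$2" "$3" "$4"
    else
        # No labels: no comma after the measurement
        printf '%s %s=%s\n' "$1" "$3" "$4"
    fi
}

# metric_name MEASUREMENT FIELD
# Joins the measurement and the field name with an underscore
metric_name() {
    printf '%s_%s\n' "$1" "$2"
}

influx_line password_expiry "account=app" days_until_expiry 6
metric_name password_expiry days_until_expiry
```

The first call prints a line with one label and one field; the second prints the metric name that the field would be stored under.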
Monitoring shell script check_password_expiry.sh
#!/bin/bash
# Accounts whose password expiry should be checked
users=("app" "root" "weihu" "mysql" "nginx")

for user in "${users[@]}"; do
    # Force English output from chage -l for this command only
    EXPIRY_DATE_RAW=$(LC_ALL=C chage -l "$user" 2>/dev/null | grep "Password expires")
    # Take the text after the colon and trim surrounding whitespace
    EXPIRY_DATE=$(echo "$EXPIRY_DATE_RAW" | awk -F: '{print $2}' | awk '{$1=$1};1')
    # Skip accounts for which chage returned nothing (for example, a missing user)
    [[ -z "$EXPIRY_DATE" ]] && continue

    if [[ "$EXPIRY_DATE" == "never" ]]; then
        # Sentinel values meaning "the password never expires"
        EXPIRY_DATE_FORMATTED="99999"
        DAYS_LEFT=99999
    else
        # Convert the expiry date to a Unix timestamp
        EXPIRY_DATE_TS=$(date --date="$EXPIRY_DATE" +%s 2>/dev/null)
        # Skip values that date cannot parse
        [[ -z "$EXPIRY_DATE_TS" ]] && continue
        # Current time as a Unix timestamp
        TODAY_TS=$(date +%s)
        # Whole days remaining until expiry
        DAYS_LEFT=$(( (EXPIRY_DATE_TS - TODAY_TS) / 86400 ))
        # Expiry date in yyyymmdd form
        EXPIRY_DATE_FORMATTED=$(date --date="$EXPIRY_DATE" "+%Y%m%d" 2>/dev/null)
    fi

    # Emit one line in influx format
    echo "password_expiry,account=$user,password_expires_time=$EXPIRY_DATE_FORMATTED days_until_expiry=$DAYS_LEFT"
done
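The day calculation in the script can be tried in isolation. The sketch below is a hypothetical helper named days_left, not part of the original script. It assumes GNU date (for the --date option) and pins the timezone to UTC so that daylight-saving changes cannot shift the result; the optional second argument replaces "now" so the result is reproducible.

```shell
# days_left EXPIRY_DATE [TODAY]
# Prints the whole days from TODAY (default: now) until EXPIRY_DATE.
# Returns 1 if either date is empty or cannot be parsed.
# Assumes GNU date; TZ=UTC avoids daylight-saving jumps.
days_left() {
    [ -n "$1" ] || return 1
    expiry_ts=$(TZ=UTC date --date="$1" +%s 2>/dev/null) || return 1
    today_ts=$(TZ=UTC date --date="${2:-now}" +%s 2>/dev/null) || return 1
    echo $(( ($expiry_ts - $today_ts) / 86400 ))
}

days_left "Oct 26, 2024" "Oct 20, 2024"   # prints 6
```

A password that has already expired yields a negative number, which still satisfies a "less than N days" alert condition.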
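Note: the script's output must conform to the data_format = "influx" setting defined in exec.toml above. Only then can Categraf parse the stdout it captures and forward it to the server. Running the plugin in Categraf's test mode produces the following output: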
[root@localhost categraf]# ./categraf --test --inputs exec
......
18:44:10 password_expiry_days_until_expiry account=app agent_hostname=localhost password_expires_time=20241026 6
18:44:10 password_expiry_days_until_expiry account=root agent_hostname=localhost password_expires_time=99999 99999
18:44:10 password_expiry_days_until_expiry account=weihu agent_hostname=localhost password_expires_time=99999 99999
18:44:10 password_expiry_days_until_expiry account=mysql agent_hostname=localhost password_expires_time=99999 99999
18:44:10 password_expiry_days_until_expiry account=nginx agent_hostname=localhost password_expires_time=99999 99999
......
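Before running Categraf, the script's raw stdout can also be sanity-checked on its own. The following is a rough sketch with hypothetical helper names (is_influx_line, check_script_output). It checks only the simplified shape described earlier, that is, a measurement, optional k=v labels, a space, and numeric fields; it does not cover the full influx line protocol (escaping, string values, timestamps).

```shell
# is_influx_line LINE
# Succeeds when LINE looks like:
#   measurement[,k=v...] field=number[,field=number...]
is_influx_line() {
    name='[A-Za-z_][A-Za-z0-9_]*'
    num='-?[0-9]+(\.[0-9]+)?'
    pattern="^${name}(,[^ ,=]+=[^ ,=]+)* ${name}=${num}(,${name}=${num})*\$"
    printf '%s\n' "$1" | grep -Eq -- "$pattern"
}

# check_script_output SCRIPT
# Runs SCRIPT and fails on the first line that does not match.
check_script_output() {
    "$1" | while IFS= read -r line; do
        is_influx_line "$line" || { echo "bad line: $line" >&2; exit 1; }
    done
}
```

For example, check_script_output /opt/categraf/scripts/check_password_expiry.sh returns a non-zero status and prints the offending line if the script emits something like days_until_expiry=never.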
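Monitoring dashboard usermanager.json
Once the test above confirms that the data and its format are correct, configure a dashboard for Linux system user password validity in the Nightingale monitoring platform. Import the following JSON directly to complete the dashboard configuration.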
{
  "name": "Linux system account password expiry check",
  "tags": "usermanager",
  "ident": "",
  "configs": {
    "var": [
      {"name": "prom", "label": "Data source", "type": "datasource", "definition": "prometheus", "defaultValue": ""},
      {"name": "user", "label": "User", "type": "query", "datasource": {"cate": "prometheus", "value": 1}, "definition": "label_values(account)"}
    ],
    "panels": [
      {
        "type": "table",
        "id": "2d96fa01-57a2-4ba1-b1a2-8369c3bf34f2",
        "layout": {"h": 12, "w": 24, "x": 0, "y": 0, "i": "2d96fa01-57a2-4ba1-b1a2-8369c3bf34f2", "isResizable": true},
        "version": "3.0.0",
        "datasourceCate": "prometheus",
        "datasourceValue": 1,
        "targets": [
          {"refId": "A", "expr": "password_expiry_days_until_expiry", "legend": "", "time": {"start": "now-1m", "end": "now"}, "instant": false}
        ],
        "transformations": [
          {
            "id": "organize",
            "options": {
              "excludeByName": {"__name__": true, "value": false, "password_expires_on": true, "password_expires_time": false, "account": false, "ident": false},
              "renameByName": {"account": "System user", "ident": "Host", "password_expires_on": "", "value": "Days until password expiry", "password_expires_time": "Password expiry date"},
              "indexByName": {"ident": 0, "account": 1, "password_expires_time": 2, "value": 3}
            }
          }
        ],
        "name": "System user password expiry check",
        "maxPerRow": 4,
        "custom": {
          "showHeader": true,
          "colorMode": "value",
          "calc": "last",
          "displayMode": "labelsOfSeriesToRows",
          "columns": ["ident", "account", "password_expires_time", "value"],
          "sortColumn": "value",
          "sortOrder": "ascend",
          "linkMode": "appendLinkColumn"
        },
        "options": {
          "valueMappings": [
            {"type": "special", "result": {"color": "#000000", "text": "never"}, "match": {"special": 99999}},
            {"type": "range", "result": {"color": "rgba(253, 0, 0, 1)"}, "match": {"from": -1000, "to": 15}}
          ],
          "standardOptions": {"util": "none"}
        },
        "overrides": [
          {
            "matcher": {"id": "byName", "value": "password_expires_time"},
            "properties": {
              "valueMappings": [
                {"type": "special", "result": {"color": "#000000", "text": "never"}, "match": {"special": 99999}}
              ],
              "standardOptions": {"util": "none"}
            }
          }
        ]
      }
    ],
    "version": "3.0.0",
    "graphTooltip": "default",
    "graphZoom": "default"
  }
}
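Alert rule alertrule.json
Configure an alert rule for Linux system user password validity in the Nightingale monitoring platform. Starting 7 days before a password expires, the rule pushes a reminder through WeCom and Feishu every 24 hours. Import the following JSON directly to complete the alert rule configuration.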
[
  {
    "cate": "prometheus",
    "datasource_ids": [0],
    "name": "Linux system account expiry alert",
    "note": "System account {{$labels.account}} on your host is about to expire. Please change the password promptly!",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [3],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "queries": [
        {"keys": {"labelKey": "", "valueKey": ""}, "prom_ql": "password_expiry_days_until_expiry<7", "severity": 3}
      ]
    },
    "prom_eval_interval": 30,
    "enable_stime": "00:00",
    "enable_stimes": ["00:00"],
    "enable_etime": "00:00",
    "enable_etimes": ["00:00"],
    "enable_days_of_week": ["0", "1", "2", "3", "4", "5", "6"],
    "enable_days_of_weeks": [["0", "1", "2", "3", "4", "5", "6"]],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": ["wecom", "feishu"],
    "notify_repeat_step": 1440,
    "notify_max_number": 0,
    "recover_duration": 0,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
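To preview which accounts the condition password_expiry_days_until_expiry<7 would match, the script's output can be filtered locally before any alert fires. This is only an offline approximation written for illustration; the helper name expiring_accounts is hypothetical, and the real evaluation is done by Nightingale against the data source.

```shell
# expiring_accounts THRESHOLD
# Reads influx lines on stdin and prints "account days" for every
# line whose days_until_expiry field is below THRESHOLD.
expiring_accounts() {
    awk -v limit="$1" '
        {
            account = ""; days = ""
            # $1 is "measurement,label=value,...": find the account label
            n = split($1, tags, ",")
            for (i = 2; i <= n; i++)
                if (tags[i] ~ /^account=/) account = substr(tags[i], 9)
            # $2 is "field=value,...": find the days_until_expiry field
            m = split($2, fields, ",")
            for (i = 1; i <= m; i++)
                if (fields[i] ~ /^days_until_expiry=/) days = substr(fields[i], 19)
            if (days != "" && days + 0 < limit + 0) print account, days
        }'
}

printf '%s\n' \
    'password_expiry,account=app,password_expires_time=20241026 days_until_expiry=6' \
    'password_expiry,account=root,password_expires_time=99999 days_until_expiry=99999' \
    | expiring_accounts 7   # prints: app 6
```

In practice the input would come from the collection script itself, for example /opt/categraf/scripts/check_password_expiry.sh | expiring_accounts 7.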
Results
Dashboard result
Alert notification result
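[❌ Test platform alert ❌]
Severity/status: S3
Rule name: Linux system account expires in less than 7 days alert
Rule note: System account app on your host is about to expire. Please change the password promptly!
Alerting host: localhost
Trigger time: 2024-10-19 14:17:34
Trigger value: 7
Send time: 2024-10-19 14:17:35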
Personal views, for reference only.
Original article by 北极星001, 运维记事