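1 Overview
Production sites are almost always served over HTTPS, so monitoring when an HTTPS site's certificate expires is a basic requirement. Once a certificate has expired, TLS handshakes fail and users can no longer reach the site normally.
blackbox-exporter is a web service that exposes a probe endpoint. Each request to that endpoint makes it probe the target site and return metrics about that site to the caller. Combined with Prometheus, blackbox-exporter can monitor an HTTPS site's response time, certificate expiry time, and more.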
2 blackbox-exporter
2.1 Probe endpoint
Format:
GET /probe?module=<module name>&target=<URL>
Example:
GET /probe?module=http_get_2xx&target=https://www.baidu.com
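As a rough illustration of the request format above, the following Python sketch builds a probe URL and extracts probe_ssl_earliest_cert_expiry from a text response. The exporter address http://localhost:9115, the helper names, and the sample response text are assumptions made for illustration; they are not taken from a real deployment.

```python
from typing import Optional
from urllib.parse import urlencode


def build_probe_url(exporter: str, module: str, target: str) -> str:
    """Return the /probe URL for one module/target pair (the target is URL-encoded)."""
    return f"{exporter}/probe?{urlencode({'module': module, 'target': target})}"


def read_metric(exposition_text: str, name: str) -> Optional[float]:
    """Return the value of the first unlabelled sample called `name`, if any."""
    for line in exposition_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


# Hypothetical excerpt of a probe response, hard-coded so the sketch runs offline.
SAMPLE_RESPONSE = """\
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.7356896e+09
# TYPE probe_success gauge
probe_success 1
"""

url = build_probe_url("http://localhost:9115", "http_get_2xx", "https://www.baidu.com")
print(url)
print(read_metric(SAMPLE_RESPONSE, "probe_ssl_earliest_cert_expiry"))
```

Encoding the target matters once the probed URL carries its own query string, since an unescaped & would otherwise be read as a separator between the exporter's parameters.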
2.2 Deployment
The blackbox-exporter configuration defines several modules, such as ping and http_get_2xx. Module names are arbitrary and can be chosen freely.
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: Service
metadata:
  name: blackbox-exporter
  namespace: monitoring
  labels:
    k8s-app: blackbox-exporter
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 9115
      targetPort: 9115
  selector:
    k8s-app: blackbox-exporter
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: monitoring
  labels:
    k8s-app: blackbox-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: blackbox-exporter
  template:
    metadata:
      labels:
        k8s-app: blackbox-exporter
    spec:
      containers:
        - name: blackbox-exporter
          image: prom/blackbox-exporter:latest
          args:
            - --config.file=/etc/blackbox_exporter/blackbox.yml
            - --web.listen-address=:9115
            - --log.level=info
          ports:
            - name: http
              containerPort: 9115
          resources:
            limits:
              cpu: 200m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 50Mi
          livenessProbe:
            tcpSocket:
              port: 9115
            initialDelaySeconds: 5
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            tcpSocket:
              port: 9115
            initialDelaySeconds: 5
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          volumeMounts:
            - name: config
              mountPath: /etc/blackbox_exporter
      volumes:
        - name: config
          configMap:
            name: blackbox-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-exporter
  namespace: monitoring
  labels:
    app: blackbox-exporter
data:
  blackbox.yml: |-
    modules:
      ## ----------- TCP probe module -----------
      tcp_connect:
        prober: tcp
        timeout: 5s
      ## ----------- ICMP probe module -----------
      ping:
        prober: icmp
        timeout: 5s
        icmp:
          preferred_ip_protocol: "ip4"
      ## ----------- HTTP GET 2xx probe module -----------
      http_get_2xx:
        prober: http
        timeout: 10s
        http:
          method: GET
          preferred_ip_protocol: "ip4"
          # HTTP/2 responses are reported as "HTTP/2.0", so that is the value to list here
          valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
          valid_status_codes: [200]   # accepted HTTP status codes; defaults to 2xx when unset
          no_follow_redirects: false  # whether to NOT follow redirects
      ## ----------- HTTP GET 3xx probe module -----------
      http_get_3xx:
        prober: http
        timeout: 10s
        http:
          method: GET
          preferred_ip_protocol: "ip4"
          valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
          valid_status_codes: [301, 302, 304, 305, 306, 307]  # accepted HTTP status codes; defaults to 2xx when unset
          # false means redirects are followed; set this to true if the probe
          # should observe the 3xx response itself
          no_follow_redirects: false
      ## ----------- HTTP POST probe module -----------
      http_post_2xx:
        prober: http
        timeout: 10s
        http:
          method: POST
          preferred_ip_protocol: "ip4"
          valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
          #headers:             # HTTP request headers
          #  Content-Type: application/json
          #body: '{}'           # request body
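To make the valid_status_codes setting concrete, here is a minimal Python model of the check an HTTP module applies to the response status: an explicit list must contain the status code, and an empty list falls back to accepting any 2xx code. The function name and the dictionary layout are hypothetical, and the real exporter performs further checks (HTTP version, TLS, body matching) that this sketch ignores.

```python
from typing import Sequence

# Status-code settings copied from the modules above; http_post_2xx sets none.
MODULE_STATUS_CODES = {
    "http_get_2xx": [200],
    "http_get_3xx": [301, 302, 304, 305, 306, 307],
    "http_post_2xx": [],
}


def status_is_valid(status: int, valid_status_codes: Sequence[int]) -> bool:
    """Simplified model: an explicit list wins, otherwise any 2xx code passes."""
    if valid_status_codes:
        return status in valid_status_codes
    return 200 <= status < 300


for module, codes in MODULE_STATUS_CODES.items():
    results = {status: status_is_valid(status, codes) for status in (200, 204, 302)}
    print(module, results)
```

Under this model, http_get_2xx as configured accepts only 200, so a 204 response would fail the probe despite the module's name, whereas http_post_2xx accepts the whole 2xx range.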
3 Deploying Prometheus
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-app
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-app
  name: prometheus-app
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-app
  template:
    metadata:
      labels:
        app: prometheus-app
        name: prometheus-app
    spec:
      containers:
        - args:
            - --config.file=/etc/prometheus/prometheus.yml
            # deprecated spelling of --storage.tsdb.retention.time
            - --storage.tsdb.retention=7d
            - --web.enable-lifecycle
            - --log.level=debug
          image: prom/prometheus:v2.31.0
          imagePullPolicy: IfNotPresent
          name: prometheus
          ports:
            - containerPort: 9090
              name: web
              protocol: TCP
          volumeMounts:
            - mountPath: /etc/prometheus
              name: config-volume
            - mountPath: /etc/prometheus/etc.d
              name: blackbox-web-target
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      serviceAccount: prometheus-app
      serviceAccountName: prometheus-app
      volumes:
        - configMap:
            name: prometheus-app
          name: config-volume
        - configMap:
            name: blackbox-web-target
          name: blackbox-web-target
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-app
    name: prometheus-app
  name: prometheus-app
  namespace: monitoring
spec:
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    app: prometheus-app
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations: {}
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus-app
    namespace: monitoring
---
apiVersion: v1
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: blackbox
        metrics_path: /probe
        params:
          module: [http_get_2xx]  # sent as the HTTP query parameter module=http_get_2xx
        file_sd_configs:
          - files:
              - '/etc/prometheus/etc.d/web.yml'  # the monitored sites are listed in this file
            refresh_interval: 30s  # re-read every 30 seconds; no Prometheus restart needed
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target  # sent as the HTTP query parameter target=<target URL>
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: blackbox-exporter.monitoring.svc.cluster.local:9115
kind: ConfigMap
metadata:
  name: prometheus-app
  namespace: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-web-target
  namespace: monitoring
  labels:
    app: blackbox-exporter
data:
  web.yml: |-
    ---
    - targets:
        - https://www.baidu.com  # monitored site
      labels:
        env: prod
        app: baidu-web
        project: baidu
        desc: desc for baidu web
    - targets:
        - https://blog.csdn.net  # monitored site
      labels:
        env: prod
        app: csdn-web
        project: csdn
        desc: desc for csdn
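The three relabel_configs rules in prometheus.yml are easier to follow when traced by hand. The Python sketch below is a deliberately simplified model of them: each rule just copies a label or sets a constant, which is what the default replace action does when no regex is configured. It is not Prometheus's relabeling engine, and the function names are made up for illustration.

```python
from urllib.parse import urlencode

EXPORTER = "blackbox-exporter.monitoring.svc.cluster.local:9115"


def relabel(labels: dict) -> dict:
    """Apply the three rules of the blackbox job, in order, to a copy of `labels`."""
    out = dict(labels)
    out["__param_target"] = out["__address__"]  # rule 1: the site URL becomes ?target=
    out["instance"] = out["__param_target"]     # rule 2: keep the site URL as instance
    out["__address__"] = EXPORTER               # rule 3: scrape the exporter instead
    return out


def scrape_url(labels: dict, module: str = "http_get_2xx") -> str:
    """Build the URL that would be requested after relabeling (http scheme assumed)."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}/probe?{query}"


# One entry from web.yml as it looks before relabeling.
discovered = {"__address__": "https://www.baidu.com", "env": "prod", "app": "baidu-web"}
final = relabel(discovered)
print(final["instance"])
print(scrape_url(final))
```

The effect is that every scrape goes to the exporter's Service, while the instance label on the stored series still names the site that was probed.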
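4 Result in the Prometheus UI
The metric probe_ssl_earliest_cert_expiry holds the certificate's expiry time as a Unix timestamp, so the following expression gives the number of seconds left until the certificate expires: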
probe_ssl_earliest_cert_expiry - time()
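The same subtraction can be reproduced outside Prometheus. This Python sketch converts an expiry timestamp into remaining seconds and whole days and flags certificates that are close to expiring; the 30-day threshold and the sample timestamps are arbitrary assumptions.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def seconds_until_expiry(expiry_ts: float, now: Optional[float] = None) -> float:
    """Mirror `probe_ssl_earliest_cert_expiry - time()`; negative means already expired."""
    if now is None:
        now = time.time()
    return expiry_ts - now


def expires_soon(expiry_ts: float, now: float, threshold_days: int = 30) -> bool:
    """True when fewer than `threshold_days` days remain (the threshold is an assumption)."""
    return seconds_until_expiry(expiry_ts, now) < threshold_days * SECONDS_PER_DAY


# Fixed sample values so the result is reproducible.
now = 1_700_000_000
expiry = now + 45 * SECONDS_PER_DAY
remaining = seconds_until_expiry(expiry, now)
print(remaining, remaining // SECONDS_PER_DAY, expires_soon(expiry, now))
```

In PromQL, the equivalent conversion to days is (probe_ssl_earliest_cert_expiry - time()) / 86400.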
5 Grafana
5.1 Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana
          resources:
            limits:
              memory: "128Mi"
              cpu: "50m"
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 10
          livenessProbe:
            tcpSocket:
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 10
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  type: NodePort
  ports:
    - protocol: TCP
      port: 3000
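5.2 Configuring the data source
Add a Prometheus data source. The Prometheus instance is exposed in Kubernetes through the Service prometheus-app, and Grafana runs in the same monitoring namespace, so http://prometheus-app:9090 can be used as the address.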
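A Prometheus data source ultimately sends PromQL to Prometheus over its HTTP API. As a hedged sketch, the Python below builds an instant-query URL for the expiry expression against /api/v1/query and reads values out of a response. The JSON shown is a hand-written sample in the API's vector result format, not output captured from a real server, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

PROMETHEUS = "http://prometheus-app:9090"  # Service address used for the data source


def instant_query_url(base: str, expr: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


def values_by_instance(response_body: str) -> dict:
    """Map each series' instance label to its sample value (sent as a string)."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


# Hand-written sample response with hypothetical values.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "https://www.baidu.com"},
             "value": [1700000000, "3888000"]},
        ],
    },
})

query_url = instant_query_url(PROMETHEUS, "probe_ssl_earliest_cert_expiry - time()")
print(query_url)
print(values_by_instance(SAMPLE_BODY))
```

Requesting a URL like this from inside the cluster is a quick way to confirm that the address entered in Grafana is reachable and that the blackbox job is producing data.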
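5.3 Importing a dashboard
Import the Grafana dashboard with ID 13230.
6 Summary
Prometheus and blackbox-exporter work together to monitor web sites. blackbox-exporter sits between them as an intermediate layer that decouples Prometheus from the target sites: it is the component that actually fetches each site's certificate and exposes the resulting metrics, while Prometheus only has to scrape what blackbox-exporter exposes.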