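Problem
Continuing from the previous post on enabling AMP (AWS's managed Prometheus) monitoring and AMG (AWS's managed Grafana) for EKS (AWS's managed k8s): last time we only got the EKS + AMP + AMG monitoring path working. This time we use a Grafana dashboard published by David, available at: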
https://grafana.com/grafana/dashboards/15757-kubernetes-views-global/
Installing kube-state-metrics
To expose useful metrics to Prometheus, kube-state-metrics needs to be installed in the k8s cluster.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system
Verify the installation:
kubectl port-forward svc/kube-state-metrics -n kube-system 8080:8080
Test with PromQL:
count(kube_pod_status_ready{condition="false"}) by (namespace, pod)
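Note that `kube_pod_status_ready` emits one series per pod and condition value, with sample value 1 for the condition that currently holds and 0 otherwise, so the query above mainly confirms that the metric is being collected. As a rough sketch for spot-checking the raw endpoint through the port-forward, the hypothetical helper `not_ready_pods` below (not part of kube-state-metrics) reads text-format metrics on stdin and prints the pods whose Ready condition is currently false. It assumes the `namespace` label precedes the `pod` label on each line.

```shell
# Hypothetical helper: print "namespace/pod" for pods whose Ready condition is false.
# Input: Prometheus text-format metrics on stdin.
not_ready_pods() {
  grep '^kube_pod_status_ready{' |
    grep 'condition="false"' |
    awk '$NF == 1' |
    sed -E 's|.*namespace="([^"]*)".*pod="([^"]*)".*|\1/\2|'
}

# Usage, assuming the port-forward from the previous step is still running:
#   curl -s http://localhost:8080/metrics | not_ready_pods
```

The equivalent PromQL would add `== 1` after the selector to keep only the series whose condition currently holds.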
Prometheus configuration
scrape_configs:
  - job_name: kube-state-metrics
    honor_timestamps: true
    scrape_interval: 1m
    scrape_timeout: 1m
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - kube-state-metrics.kube-system.svc.cluster.local:8080
Installing prometheus-node-exporter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-node-exporter prometheus-community/prometheus-node-exporter -n kube-system
Test:
export POD_NAME=$(kubectl get pods --namespace kube-system -l "app.kubernetes.io/name=prometheus-node-exporter,app.kubernetes.io/instance=prometheus-node-exporter" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward --namespace kube-system $POD_NAME 9100
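With the port-forward running, the exporter can be spot-checked from a second terminal. The sketch below is only an illustration: `metric_value` is a hypothetical helper that prints the value of an unlabeled metric from text-format output, and `node_load1` is used as the sample metric.

```shell
# Hypothetical helper: print the value of an unlabeled metric read from stdin.
# $1 is the exact metric name; comment lines (# HELP / # TYPE) never match.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage, assuming the port-forward above is still running:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```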
Prometheus configuration
scrape_configs:
  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: replace
        source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
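The relabel rule above rewrites each discovered node address from the kubelet port to the node-exporter port: with `role: node`, Prometheus sets `__address__` to the node address plus the kubelet port (10250 by default). As a rough illustration only, the hypothetical shell function below mimics the rule with `sed`. Prometheus anchors relabel regexes at both ends, which the `^` and `$` reproduce, and it leaves the label untouched when the regex does not match. The sample addresses are made up.

```shell
# Hypothetical helper mimicking the node-exporter relabel rule:
#   regex '(.*):10250'  ->  replacement '${1}:9100'
relabel_node_address() {
  printf '%s\n' "$1" | sed -E 's/^(.*):10250$/\1:9100/'
}

relabel_node_address '10.0.1.23:10250'   # kubelet port is swapped for 9100
relabel_node_address '10.0.1.23:443'     # no match, printed unchanged
```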
Complete Prometheus configuration
global:
  scrape_interval: 30s
  # external_labels:
  #   clusterArn: <REPLACE_ME>
scrape_configs:
  # pod metrics
  - job_name: pod_exporter
    kubernetes_sd_configs:
      - role: pod
  # container metrics
  - job_name: cadvisor
    scheme: https
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
  # apiserver metrics
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-apiservers
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
    scheme: https
  # kube proxy metrics
  - job_name: kube-proxy
    honor_labels: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_pod_name
        separator: '/'
        regex: 'kube-system/kube-proxy.+'
      - source_labels:
          - __address__
        action: replace
        target_label: __address__
        regex: (.+?)(\\:\\d+)?
        replacement: $1:10249
  # kube-state-metrics
  - job_name: kube-state-metrics
    honor_timestamps: true
    scrape_interval: 1m
    scrape_timeout: 1m
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - kube-state-metrics.kube-system.svc.cluster.local:8080
  # node-exporter
  - job_name: 'node-exporter'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: replace
        source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
For this configuration to take effect, the scraper needs to be recreated.
Result
References
- grafana-dashboards-kubernetes
- kube-state-metrics
- Monitoring Kubernetes Clusters with kube-state-metrics
- Common kube-state-metrics metrics (kube-state-metrics公共指标)
- Metrics for Kubernetes object states (Kubernetes 对象状态的指标)
- helm-charts/charts/kube-state-metrics
- Monitoring Kubernetes cluster nodes with Prometheus and Node Exporter (Prometheus 结合 Node Exporter 监控 Kubernetes 集群节点)