Prometheus+Grafana（详细讲解）-编程知识

Prometheus(普罗米修斯）监控系统

1、Prometheus概述

1.1 任务背景

某公司由于业务快速发展，公司要求对现有机器进行业务监控，责成运维部门来实施这个任务。任务要求如下：

部署监控服务器，实现7x24实时监控
针对公司的业务及研发部门设计监控系统，对监控项和触发器拿出合理意见
做好问题预警机制，对可能出现的问题要及时告警并形成严格的处理机制
做好监控告警系统，要求可以实现告警分级
- 一级报警电话通知
- 二级报警微信通知
- 三级报警邮件通知
处理好公司服务器异地集中监控问题

为什么要监控？

实时收集数据，通过报警及时发现问题，及时处理。数据为优化也可以提供依据。

监控四要素：

监控对象 [主机状态服务资源页面，url]
用什么监控
什么时间监控 [7x24 5x8]
报警给谁

监控技术选型：

mrtg (MRTG - Tobi Oetiker’s MRTG - The Multi Router Traffic Grapher)通过snmp协议得到设备的流量信息，并以包含PNG格式的图形的HTML文档方式显示给用户。
cacti (仙人掌) 用php语言实现的一个软件，它的主要功能是用snmp服务获取数据，然后用rrdtool储存和更新数据。官网地址:Cactus | Description, Distribution, Family, & Facts | Britannica
ntop 官网地址: https://www.ntop.org/ 。
nagios 能够跨平台,插件多,报警功能强大。官网地址: https://www.nagios.org/
centreon 底层使用的就是nagios。是一个nagios整合版软件。官网地址:https://www.centreon.com/
ganglia 设计用于测量数以千计的节点,资源消耗非常小。官网地址:http://ganglia.info/
open-falcon 小米发布的运维监控软件，高效率，高可用。时间较短，用户基数小。官网地址: http://open-falcon.org/
zabbix 跨平台，画图，多条件告警，多种API接口。使用基数特别大。官网地址: https://www.zabbix.com/
prometheus 基于时间序列的数值数据的容器监控解决方案。官网地址: https://prometheus.io/

综合分析：Prometheus比较适合公司的监控需求

1.2 Prometheus特点

Prometheus 受启发于 Google 的 Brogmon 监控系统（相似的 Kubernetes 是从 Google的 Brog 系统演变而来），从 2012 年开始由前 Google 工程师在 Soundcloud 以开源软件的形式进行研发，并且于 2015 年早期对外发布早期版本。2016 年 5 月继 Kubernetes 之后成为第二个正式加入 CNCF 基金会的项目，同年 6 月正式发布 1.0 版本。2017 年底发布了基于全新存储层的 2.0 版本，能更好地与容器平台、云平台配合。

Prometheus 作为新一代的云原生监控系统，目前已经有超过 650+位贡献者参与到Prometheus 的研发工作上，并且超过 120+项的第三方集成。

Prometheus 是一个开源的完整监控解决方案，其对传统监控系统的测试和告警模型进行了彻底的颠覆，形成了基于中央化的规则计算、统一分析和告警的新模型。相比于传统监控系统，Prometheus 具有以下优点：

1 易于管理

Prometheus优秀的设计使得其本身非常易于管理，不会因为Prometheus增加管理成本。

Prometheus 核心部分只有一个单独的二进制文件，不存在任何的第三方依赖(数据库，缓存等等)。唯一需要的就是本地磁盘，因此不会有潜在级联故障的风险。
Prometheus 基于 Pull 模型的架构方式，可以在任何地方（本地电脑，开发环境，测试环境）搭建我们的监控系统。也可以通过中间网关支持push模型
对于一些复杂的情况，还可以使用 Prometheus 服务发现(Service Discovery)的能力动态管理监控目标。

2 可监控服务的内部运行状态

Pometheus 鼓励用户监控服务的内部状态，基于 Prometheus 丰富的 Client 库，用户可以轻松的在应用程序中添加对 Prometheus 的支持，从而让用户可以获取服务和应用内部真正的运行状态。

在这里插入图片描述

3 强大的数据模型

所有采集的监控数据均以指标(metric)的形式保存在内置的时间序列数据库当中(TSDB)。所有的样本除了基本的指标名称以外，还包含一组用于描述该样本特征的标签。

如下所示：

http_request_status{code=‘200’,content_path=‘/api/path’,environment=‘produment’} =>[value1@timestamp1,value2@timestamp2…]
http_request_status{code=‘200’,content_path=‘/api/path2’,environment=‘produment’} =>[value1@timestamp1,value2@timestamp2…]

每一条时间序列由指标名称(Metrics Name)以及一组标签(Labels)唯一标识。每条时间序列按照时间的先后顺序存储一系列的样本值。

http_request_status：指标名称(Metrics Name)
{code=‘200’,content_path=‘/api/path’,environment=‘produment’}：表示维度的标签，基于这些 Labels 我们可以方便地对监控数据进行聚合，过滤，裁剪。
[value1@timestamp1,value2@timestamp2…]：按照时间的先后顺序存储的样本值。

4 强大的查询语言 PromQL

Prometheus 内置了一个强大的数据查询语言 PromQL。通过 PromQL 可以实现对监控数据的查询、聚合。同时 PromQL 也被应用于数据可视化(如 Grafana)以及告警当中。

通过 PromQL 可以轻松回答类似于以下问题：

在过去一段时间中 95%应用延迟时间的分布范围？
预测在 4 小时后，磁盘空间占用大致会是什么情况？
CPU 占用率前 5 位的服务有哪些？(过滤)

5 高效

对于监控系统而言，大量的监控任务必然导致有大量的数据产生。而 Prometheus 可以高效地处理这些数据，对于单一 Prometheus Server 实例而言它可以处理：

数以百万的监控指标
每秒处理数十万的数据点

6 可扩展

可以在每个数据中心、每个团队运行独立的 Prometheus Sevrer。Prometheus 对于联邦集群的支持，可以让多个 Prometheus 实例产生一个逻辑集群，当单实例 PrometheusServer 处理的任务量过大时，通过使用功能分区(sharding)+联邦集群(federation)可以对其进行扩展。

7 易于集成

使用 Prometheus 可以快速搭建监控服务，并且可以非常方便地在应用程序中进行集成。目前支持：Java，JMX，Python，Go，Ruby，.Net，Node.js 等等语言的客户端 SDK，基于这些 SDK 可以快速让应用程序纳入到 Prometheus 的监控当中，或者开发自己的监控数据收集程序。

同时这些客户端收集的监控数据，不仅仅支持 Prometheus，还能支持 Graphite 这些其他的监控工具。同时 Prometheus 还支持与其他的监控系统进行集成：Graphite， Statsd， Collected，Scollector， muini， Nagios 等。 Prometheus 社区还提供了大量第三方实现的监控数据采集支持：JMX，CloudWatch，EC2，MySQL，PostgresSQL，Haskell，Bash，SNMP，Consul，Haproxy，Mesos，Bind，CouchDB，Django，Memcached，RabbitMQ，Redis，RethinkDB，Rsyslog 等等。

8 可视化

Prometheus提供了强大的可视化能力，不能自身提供了独立的可视化解决方案，且可以和很多流行的可视化工具进行整合。

Prometheus Server 中自带的 Prometheus UI，可以方便地直接对数据进行查询，并且支持直接以图形化的形式展示数据。同时 Prometheus 还提供了一个独立的基于Ruby On Rails 的 Dashboard 解决方案 Promdash。
最新的 Grafana 可视化工具也已经提供了完整的 Prometheus 支持，基于 Grafana 可以创建更加精美的监控图标。
基于 Prometheus 提供的 API 还可以实现自己的监控可视化 UI。

9 开放性

通常来说当我们需要监控一个应用程序时，一般需要该应用程序提供对相应监控系统协议的支持，因此应用程序会与所选择的监控系统进行绑定。为了减少这种绑定所带来的限制，对于决策者而言要么你就直接在应用中集成该监控系统的支持，要么就在外部创建单独的服务来适配不同的监控系统。

而对于 Prometheus 来说，使用 Prometheus 的 client library 的输出格式不止支持Prometheus 的格式化数据，也可以输出支持其它监控系统的格式化数据，比如 Graphite。因此你甚至可以在不使用 Prometheus 的情况下，采用 Prometheus 的 client library 来让你的应用程序支持监控数据采集。

2、Prometheus的使用

2.1 Prometheus架构和生态圈组件

在这里插入图片描述

架构解析：

存储计算层
- Prometheus Server，里面包含了存储引擎和计算引擎。
- Retrieval 组件为取数组件，它会主动从 Pushgateway 或者 Exporter 拉取指标数据。
- Service discovery，可以动态发现要监控的目标。
- TSDB（Time Series Database时间序列数据库），数据核心存储与查询。
- HTTP server，对外提供 HTTP 服务。
采集层
采集层分为两类，一类是生命周期较短的作业，还有一类是生命周期较长的作业。
- 短作业：直接通过 API，在退出时间指标推送给 Pushgateway。
- 长作业：Retrieval 组件直接从 Job 或者 Exporter 拉取数据。Prometheus提供了各种常用的exporter，方便我们使用Prometheus对服务进行监控。
应用层
应用层主要分为两种，一种是 AlertManager，另一种是数据可视化。
- AlertManager
  
  对接 Pagerduty，是一套付费的监控报警系统。可实现短信报警、5 分钟无人 ack 打电话通知、仍然无人 ack，通知值班人员 Manager…
  Email，发送邮件… …
- 数据可视化
Prometheus build-in WebUI
Grafana
其他基于 API 开发的客户端

2.2 Prometheus实验环境规划

主机	运行服务	监控范围
prometheus10	prometheus server
mysql11	mysql + node_export+ mysql_export	数据库+主机
application12	java应用（springboot应用）+node_export	java应用+主机

克隆机器，修改为静态ip（要求能上外网）

vi /etc/sysconfig/network-scripts/ifcfg-ens33#根据自己的VMWare虚拟机网段配置，修改如下几个参数
IPADDR="192.168.11.10"  # 根据自己的网段，将11修改为自己的网段号
PREFIX="24" #不用改
GATEWAY="192.168.11.2" # 根据自己的网段，将11修改为自己的网段号
DNS1="8.8.8.8"			# 不用修改
DNS1="114.114.114.114"  # 不用修改

修改主机名

# hostnamectl set-hostname 新的主机名
# 示例如下：
hostnamectl set-hostname prometheus10

关闭防火墙,selinux

 # 停止防火请，并禁止开启自启动
systemctl stop firewalld 
systemctl disable firewalld
# 关闭selinux，修改后需重启虚拟机
vi /etc/selinux/config
#修改SELINUX=enforcing
SELINUX=disabled

配置ip和主机名映射

vi /etc/hosts
#增加如下内容
192.168.11.10 prometheus10
192.168.11.11 mysql11

说明：

每台机器都要修改，注意每个机器的IPADDR最后一段一定要不同。
为了更好的操作体验，最好在windows机器也配置ip的主机名映射
- C盘/windows/system32/drivers/hosts

2.3 安装Prometheus Server

Prometheus 基于 Golang 编写，编译后的软件包，不依赖于任何的第三方依赖。只需要下载对应平台的二进制包，解压并且添加基本的配置即可正常启动 Prometheus Server。

上传安装包

上传 prometheus-2.29.1.linux-amd64.tar.gz 到虚拟机的/opt/software 目录
```
[root@prometheus10 opt]# ls /opt/software/
prometheus-2.29.1.linux-amd64.tar.gz
```

解压到/opt/module 目录下

#新建module目录
[root@prometheus10 opt]# mkdir /opt/module 
#解压缩
[root@prometheus10 opt]# tar xzvf /opt/software/prometheus-2.29.1.linux-amd64.tar.gz -C /opt/module/
#prometheus文件夹改名
[root@prometheus10 opt]# mv /opt/module/prometheus-2.29.1.linux-amd64/ /opt/module/prometheus-2.29.1

阅读配置文件

prometheus的配置内容在 prometheus.yml中，默认配置如下：

 # my global config 全局配置块： 控制 Prometheus 服务器的全局配置
global:scrape_interval: 15s # 配置拉取数据的时间间隔（这里设置为15s），如果不设置默认为 1 分钟。evaluation_interval: 15s # 规则验证（生成 alert）的时间间隔（这里设置为15s），如果不设置默认为 1 分钟。.# scrape_timeout is set to the global default (10s).# Alertmanager configuration  告警配置alerting:alertmanagers:- static_configs:- targets:# - alertmanager:9093 # 规则配置文件# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.rule_files:# - "first_rules.yml"# - "second_rules.yml"# 配置采集目标相关， prometheus 监视的目标scrape_configs:# Prometheus自身的运行信息可以通过 HTTP 访问，所以 Prometheus 可以监控自己的运行数据# job_name：监控作业的名称- job_name: "prometheus"# metrics_path defaults to '/metrics'# scheme defaults to 'http'.# static_configs: 表示静态目标配置，就是固定从某个 target 拉取数据static_configs:- targets: ["localhost:9090"]

Prometheus 是可以在运行时自动加载配置的。启动时需要添加：--web.enable-lifecycle

启动prometheus server

# 先进入到Prometheus安装目录
[root@prometheus10 ~]# cd /opt/module/prometheus-2.29.1/
# 启动prometheus
[root@prometheus10 prometheus-2.29.1]# ./prometheus

访问测试Prometheus

http://192.168.11.10:9090 , Prometheus默认占用9090端口

2.4 监控Linux主机

在 Prometheus 的架构设计中，Prometheus Server 主要负责数据的收集，存储并且对外提供数据查询支持，而实际的监控样本数据的收集则是由 Exporter 完成。因此为了能够监控到某些东西，如主机的 CPU 使用率，我们需要使用到 Exporter。Prometheus 周期性的从 Exporter 暴露的 HTTP 服务地址（通常是/metrics）拉取监控样本数据。

Exporter 可以是一个相对开放的概念，其可以是一个独立运行的程序独立于监控目标以外，也可以是直接内置在监控目标中。只要能够向 Prometheus 提供标准格式的监控样本数据即可。

为了能够采集到主机的运行指标如 CPU, 内存，磁盘等信息。我们可以使用 Node Exporter。Node Exporter 同样采用 Golang 编写，并且不存在任何的第三方依赖，只需要下载，解压即可运行。可以从 https://prometheus.io/download/ 获取最新的 node_exporter 版本的二进制包。

上传安装包

上传 node_exporter-1.2.2.linux-amd64.tar.gz 到虚拟机的/opt/software 目录
```
[root@mysql11 ~]# ls /opt/software
node_exporter-1.2.2.linux-amd64.tar.gz
```

解压安装包到/opt/module 目录下

#新建module目录
[root@mysql11 ~]# mkdir /opt/module 
#解压缩
[root@mysql11 ~]# tar xzvf /opt/software/node_exporter-1.2.2.linux-amd64.tar.gz -C /opt/module/
#node_exporter文件夹改名
[root@mysql11 ~]# mv /opt/module/node_exporter-1.2.2.linux-amd64/ /opt/module/node_exporter-1.2.2

启动export，并通过metrics端点查看当前node export获取的监控信息
```
# 执行./node_exporter
#先进入node_exporter的目录
[root@mysql11 ~]# cd /opt/module/node_exporter-1.2.2/
# 再执行node_exporter
[root@mysql11 node_exporter-1.2.2]# ./node_exporter
```
浏览器输入：http://mysql11:9100/metrics，(如果没有在windows机器配置ip映射，需要将mysql11改为具体的ip)，可以看到当前 node exporter 获取到的当前主机的所有监控数据。

回到Prometheus服务器，修改Prometheus配置文件，增加对Linux主机的监控job

# 在 scrape_configs 配置项下添加配置：scrape_configs:- job_name: "prometheus"static_configs:- targets: ["localhost:9090"]# 添加Node Exporter监控配置- job_name: "node_exporter"static_configs:- targets: ["mysql11:9100"]

如果开启了热加载，此时可以访问热加载接口以完成配置文件的加载。`curl -X POST http://localhost:9090/-/reload`

重启Prometheus，通过页面查看是否成功

http://192.168.11.10:9090 , Prometheus默认占用9090端口

在这里插入图片描述

2.5 监控MySQL

为了能够采集到MySQL的运行指标，我们可以使用 MySQL Exporter。MySQL Exporter 是社区专门为采集 MySQL/MariaDB 数据库监控指标而设计开发，通过 Exporter 上报核心的数据库指标，用于异常报警和监控大盘展示。

数据库授权

因为 MySQL Exporter 是通过查询数据库中状态数据来对其进行监控，所以需要为对应的数据库实例进行授权。我们新建一个账户名为exporter，密码为 123456的账户，并为其授予相应的权限。
```
CREATE USER 'exporter'@'localhost' IDENTIFIED BY '123456' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
flush privileges;
```
注意:授权ip为localhost，是因为exporter账密由mysql_exporter使用用来检索数据库运行指标，而mysql expoerter和mysql在同一台机器上。所以这个localhost是指的mysql_exporter的IP。
上传安装包

上传 node_exporter-1.2.2.linux-amd64.tar.gz 到虚拟机的/opt/software 目录
```
[root@mysql11 ~]# ls /opt/software
mysqld_exporter-0.13.0.linux-amd64.tar.gz
```

解压安装包到/opt/module 目录下

#新建module目录
[root@mysql11 ~]# mkdir /opt/module 
#解压缩
[root@mysql11 ~]# [root@mysql11 ~]# tar xzvf /opt/software/mysqld_exporter-0.13.0.linux-amd64.tar.gz -C /opt/module/
#mysql_exporter文件夹改名
[root@mysql11 ~]#  mv /opt/module/mysqld_exporter-0.13.0.linux-amd64/ /opt/module/mysqld_exporter-0.13.0

在mysqld_exporter文件夹中，新建一个my.cnf配置

执行 vi /opt/module/node_exporter-1.2.2/my.cnf ，新建my.cnf文件，内容配置如下:
```
[client]
user=exporter
password=123456
```
启动mysql_exporter，并访问9104端口
```
#先进入mysql_exporter的目录
[root@mysql11 ~]# cd /opt/module/mysqld_exporter-0.13.0/
[root@mysql11 mysqld_exporter-0.13.0]#
# 再执行mysql_exporter
[root@mysql11 mysqld_exporter-0.13.0]# ./mysqld_exporter --config.my-cnf=my.cnf
```
浏览器输入：http://mysql11:9104/metrics，(如果没有在windows机器配置ip映射，需要将mysql11改为具体的ip)，可以看到当前 mysql exporter 获取到mysql的所有监控数据。

回到Prometheus服务器，修改Prometheus配置文件，增加对MySQL的监控job

# 在 scrape_configs 配置项下添加配置：scrape_configs:- job_name: "prometheus"static_configs:- targets: ["localhost:9090"]- job_name: "node_exporter"static_configs:- targets: ["mysql11:9100"]- job_name: "mysql_exporter"static_configs:- targets: ["mysql11:9104"]

重启Prometheus，通过页面查看是否成功

http://192.168.11.10:9090 , Prometheus默认占用9090端口

2.6 监控java应用

在使用 Spring Boot 作为开发框架时，需要监控应用的状态，例如 JVM/Spring MVC 等。而为了使监控深入到应用的内部，就需要应用自身暴露作为Exporter暴露监控指标，这就和应用的开发语言和技术框架紧密相关了。Prometheus 监控SpringBoot服务基于 Spring Actuator 机制采集 JVM 等数据。

修改应用的依赖和配置

<!--
项目中已经引用 spring-boot-starter-web 的基础上，在 pom.xml 文件中添加 actuator/prometheus Maven 依赖项。
-->
<dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency><groupId>io.micrometer</groupId><artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

修改springboot项目配置文件

management:server:port: 8091endpoint:prometheus:enabled: trueendpoints:web:exposure:include: health,info,prometheusmetrics:tags:application: spring-boot-mvc-demo

打包，并运行jar包，并访问配置的8091端口

 java -jar springboot-prometheus.jar

此时访问 http://ip:8091/actuator/prometheus ,以看到当前java应用的所有监控数据。

在这里插入图片描述

回到Prometheus服务器，修改Prometheus配置文件，增加对Java应用的监控job

# 在 scrape_configs 配置项下添加配置：scrape_configs:- job_name: "prometheus"static_configs:- targets: ["localhost:9090"]- job_name: "node_exporter"static_configs:- targets: ["mysql11:9100"]- job_name: "mysql_exporter"static_configs:- targets: ["mysql11:9104"]# 添加监控java应用的job- job_name: "springboot_exporter"metrics_path: '/actuator/prometheus'static_configs:- targets: ["mysql11:8091"]

重启Prometheus，通过页面查看是否成功

http://192.168.11.10:9090 , Prometheus默认占用9090端口

在这里插入图片描述

3、Grafana的简单使用

3.1 Grafana的安装

grafana 是一款采用 Go 语言编写的开源应用，主要用于大规模指标数据的可视化展现，是网络架构和应用分析中最流行的时序数据展示工具，目前已经支持绝大部分常用的时序数据库。下载地址：https://grafana.com/grafana/download

上传安装包

上传 grafana-enterprise-8.1.2.linux-amd64.tar.gz 到虚拟机的/opt/software 目录
```
[root@prometheus10 ~]# ls /opt/software/
grafana-enterprise-8.1.2.linux-amd64.tar.gz
```

解压安装包到/opt/module 目录下

#新建module目录
[root@prometheus10 ~]# mkdir /opt/module 
#解压缩
[root@prometheus10 ~]# tar xzvf /opt/software/grafana-enterprise-8.1.2.linux-amd64.tar.gz -C /opt/module

启动grafana

# 先进入到grafana安装目录
[root@prometheus10 ~]# cd /opt/module/grafana-8.1.2/
# 启动grafana
[root@prometheus10 grafana-8.1.2]# ./bin/grafana-server web > ./grafana.log 2>&1 &

访问web管理界面

打开地址: http://ip:3000 ,默认用户名和密码都是admin

3.2 Grafana集成Prometheus

Grafana配置Prometheus连接信息

点击配置，选择DataSource
点击Add datasource
配置Prometheus Server的地址
点击下方的Save & Test，出现绿色的Data source is working 即说明Prometheus正常联通
点击Back返回，即可看到新添加的Prometheus

手动创建仪表盘Dashboard

手动新建仪表盘

在这里插入图片描述

在面板中配置监控项

在这里插入图片描述

导入仪表盘

手动一个个添加 Dashboard 比较繁琐，Grafana 社区鼓励用户分享 Dashboard，通过https://grafana.com/dashboards网站，可以找到大量可直接使用的Dashboard模板。
Grafana 中所有的 Dashboard 通过 JSON 进行共享，下载并且导入这些 JSON 文件，就可以直接使用这些已经定义好的 Dashboard：
在这里插入图片描述