elasticsearch系列七:聚合查询

概述

今天咱们来看下es中的聚合查询,在es中聚合查询分为三大类bucket、metrics、pipeline,每一大类下又有十几种小类,咱们各举例集中,有兴许的同学可以参考官网:https://www.elastic.co/guide/en/elasticsearch/reference/7.10/search-aggregations.html 本次基于es7.10.2版本编写。

metics聚合

常用指标类的聚合无外乎这几种:Avg、Min、Max、Sum、Cardinality、Percentile ranks。咱们来看下具体语法:

Avg、Min、Max、Sum这几个雷同只需要换函数名即可,假如我们有一个日志索引,其索引mapping如下:

{    "mappings": {        "properties": {            "routePath": {                "type":"keyword"            },            "serverCode": {                "type":"keyword"            },            "taskTime": {                "type":"long"            },            "reuqestMsg": {                "type":"text"            },            "responseMsg": {                "type":"text"            }        }    }}

我们想看下近一月的接口某接口平均耗时、最小耗时、最大耗时等指标,此时dsl可以如下编写:

GET /log-2023-02/_serach{    "size": 0,    "query": {        "bool": {            "filter": [                {                    "term": {                        "routePath": "/user/getUserInfo"                    }                }            ]        }    },    "aggs": {        "avg": {            "avg": {                "field": "taskTime"            }        }    }}

返回结果:

图片

        咱们看下如何去重,根据接口地址去重查询:

{    "size": 0,    "aggs": {        "cardinality": {            "cardinality": {                "field": "routePath"            }        }    }}

图片

只是这个cardinality有误差,它底层采用的是HyperLogLog的算法,通过计算数据的hash值来去重所以有误差,百万数据误差在5%以内,我们可以通过precision_threshold参数去调整最大支持4万,该值越大耗费内存也就越大如果数据总量在4万以内那么调整到最大值可以保证100%正确。

接下来咱们看Percentile ranks这个也是比较常用的聚合分析函数他的结果也是有误差的但是不影响我们分析整体情况,比如我们需要计算整体系统的性能可以这样搞:查询接口再响应这些耗时上的百分比就可以通过如下语句​​​​​​​

{    "size": 0,    "aggs": {        "rate": {            "percentile_ranks": {                "field": "taskTime",                "values": [                    20,                    40,                    50,                    60                ]            }        }    }}

结果:

图片

bucket聚合

桶聚合中我们常用的有分组、直方图、范围、根据日期分桶聚合这几类,咱们先看下分组查询(terms)举例我们想统计下各个接口调用量情况:​​​​​​​

{    "size": 0,    "aggs": {        "term": {            "terms": {                "field": "routePath"            }        }    }

返回结果:​​​​​​​

"aggregations": {        "term": {            "doc_count_error_upper_bound": 0,            "sum_other_doc_count": 0,            "buckets": [                {                    "key": "/user/getUserInfo",                    "doc_count": 5                },                {                    "key": "/user/addUser",                    "doc_count": 1                },                {                    "key": "/user/updateMobile",                    "doc_count": 1                },                {                    "key": "/user/updateUser",                    "doc_count": 1                }            ]        }    }

咱们再看直方图的查询统计接口耗时、间隔为1:​​​​​​​

{    "size": 0,    "aggs": {        "histogram": {            "histogram": {                "field": "taskTime",                "interval": 1            }        }    }}

结果

"aggregations": {        "histogram": {            "buckets": [                {                    "key": 20.0,                    "doc_count": 2                },                {                    "key": 21.0,                    "doc_count": 0                },                {                    "key": 22.0,                    "doc_count": 0                }           ]        }    }

根据日期统计各接口调用情况,用直方图实行展现:​​​​​​​

{    "size": 0,    "aggs": {        "date_histogram": {            "date_histogram": {                "field": "requestTime",                "interval": "day"            }        }    }}

查询结果:

"aggregations": {        "histogram": {            "buckets": [                {                    "key_as_string": "2023-02-01T00:00:00.000Z",                    "key": 1675209600000,                    "doc_count": 1                },                {                    "key_as_string": "2023-02-02T00:00:00.000Z",                    "key": 1675296000000,                    "doc_count": 1                },                {                    "key_as_string": "2023-02-03T00:00:00.000Z",                    "key": 1675382400000,                    "doc_count": 1                }            ]        }    }

pipeline聚合

它其实是对bucket聚合的结果再次进行聚合分期,数据准备:


{ "create" : {  "_index" : "employees" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "create" : {  "_index" : "employees" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "create" : {  "_index" : "employees" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "create" : {  "_index" : "employees" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "create" : {  "_index" : "employees" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "create" : {  "_index" : "employees" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "create" : {  "_index" : "employees" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "create" : {  "_index" : "employees" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "create" : {  "_index" : "employees" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "create" : {  "_index" : "employees" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

我们根据以上数据想要查询平均薪资最低的行业:​​​​​​​

{  "size": 0,  "aggs": {    "jobs": {      "terms": {        "field": "job.keyword",        "size": 10      },      "aggs": {        "avg_salary": {          "avg": {            "field": "salary"          }        }      }    },    "min_salary_by_job":{      "min_bucket": {  #再次进行聚合查询 将jobs桶下的avg_salary求出最小值        "buckets_path": "jobs>avg_salary"      }    }  }}

结果如下:​​​​​​​

"aggregations": {        "jobs": {            "doc_count_error_upper_bound": 0,            "sum_other_doc_count": 0,            "buckets": [                {                    "key": "Java Programmer",                    "doc_count": 7,                    "avg_salary": {                        "value": 25571.428571428572                    }                },                {                    "key": "Javascript Programmer",                    "doc_count": 4,                    "avg_salary": {                        "value": 19250.0                    }                },                {                    "key": "DBA",                    "doc_count": 2,                    "avg_salary": {                        "value": 25000.0                    }                },                {                    "key": "Product Manager",                    "doc_count": 1,                    "avg_salary": {                        "value": 35000.0                    }                }            ]        },        "min_salary_by_job": {            "value": 19250.0,            "keys": [                "Javascript Programmer"            ]        }    }

还有将bucket结果再次进行平均 avg_bucket,bucket结果再次求最大的max_bucket,bucket结果再次求百分比的 percentiles_bucket等等。

总结

基本上咱们把常用的一些聚合查询都给大家演示了一遍,当然es本身支持的聚合查询远远不止这些,有兴趣的同学可以参考es官网的学习手册:https://www.elastic.co/guide/en/elasticsearch/reference/7.10/index.html 来探索更多的语法糖。


Elasticsearch系列经典文章

  • elasticsearch列一:索引模板的使用

  • elasticsearch系列二:引入索引模板后发现数据达到一定量还是慢怎么办?

  • elasticsearch系列三:常用查询语法

  • elasticsearch系列四:集群常规运维

  • elasticsearch系列五:集群的备份与恢复

  • elasticsearch系列六:索引重建

图片

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/305580.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

产品管理-学习笔记-版本的划分

版本号说明【X.Y.Z_修饰词】 版本号定义原则X表示大版本号,一般当产品出现重大更新、调整、不再向后兼容的情况时我们会在X上加1Y表示功能更新,在产品原有的基础上增加、修改部分功能,且并不影响产品的整体流程或业务Z表示小修改&#xff0c…

Apache OFBiz groovy 远程代码执行漏洞(CVE-2023-51467)

项目介绍 Apache OFBiz是一个非常著名的电子商务平台,是一个非常著名的开源项目,提供了创建基于最新J2EE/XML规范和技术标准,构建大中型企业级、跨平台、跨数据库、跨应用服务器的多层、分布式电子商务类WEB应用系统的框架。 OFBiz最主要的特…

(04730)半导体器件之基本放大器工作原理(三)

本文主要阐述多级与差动放大器 为使输入的微弱信号进行放大后能获得足够的输出功率去推动负载运行,往往要采用所谓的多级放大电路,信号逐级通过放大,直至得到输出信号。这就必须考虑放大电路级与级之间的信号传递方法,或者称为耦…

数据库进阶教学——主从复制(Ubuntu22.04主+Win10从)

目录 一、概述 二、原理 三、搭建 1、备份数据 2、主库配置Ubuntu22.04 2.1、设置阿里云服务器安全组 2.2、修改配置文件 /etc/my.cnf 2.3、重启MySQL服务 2.4、登录mysql,创建远程连接的账号,并授予主从复制权限 2.5、通过指令,查…

Vue - Class和Style绑定详解

1. 模板部分 <template><div><!-- Class 绑定示例 --><div :class"{ active: isActive, text-danger: hasError }">Hello, Vue!</div><!-- Class 绑定数组示例 --><div :class"[activeClass, errorClass]">Cla…

MySQL查找配置文件的3种方法(Windows)

方法一&#xff1a; 查找根目录 SELECT basedir; 方法二&#xff1a; 查找数据目录 SELECT datadir; 方法三&#xff1a; 文件属性查询“defaults-file”

51单片机(STC8)-- GPIO输入输出

文章目录 I/O口相关寄存器端口数据寄存器端口模式配置寄存器&#xff08;PxM0&#xff0c;PxM1&#xff09;端口上拉电阻控制寄存器(PxPU)关于I/O的注意事项 配置I/O口I/O设置demoI/O端口模式LED控制&#xff08;I/O输出&#xff09;按键检测&#xff08;I/O输入&#xff09; S…

Qt(二):使用udp发送与接收图片

使用Qt来通过UDP协议发送和接收图片可以分为几个步骤。以下是一个基本的指南&#xff1a; 发送图片准备图片数据&#xff1a;首先&#xff0c;你需要将图片转换为可以在网络上传输的数据格式。通常&#xff0c;这涉及到将图片转换为字节数组。设置UDP套接字&#xff1a;在Qt中…

拓扑排序图解-Kahn算法和深度优先搜索

拓扑排序 是将一个有向无环图中的每个节点按照依赖关系进行排序。比如图 G G G存在边 < u , v > <u,v> <u,v> 代表 v v v的依赖 u u u, 那么在拓扑排序中&#xff0c;节点 u u u一定在 v v v的前面。 从另一个角度看&#xff0c;拓扑排序是一种图遍历&#…

【计算机毕业设计】SSM医疗药品采购系统

项目介绍 ssm医疗药品采购系统。主要功能有&#xff1a; 用户管理&#xff1a;管理员列表&#xff1b; 采购管理&#xff1a;采购列表&#xff1b; 药品出库&#xff1a;药品出库&#xff1b; 库存管理&#xff1a;库存统计&#xff1b; 数据维护&#xff1a;药品列表、仓库…

Java多线程常见的成员方法(线程优先级,守护线程,礼让/插入线程)

目录 1.多线程常见的成员方法2.优先级相关的方法3.守护线程&#xff08;备胎线程&#xff09;4.其他线程 1.多线程常见的成员方法 ①如果没有给线程设置名字&#xff0c;线程是有默认名字 的&#xff1a;Thread-X(X序号&#xff0c;从0开始) ②如果要给线程设置名字&#xff0c…

骑砍战团MOD开发(25)-module_animations.py骨骼动画

一.引擎固化骨架 Data\skeleton_bodies.xml:定义系统骨架skel_human,skel_horse. <Skeletons><Skeleton name"skel_human"><Skeleton name"skel_horse"> </Skeletons> CommonRes\skeletons.brf 为skel_human的资源文件,适用BRF打…