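Table of contents
- Installation
- Installing Elasticsearch
- Installing Kibana
- REST-style API
- Working with indices
- Basic concepts
- Examples
- Creating an index
- Viewing an index
- Deleting an index
- Mapping configuration (apparently optional; types are inferred automatically)
- Adding data
- Auto-generated ID
- Custom ID
- Updating data
- Deleting data
- Queries
- Basic queries
- Match all (match_all)
- Match query (match)
- Multi-field query (multi_match)
- Term query (term)
- Multi-term exact match (terms)
- Result filtering
- Advanced queries
- Filtering (filter)
- Sorting
- Aggregations
- Preparing test data
- Aggregating into buckets
- Metrics within buckets
- Buckets nested within buckets
- Other ways to define buckets
- Histogram Aggregation (stepped buckets)
- Range Aggregation (range buckets)
- Client code in different languages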
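I first learned ES back in university while studying Java, where it was meant for log storage. It was never adopted in a project, though, and after years without using it I gradually forgot it. Now I need it for work, so I am going through it again and writing down the basic workflow and some usage notes for future reference.
[Elasticsearch Chinese documentation: Introduction] https://elasticsearch.bookhub.tech/set_up_elasticsearch/configuring_elasticsearch/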
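Installation
Installing Elasticsearch
Download page: https://www.elastic.co/cn/downloads
Elasticsearch and Kibana should be downloaded in the same version.
Also make sure the Elasticsearch version is compatible with the Java JDK installed on your machine.
For example, my machine has JDK 1.8, which I paired with ES 7.6.1.
Go into elasticsearch-7.6.1\bin and double-click elasticsearch.bat to start it, then open http://localhost:9200/ . You should get a response like this:
{
  "name" : "HOULJ12",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "hWbuydGRQLKKHia_KUVTjA",
  "version" : {
    "number" : "7.6.1",
    "build_flavor" : "default",
    "build_type" : "zip",
    "build_hash" : "aa751e09be0a5072e8570670309b1f12348f023b",
    "build_date" : "2020-02-29T00:15:25.529771Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
It is running!
If you need to change the configuration, most settings live in the config folder; search online or consult the documentation for details.
Installing Kibana
Go into kibana-7.6.1-windows-x86_64\bin and double-click kibana.bat to start it, then open http://localhost:5601/
[Note] If you need to install an analyzer (tokenizer) plugin, see the reference articles listed at the end of this post.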
REST-style API
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Working with indices
Basic concepts
Elasticsearch is a full-text search engine built on Lucene. At its core it also stores data, and many of its concepts resemble those in MySQL. The correspondence is:
Elasticsearch | MySQL |
---|---|
Index (indices) | Database |
Type (type) | Table |
Document (Document) | Row |
Field (Field) | Column |
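Examples
Creating an index
Elasticsearch exposes a REST-style API, so every API call is simply an HTTP request, and you can send it with any tool you like.
Request format for creating an index:
Method: PUT
Path: /<index_name>
Request parameters: JSON
PUT /<index_name>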
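Since each API call is just an HTTP request, it can also be issued from code instead of Kibana. The following is a minimal sketch using only Python's standard library. It assumes the local node from the installation step at http://localhost:9200, and the index name my_index and the helper build_request are made up for illustration. The request is only built here; the lines that send it are commented out because they need a running node.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: the local node started earlier


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# "my_index" is a hypothetical index name used only for illustration.
create_index = build_request("PUT", "/my_index")
print(create_index.get_method(), create_index.full_url)

# To actually send it (requires a running node):
# with urllib.request.urlopen(create_index) as resp:
#     print(json.load(resp))
```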
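Viewing an index
GET /<index_name>
Use * in place of the index name to list all indices.
The indices can also be viewed in Kibana.
Deleting an index
DELETE /<index_name>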
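Mapping configuration (apparently optional; types are inferred automatically)
Creating field mappings
PUT /<index_name>/_mapping/<type_name>
{
  "properties": {
    "<field_name>": {
      "type": "<type>",
      "index": true,
      "store": true,
      "analyzer": "<analyzer>"
    }
  }
}
- Type name: the type concept introduced earlier, similar to a table in a database
- Field name: any name you like; each field accepts a number of attributes, for example:
  - type: the data type, such as text, long, short, date, integer, object
  - index: whether the field is indexed; defaults to true
  - store: whether the field is stored separately; defaults to false
  - analyzer: the analyzer (tokenizer); ik_max_word in the example below selects the IK analyzer, which is a separately installed plugin
[Note] Elasticsearch 7.x deprecates mapping types. As far as I can tell, on 7.x a put-mapping request that names a type is only accepted with ?include_type_name=true appended to the path; otherwise drop the type and use PUT /<index_name>/_mapping.
Example:
PUT bysl/_mapping/goods
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "ik_max_word"
    },
    "images": {
      "type": "keyword",
      "index": false
    },
    "price": {
      "type": "float"
    }
  }
}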
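If the mapping is created from code rather than typed into Kibana, the body can be assembled as a plain dictionary. This is only a sketch: the helper field_mapping is hypothetical, and it covers just the type, index and analyzer attributes used in the example above.

```python
import json


def field_mapping(field_type, index=True, analyzer=None):
    """Return the mapping entry for one field, omitting attributes left at their defaults."""
    entry = {"type": field_type}
    if not index:
        entry["index"] = False
    if analyzer is not None:
        entry["analyzer"] = analyzer
    return entry


# Same three fields as the goods example above.
goods_mapping = {
    "properties": {
        "title": field_mapping("text", analyzer="ik_max_word"),
        "images": field_mapping("keyword", index=False),
        "price": field_mapping("float"),
    }
}

print(json.dumps(goods_mapping, indent=2))
```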
[Reference] Field types: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
Response:
{
  "acknowledged": true
}
Viewing the mapping
GET /<index_name>/_mapping
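Adding data
Auto-generated ID
A POST request adds a document to an index that already exists:
POST /<index_name>/<type_name>
{
  "key1": "value1",
  "key2": "value2"
}
Example:
POST /test/goods/
{
  "title": "小米手机",
  "images": "http://image.bysl.com/12479122.jpg",
  "price": 2699.00
}
Viewing the data:
GET test/_search
{
  "query": {
    "match_all": {}
  }
}
In each hit of the response:
- _source: the source document; all of the data is in here.
- _id: the unique identifier of this document; it is unrelated to any id field inside the document itself.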
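Custom ID
POST /<index_name>/<type_name>/<id>
{
  "key1": "value1",
  "key2": "value2"
}
Updating data
Changing the request method of the insert above to PUT turns it into an update. An update must specify the id:
- if a document with that id exists, it is updated (the whole document is replaced by the request body)
- if no document with that id exists, it is created
PUT /test/goods/IPlrK40BcVBqhfH0rHnI
{
  "title": "小米手机",
  "images": "http://image.bysl.com/12479122.jpg",
  "price": 2688.00
}
Deleting data
Deletion uses a DELETE request and, likewise, targets a document by id:
DELETE /<index_name>/<type_name>/<id>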
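The create, update and delete requests above differ only in HTTP method and in whether the path carries an id. A small sketch that makes the pattern explicit; the function document_request and its action names are hypothetical and exist only for this illustration.

```python
def document_request(action, index, doc_type, doc_id=None):
    """Map a document operation to the (HTTP method, path) pair described above."""
    base = f"/{index}/{doc_type}"
    if action == "create":
        # Without an id Elasticsearch generates one; with an id the caller chooses it.
        return ("POST", base if doc_id is None else f"{base}/{doc_id}")
    if doc_id is None:
        raise ValueError(f"{action} requires a document id")
    if action == "update":
        return ("PUT", f"{base}/{doc_id}")
    if action == "delete":
        return ("DELETE", f"{base}/{doc_id}")
    raise ValueError(f"unknown action: {action}")


print(document_request("create", "test", "goods"))
print(document_request("update", "test", "goods", "IPlrK40BcVBqhfH0rHnI"))
print(document_request("delete", "test", "goods", "2"))
```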
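Queries
Basic queries
GET /<index_name>/_search
{
  "query": {
    "<query_type>": {
      "<query_condition>": "<condition_value>"
    }
  }
}
Here query represents a query object, which can hold different kinds of queries:
- Query type: for example match_all, match, term, range, and so on
- Query condition: its syntax depends on the query type and is covered in detail below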
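The request body always has the same envelope: a query object containing one query type, which in turn contains the condition. A tiny sketch of that shape; search_body is a hypothetical helper name, not part of any client library.

```python
import json


def search_body(query_type, condition):
    """Wrap one query clause in the {"query": {...}} envelope used by _search."""
    return {"query": {query_type: condition}}


match_all = search_body("match_all", {})
match_title = search_body("match", {"title": "联想"})
print(json.dumps(match_all))
print(json.dumps(match_title, ensure_ascii=False))
```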
Match all (match_all)
GET /test/_search
{
  "query": {
    "match_all": {}
  }
}
Result:
{"took" : 774,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 3,"relation" : "eq"},"max_score" : 1.0,"hits" : [{"_index" : "test","_type" : "goods","_id" : "IPlrK40BcVBqhfH0rHnI","_score" : 1.0,"_source" : {"title" : "小米手机","images" : "http://image.bysl.com/12479122.jpg","price" : 2688.0}},{"_index" : "test","_type" : "goods","_id" : "2","_score" : 1.0,"_source" : {"title" : "小米手机222222","images" : "http://image.bysl.com/133.jpg","price" : 3999.0}},{"_index" : "test","_type" : "goods","_id" : "3","_score" : 1.0,"_source" : {"title" : "联想小新","images" : "http://image.bysl.com/17777.jpg","price" : 6999.0}}]}
}
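- took: time the query took, in milliseconds
- timed_out: whether the query timed out
- _shards: shard information
- hits: overview object for the search results
  - total: total number of matching documents
  - max_score: the highest score among all matching documents
  - hits: array of matching documents, each element describing one document
    - _index: the index
    - _type: the document type
    - _id: the document id
    - _score: the document's relevance score
    - _source: the document's source data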
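To show how these fields are typically read in client code, here is a short sketch that walks a trimmed copy of the match_all response above. The function name summarize is my own; the field names are the ones in the response.

```python
# Trimmed copy of the match_all response shown above.
response = {
    "took": 774,
    "timed_out": False,
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_id": "IPlrK40BcVBqhfH0rHnI", "_score": 1.0,
             "_source": {"title": "小米手机", "price": 2688.0}},
            {"_id": "2", "_score": 1.0,
             "_source": {"title": "小米手机222222", "price": 3999.0}},
            {"_id": "3", "_score": 1.0,
             "_source": {"title": "联想小新", "price": 6999.0}},
        ],
    },
}


def summarize(resp):
    """Return the total hit count and (id, title, price) for each returned document."""
    hits = resp["hits"]
    rows = [(h["_id"], h["_source"]["title"], h["_source"]["price"]) for h in hits["hits"]]
    return hits["total"]["value"], rows


total, rows = summarize(response)
print(total)
for row in rows:
    print(row)
```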
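Match query (match)
The or relationship
A match query analyzes (tokenizes) the query text and then searches; multiple terms are combined with or.
GET /test/_search
{
  "query": {
    "match": {
      "title": "联想"
    }
  }
}
Search result: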
{"took" : 12,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 2.0253944,"hits" : [{"_index" : "test","_type" : "goods","_id" : "3","_score" : 2.0253944,"_source" : {"title" : "联想小新","images" : "http://image.bysl.com/17777.jpg","price" : 6999.0}}]}
}
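In the example above, multiple terms are combined with or: if I search for 联想小米, every document containing either 联想 or 小米 is returned.
The and relationship
In some cases we need a more precise search and want the relationship to be and, which can be done like this:
GET /test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "手机小米",
        "operator": "and"
      }
    }
  }
}
In this example, only documents whose title contains both 小米 and 手机 are returned.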
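The difference between the two operators can be modelled in a few lines. This is a toy model rather than what Elasticsearch does internally: it assumes the text has already been split into terms, and the two documents and their tokenization are made up for illustration (real terms depend on the analyzer).

```python
def matches(doc_terms, query_terms, operator="or"):
    """Toy model of match: compare already-tokenized query terms with a document's terms."""
    found = [term in doc_terms for term in query_terms]
    return all(found) if operator == "and" else any(found)


# Hypothetical tokenization; the real terms depend on the analyzer in use.
docs = {
    "1": {"小米", "手机"},
    "3": {"联想", "小新"},
}
query = ["联想", "小米"]

or_hits = sorted(doc_id for doc_id, terms in docs.items() if matches(terms, query, "or"))
and_hits = sorted(doc_id for doc_id, terms in docs.items() if matches(terms, query, "and"))
print(or_hits)   # documents containing at least one of the terms
print(and_hits)  # documents containing every term
```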
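Between or and and?
Choosing strictly between or and and is too black-and-white. If the user's input is analyzed into 5 query terms and we want to find documents that contain only 4 of them, what should we do? Setting operator to and would simply exclude such documents.
Sometimes that is exactly what we want, but in most full-text search scenarios we want to include documents that might be relevant while excluding those that are less relevant. In other words, we want something in between.
The match query supports the minimum_should_match parameter, which lets us specify how many terms must match for a document to be considered relevant. It can be set to a specific number, but more commonly it is set to a percentage, because we cannot control how many words the user types:
GET /test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "小米曲面电视",
        "minimum_should_match": "75%"
      }
    }
  }
}
In this example the query text is split into 3 terms. With the and operator, all 3 would have to match. Here we use a minimum match of 75% instead, meaning a document only needs to match 75% of the terms. 3 × 75% = 2.25, which is rounded down to 2, so a document containing any 2 of the terms satisfies the condition.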
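The arithmetic for a percentage value can be checked with a short calculation. This sketch covers only positive percentages, which are rounded down; the function name required_matches is mine.

```python
def required_matches(term_count, percent):
    """Number of terms that must match for a positive percentage, rounded down."""
    return term_count * percent // 100


for terms in (3, 4, 5):
    print(terms, "terms at 75% ->", required_matches(terms, 75))
```

With 5 terms the result is 3 (3.75 rounded down), so the 4-of-5 case mentioned earlier would pass as well.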
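Multi-field query (multi_match)
multi_match is similar to match, except that it can search across multiple fields.
GET /test/_search
{
  "query": {
    "multi_match": {
      "query": "小米",
      "fields": [ "title", "images" ]
    }
  }
}
Term query (term)
A term query is used for exact-value matching. The exact values may be numbers, dates, booleans, or strings that are not analyzed.
GET /test/_search
{
  "query": {
    "term": {
      "price": 2699.00
    }
  }
}
Multi-term exact match (terms)
terms works like term but accepts several values; a document matches if the field equals any one of them.
GET /test/_search
{
  "query": {
    "terms": {
      "price": [2699.00, 2899.00, 3899.00]
    }
  }
}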
Result filtering
Specifying fields directly with _source
By default, Elasticsearch returns every field stored in _source for each hit.
If we only want some of the fields, we can add a _source filter:
GET /test/_search
{
  "_source": ["title", "price"],
  "query": {
    "term": {
      "price": 2688
    }
  }
}
Specifying includes and excludes
- includes: the fields to return
- excludes: the fields to leave out
GET /test/_search
{
  "_source": {
    "includes": ["title", "price"]
  },
  "query": {
    "term": {
      "price": 2688
    }
  }
}
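The effect of includes and excludes on a returned document can be pictured with a toy filter. It is a simplification: it only handles flat documents and exact field names, and the function filter_source is a name I made up.

```python
def filter_source(source, includes=None, excludes=None):
    """Toy model of _source filtering for flat documents with exact field names."""
    kept = {}
    for field, value in source.items():
        if includes is not None and field not in includes:
            continue
        if excludes is not None and field in excludes:
            continue
        kept[field] = value
    return kept


source = {"title": "小米手机", "images": "http://image.bysl.com/12479122.jpg", "price": 2688.0}
print(filter_source(source, includes=["title", "price"]))
print(filter_source(source, excludes=["images"]))
```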
Advanced queries
Boolean combination (bool)
bool combines other queries through must (and), must_not (not), and should (or).
GET /test/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "大米" }},
      "must_not": { "match": { "title": "电视" }},
      "should": { "match": { "title": "手机" }}
    }
  }
}
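How the three clauses interact is easier to see in a toy evaluator. The sketch below works on sets of already-tokenized terms and returns whether a document matches plus how many should terms it hit; the titles are hypothetical, not documents from the test index. It reflects the behaviour that, when a must clause is present, should clauses are optional and only improve ranking.

```python
def bool_match(terms, must=(), must_not=(), should=()):
    """Toy bool query over a set of terms: returns (matched, should_hits)."""
    if any(t not in terms for t in must):
        return (False, 0)
    if any(t in terms for t in must_not):
        return (False, 0)
    # With a must clause present, should terms are optional and only boost ranking.
    return (True, sum(t in terms for t in should))


clauses = {"must": ["大米"], "must_not": ["电视"], "should": ["手机"]}
# Hypothetical tokenized titles, not documents from the test index.
print(bool_match({"大米", "手机"}, **clauses))
print(bool_match({"大米", "电视"}, **clauses))
print(bool_match({"大米"}, **clauses))
```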
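Range query (range)
A range query finds numbers or dates that fall within a given interval.
GET /test/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000.0,
        "lt": 2800.00
      }
    }
  }
}
The range query accepts the following operators:
Operator | Description |
---|---|
gt | greater than |
gte | greater than or equal to |
lt | less than |
lte | less than or equal to |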
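The four operators map directly onto ordinary comparisons. A minimal sketch of the same semantics in Python, applied to the prices of the three test documents; in_range is a hypothetical helper.

```python
import operator

OPERATORS = {"gt": operator.gt, "gte": operator.ge, "lt": operator.lt, "lte": operator.le}


def in_range(value, **bounds):
    """Return True when value satisfies every given bound (gt, gte, lt, lte)."""
    return all(OPERATORS[name](value, bound) for name, bound in bounds.items())


prices = [2688.0, 3999.0, 6999.0]
cheap = [p for p in prices if in_range(p, gte=1000.0, lt=2800.0)]
print(cheap)
```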
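Fuzzy query (fuzzy)
The fuzzy query is the fuzzy equivalent of the term query. It allows the search term to deviate in spelling from the indexed term, as long as the edit distance does not exceed 2:
GET /test/_search
{
  "query": {
    "fuzzy": {
      "title": "米"
    }
  }
}
fuzziness sets the allowed edit distance:
GET /test/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "米",
        "fuzziness": 1
      }
    }
  }
}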
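For readers unfamiliar with edit distance, the sketch below computes the classic Levenshtein distance, counting single-character insertions, deletions and substitutions. It is meant only to illustrate the concept: Elasticsearch by default additionally treats swapping two adjacent characters as a single edit, which this plain version does not.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


print(edit_distance("appla", "apple"))
print(edit_distance("米", "小米"))
```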
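Filtering (filter)
Every query clause affects document scoring and ranking. If we need to filter the results without the filter condition influencing the score, we should not write it as a query condition but use filter instead:
GET /test/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "小米手机" }},
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3800.00 }}
      }
    }
  }
}
[Note] bool combinations can be nested again inside filter!
If a search consists only of filters, with no query conditions and no need for scoring, constant_score can replace a bool query that contains only a filter clause. Performance is identical, but the query becomes noticeably simpler and clearer.
GET /test/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": { "price": { "gt": 2000.00, "lt": 3000.00 }}
      }
    }
  }
}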
Sorting
Single-field sorting
sort lets us order results by different fields, with order specifying the direction.
GET /test/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  },
  "sort": [
    { "price": { "order": "desc" }}
  ]
}
Multi-field sorting
Suppose we want to query using both price and _score (the relevance score, if there is one), with matching results sorted first by price and then by relevance score:
GET /goods/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "小米手机" }},
      "filter": {
        "range": { "price": { "gt": 200000, "lt": 300000 }}
      }
    }
  },
  "sort": [
    { "price": { "order": "desc" }},
    { "_score": { "order": "desc" }}
  ]
}
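Multi-field sorting behaves like sorting on a tuple key: the second field only decides the order among documents whose first field is equal. A small sketch with made-up hits (the titles, prices and scores are hypothetical):

```python
# Hypothetical hits; B and A share a price, so _score breaks the tie.
hits = [
    {"title": "A", "price": 2699.0, "_score": 1.2},
    {"title": "B", "price": 2699.0, "_score": 2.5},
    {"title": "C", "price": 3999.0, "_score": 0.8},
]

# price descending first, then _score descending
ordered = sorted(hits, key=lambda h: (-h["price"], -h["_score"]))
titles = [h["title"] for h in ordered]
print(titles)
```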
Aggregations
Preparing test data
Create the index:
PUT /cars
Load the data:
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
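Aggregating into buckets
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color.keyword"
      }
    }
  }
}
- size: number of hits to return; set to 0 here because we only care about the aggregation result, not the matching documents, which improves efficiency
- aggs: declares an aggregation; short for aggregations
  - popular_colors: a name for this aggregation; any name will do
    - terms: how to form buckets; here, by term
      - field: the field to bucket on
Result: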
{"took" : 7,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 8,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"popular_colors" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "red","doc_count" : 4},{"key" : "blue","doc_count" : 2},{"key" : "green","doc_count" : 2}]}}
}
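- hits: empty, because we set size to 0
- aggregations: the aggregation results
  - popular_colors: the aggregation name we defined
    - buckets: the buckets found; each distinct value of the color field forms one bucket
      - key: the color value this bucket corresponds to
      - doc_count: number of documents in this bucket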
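Conceptually, a terms aggregation is a group-and-count. The sketch below reproduces the buckets above in plain Python from the colors in the bulk data. Sorting by count and then by key is my choice to mirror the order of the result shown; it is not a statement about Elasticsearch internals.

```python
from collections import Counter

# Colors of the eight cars, in the order of the bulk request above.
COLORS = ["red", "red", "green", "blue", "green", "red", "red", "blue"]

counts = Counter(COLORS)
# Highest count first; ties ordered by key to mirror the result shown above.
buckets = [
    {"key": key, "doc_count": n}
    for key, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
]
print(buckets)
```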
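Metrics within buckets
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color.keyword"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
- aggs: a new aggs added inside the previous aggregation (popular_colors); a metric is therefore also an aggregation
  - avg_price: the aggregation name
    - avg: the metric type, here the average
      - field: the field the metric is computed on
Result: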
{"took" : 2,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 8,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"popular_colors" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "red","doc_count" : 4,"avg_price" : {"value" : 32500.0}},{"key" : "blue","doc_count" : 2,"avg_price" : {"value" : 20000.0}},{"key" : "green","doc_count" : 2,"avg_price" : {"value" : 21000.0}}]}}
}
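The averages in this result can be verified by hand. A short sketch that groups the prices from the bulk data by color and averages each group:

```python
from collections import defaultdict

# (color, price) pairs taken from the bulk request above.
CARS = [
    ("red", 10000), ("red", 20000), ("green", 30000), ("blue", 15000),
    ("green", 12000), ("red", 20000), ("red", 80000), ("blue", 25000),
]

prices_by_color = defaultdict(list)
for color, price in CARS:
    prices_by_color[color].append(price)

avg_price = {color: sum(prices) / len(prices) for color, prices in prices_by_color.items()}
print(avg_price)
```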
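Buckets nested within buckets
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color.keyword"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "maker": {
          "terms": {
            "field": "make.keyword"
          }
        }
      }
    }
  }
}
- The original color buckets and the avg calculation stay unchanged
- maker: a new bucket aggregation, named maker, added under the nested aggs
  - terms: the bucketing type is again by term
    - field: here the buckets are formed on the make field
Result: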
{"took" : 2,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 8,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"popular_colors" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "red","doc_count" : 4,"maker" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "honda","doc_count" : 3},{"key" : "bmw","doc_count" : 1}]},"avg_price" : {"value" : 32500.0}},{"key" : "blue","doc_count" : 2,"maker" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "ford","doc_count" : 1},{"key" : "toyota","doc_count" : 1}]},"avg_price" : {"value" : 20000.0}},{"key" : "green","doc_count" : 2,"maker" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "ford","doc_count" : 1},{"key" : "toyota","doc_count" : 1}]},"avg_price" : {"value" : 21000.0}}]}}
}
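- As we can see, the new maker aggregation is nested inside each of the original color buckets.
- Within each color, the documents are grouped by the make field.
- What we can read from the result:
  - There are 4 red cars in total.
  - The average selling price of red cars is 32,500 US dollars.
  - 3 of them were made by Honda and 1 by BMW.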
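Nested buckets amount to grouping twice: first by color, then by make inside each color. A sketch that rebuilds the inner counts from the bulk data:

```python
from collections import Counter, defaultdict

# (color, make) pairs taken from the bulk request above.
CARS = [
    ("red", "honda"), ("red", "honda"), ("green", "ford"), ("blue", "toyota"),
    ("green", "toyota"), ("red", "honda"), ("red", "bmw"), ("blue", "ford"),
]

makers_by_color = defaultdict(Counter)
for color, make in CARS:
    makers_by_color[color][make] += 1

for color, makers in makers_by_color.items():
    print(color, dict(makers))
```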
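Other ways to define buckets
There are many ways to define buckets, for example:
- Date Histogram Aggregation: groups by date intervals; for example, with an interval of one week, documents are automatically grouped week by week
- Histogram Aggregation: groups by numeric intervals, analogous to the date version
- Terms Aggregation: groups by term; documents whose term is exactly the same fall into one group
- Range Aggregation: groups by numeric or date ranges; you specify the start and end of each range
In the example above we used Terms Aggregation, that is, bucketing by term.
Histogram Aggregation (stepped buckets)
For example, to group the cars by price with an interval of 5000:
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 5000
      }
    }
  }
}
Result: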
{"took" : 7,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 8,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"price" : {"buckets" : [{"key" : 10000.0,"doc_count" : 2},{"key" : 15000.0,"doc_count" : 1},{"key" : 20000.0,"doc_count" : 2},{"key" : 25000.0,"doc_count" : 1},{"key" : 30000.0,"doc_count" : 1},{"key" : 35000.0,"doc_count" : 0},{"key" : 40000.0,"doc_count" : 0},{"key" : 45000.0,"doc_count" : 0},{"key" : 50000.0,"doc_count" : 0},{"key" : 55000.0,"doc_count" : 0},{"key" : 60000.0,"doc_count" : 0},{"key" : 65000.0,"doc_count" : 0},{"key" : 70000.0,"doc_count" : 0},{"key" : 75000.0,"doc_count" : 0},{"key" : 80000.0,"doc_count" : 1}]}}
}
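However, there are many buckets in the middle with a document count of 0. Adding the parameter min_doc_count with a value of 1 requires at least one document per bucket, so the empty buckets are filtered out:
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "histogram": {
        "field": "price",
        "interval": 5000,
        "min_doc_count": 1
      }
    }
  }
}
Result: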
{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 8,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"price" : {"buckets" : [{"key" : 10000.0,"doc_count" : 2},{"key" : 15000.0,"doc_count" : 1},{"key" : 20000.0,"doc_count" : 2},{"key" : 25000.0,"doc_count" : 1},{"key" : 30000.0,"doc_count" : 1},{"key" : 80000.0,"doc_count" : 1}]}}
}
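The bucket keys follow from rounding each price down to a multiple of the interval. The sketch below reproduces both results for the eight prices in the test data, with and without min_doc_count. It is a simplified model that assumes non-negative integer values; the function histogram is my own.

```python
from collections import Counter

# Prices of the eight cars from the bulk request above.
PRICES = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def histogram(values, interval, min_doc_count=0):
    """Bucket values by rounding down to a multiple of interval; return (key, doc_count) pairs."""
    counts = Counter((v // interval) * interval for v in values)
    if not counts:
        return []
    # Walk every step between the smallest and largest key so empty buckets appear too.
    keys = range(min(counts), max(counts) + interval, interval)
    return [(key, counts[key]) for key in keys if counts[key] >= min_doc_count]


all_buckets = histogram(PRICES, 5000)
non_empty = histogram(PRICES, 5000, min_doc_count=1)
print(len(all_buckets))
print(non_empty)
```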
The result can also be viewed as a chart in the Visualize section of Kibana.
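Range Aggregation (range buckets)
Range bucketing is similar to stepped bucketing in that it also groups values by interval, except that with range you specify the start and end of each group yourself. The example below applies it to the sold date field.
GET /cars/_search
{
  "size": 0,
  "aggs": {
    "sold": {
      "range": {
        "field": "sold",
        "ranges": [
          { "from": "2014-07-02", "to": "2014-08-30" }
        ]
      }
    }
  }
}
Result: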
{"took" : 4,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 8,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"sold" : {"buckets" : [{"key" : "2014-07-02T00:00:00.000Z-2014-08-30T00:00:00.000Z","from" : 1.4042592E12,"from_as_string" : "2014-07-02T00:00:00.000Z","to" : 1.4093568E12,"to_as_string" : "2014-08-30T00:00:00.000Z","doc_count" : 2}]}}
}
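The doc_count of 2 can be confirmed by counting the sold dates that fall in the range, treating from as inclusive and to as exclusive, which is how range buckets are defined. A minimal sketch over the dates from the test data:

```python
from datetime import date

# sold dates of the eight cars from the bulk request above.
SOLD = [date.fromisoformat(s) for s in (
    "2014-10-28", "2014-11-05", "2014-05-18", "2014-07-02",
    "2014-08-19", "2014-11-05", "2014-01-01", "2014-02-12",
)]


def count_in_range(dates, start, end):
    """Count dates d with start <= d < end (from is inclusive, to is exclusive)."""
    return sum(start <= d < end for d in dates)


doc_count = count_in_range(SOLD, date(2014, 7, 2), date(2014, 8, 30))
print(doc_count)
```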
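[Note] I nearly confused this with querying the documents inside a time range. Querying the documents whose sold date falls within a range looks like this:
GET /cars/_search
{
  "query": {
    "range": {
      "sold": {
        "gte": "2014-07-01",
        "lte": "2014-09-01"
      }
    }
  }
}
Result: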
{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 1.0,"hits" : [{"_index" : "cars","_type" : "transactions","_id" : "JPmhLI0BcVBqhfH0_Xnk","_score" : 1.0,"_source" : {"price" : 15000,"color" : "blue","make" : "toyota","sold" : "2014-07-02"}},{"_index" : "cars","_type" : "transactions","_id" : "JfmhLI0BcVBqhfH0_Xnk","_score" : 1.0,"_source" : {"price" : 12000,"color" : "green","make" : "toyota","sold" : "2014-08-19"}}]}
}
Client code in different languages
[Documentation] https://www.elastic.co/guide/en/elasticsearch/client/index.html
[Reference] https://zhuanlan.zhihu.com/p/649902671
[Reference] https://blog.csdn.net/TinaCSDN/article/details/108290648
[Reference] https://blog.csdn.net/mo_sss/article/details/133808562
[Reference] https://blog.csdn.net/mijichui2153/article/details/126177041