Using function_score in Elasticsearch to Query and Rank Hotels

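Requirements

Rank hotels with a simple scheme based on the user's location; this is not a personalized recommendation. The hotel score combines the following factors:

  1. Hotel type (depends on the user's order history): favor the hotel types the user usually books
  2. Hotel rating: highly rated hotels give a better guest experience
  3. Geo-location score: a business traveler, for example, finds a nearby hotel more convenient
  4. Price score (depends on the user's order history): match the user's spending habits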

Implementation

Based on Elasticsearch 7.4 running on CentOS 7.

Index Mapping

{"properties": {"address": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"addressEn": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"boardRoom": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"brandCode": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"businessZone": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"cityCode": {"type": "keyword"},"cityId": {"type": "long"},"cityName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"commentFacilityPoint": {"type": "float"},"commentHygienePoint": {"type": "float"},"commentPoint": {"type": "float"},"commentPositionPoint": {"type": "float"},"commentRecommendPercent": {"type": "float"},"commentServicePoint": {"type": "float"},"diningRoom": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"email": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"factories": {"properties": {"facilityName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"facilityType": {"type": "long"},"facilityValue": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}},"fixTime": {"type": "date","format": "yyyy-MM-dd"},"gdLocation": {"type": "geo_point"},"govStar": {"type": "long"},"govZone": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"gymnasium": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelCode": {"type": "keyword"},"hotelDesc": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelFacility": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelGroup": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelName": {"type": "text","fields": {"keyword": {"type": 
"keyword","ignore_above": 256}}},"hotelNameEn": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelService": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelShortDesc": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelStatus": {"type": "long"},"hotelTips": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"location": {"type": "geo_point"},"mainPicture": {"type": "keyword"},"minPrice": {"type": "float"},"openingTime": {"type": "date","format": "yyyy-MM-dd"},"parking": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"phoneNum": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"pickUpService": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"pictures": {"properties": {"pictureType": {"type": "long"},"pictureUrl": {"type": "keyword"}}},"postNumber": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomCount": {"type": "long"},"rooms": {"properties": {"bedNumber": {"type": "long"},"bedWidth": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"checkNumber": {"type": "long"},"facilities": {"properties": {"facilityValue": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomCode": {"type": "keyword"}}},"pictures": {"properties": {"pictureUrl": {"type": "keyword"},"roomCode": {"type": "keyword"}}},"roomArea": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomBedType": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomCigaretteInfo": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomCode": {"type": "keyword"},"roomCount": {"type": "long"},"roomFloor": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomMainPicture": 
{"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"windowType": {"type": "long"},"wrapRoomName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}},"starCode": {"type": "long"},"starName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"swimmingPool": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"trafficInfo": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"type": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"wifi": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}
}
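
Most fields in the mapping above repeat the same pattern: a `text` field with a `keyword` sub-field. As a minimal sketch, a small helper can generate that pattern. The function names `text_with_keyword` and `build_hotel_mapping` are hypothetical, and only a handful of the fields above are included for illustration.

```python
def text_with_keyword(ignore_above=256):
    """Mapping for a full-text field that also has an exact-match `keyword` sub-field."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": ignore_above}},
    }


def build_hotel_mapping():
    """Build a small subset of the hotel mapping (representative fields only)."""
    properties = {
        name: text_with_keyword() for name in ("hotelName", "address", "cityName")
    }
    properties.update(
        {
            "hotelCode": {"type": "keyword"},
            "commentPoint": {"type": "float"},
            "minPrice": {"type": "float"},
            "gdLocation": {"type": "geo_point"},
        }
    )
    return {"properties": properties}


mapping = build_hotel_mapping()
```

The resulting dictionary has the same shape as the JSON above and could be serialized with `json.dumps` before being sent to the index-creation API.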

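Field descriptions:

{
  "hotelCode": "Hotel code",
  "hotelName": "Hotel name (Chinese)",
  "hotelNameEn": "Hotel name (English)",
  "hotelStatus": "Hotel status: 1 = active, 2 = suspended",
  "cityId": "System city ID",
  "cityCode": "City code",
  "cityName": "City name",
  "openingTime": "Opening date",
  "fixTime": "Renovation date",
  "starCode": "Star rating code (1, 2, 3, 4, 5)",
  "starName": "Star rating description",
  "govStar": "Whether the star rating is officially accredited: 1 = yes, 0 = no",
  "phoneNum": "Phone number",
  "email": "Email",
  "postNumber": "Postal code",
  "location": "Baidu coordinates",
  "gdLocation": "Gaode (AMap) coordinates",
  "address": "Address",
  "addressEn": "Address (English)",
  "brandCode": "Hotel brand, e.g. \"麗枫\"",
  "hotelGroup": "Name of the group the hotel belongs to, e.g. \"7天(铂涛)\"",
  "roomCount": "Number of rooms",
  "mainPicture": "Picture URL",
  "hotelTips": "Tips for guests",
  "hotelFacility": "Hotel facilities",
  "hotelService": "Hotel services",
  "hotelShortDesc": "Short hotel description",
  "hotelDesc": "Detailed hotel description",
  "trafficInfo": "Transportation information",
  "wifi": "Free Wi-Fi; a non-empty value means the service is available",
  "boardRoom": "Meeting room; a non-empty value means the service is available",
  "diningRoom": "Restaurant; a non-empty value means the service is available",
  "parking": "Parking lot; a non-empty value means the service is available",
  "pickUpService": "Airport pickup; a non-empty value means the service is available",
  "swimmingPool": "Swimming pool; a non-empty value means the service is available",
  "gymnasium": "Gym; a non-empty value means the service is available",
  "govZone": "Administrative district; the data comes from the \"query county-level administrative districts by city\" API",
  "businessZone": "Business district",
  "minPrice": "Lowest price",
  "commentPoint": "Hotel review score (out of 5)",
  "commentRecommendPercent": "Percentage of guests who recommend the hotel, e.g. 90.0 for 90%",
  "commentPositionPoint": "Review score for the hotel's location (out of 5)",
  "commentFacilityPoint": "Review score for the hotel's facilities (out of 5)",
  "commentServicePoint": "Review score for the hotel's service (out of 5)",
  "commentHygienePoint": "Review score for the hotel's cleanliness (out of 5)"
}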

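Querying and Ranking Hotels

The dataset is too large to upload; message the author privately if you need the demo hotel data.

The available sort orders are recommended, distance, rating, lowest price, and highest price. Here we implement the recommended order.

Many filter conditions are possible, as listed below. Here we filter by distance:

  1. Rating: 4.8 and above, 4.5 and above, 4.0 and above, 3.5 and above
  2. Hotel type: homestay, serviced apartment, youth apartment, specialty lodging, villa, inn, farmhouse, e-sports hotel, couples hotel
  3. Guest type: accepts foreign guests; accepts guests from Hong Kong, Macao, and Taiwan
  4. Themes: near the metro, family picks, business travel, leisure vacation, lakeside, night views, mountain and water views, landmark views, courtyard house (siheyuan)
  5. Hotel facilities: free parking, laundry service, 24-hour hot water, air conditioning, parking lot, chess and card room, gym, airport transfer
  6. Room type: king room, twin room, dormitory bed, single room, e-sports room, couples room, media room, private hot-spring room, family room
  7. Meals: breakfast included
  8. Distance: within 1 km, 1-3 km, 3-5 km, 5-10 km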
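
The distance bands in item 8 (for example 1-3 km) need both an outer and an inner radius, while a single `geo_distance` filter only expresses "within N km". One possible sketch, not taken from the original article, combines two `geo_distance` clauses in a `bool` query. The helper name `distance_band_filter` is hypothetical; the field name `gdLocation` comes from the mapping.

```python
def distance_band_filter(lat, lon, min_km=None, max_km=None, field="gdLocation"):
    """Build a bool query matching documents between min_km and max_km from a point."""

    def within(km):
        # A plain geo_distance clause: everything within `km` of (lat, lon).
        return {"geo_distance": {"distance": f"{km}km", field: {"lat": lat, "lon": lon}}}

    bool_query = {}
    if max_km is not None:
        bool_query["filter"] = [within(max_km)]
    if min_km is not None and min_km > 0:
        # Exclude the inner disc to turn the circle into a ring.
        bool_query["must_not"] = [within(min_km)]
    return {"bool": bool_query}


band_1_to_3 = distance_band_filter(23.150261, 113.324994, min_km=1, max_km=3)
```

Calling it with only `max_km` gives the simple "within N km" case used in the rest of this article.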

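Hotels within 5 km of the user's location (other conditions can be added) are ranked with function_score.

For price and location, we want hotels whose values are close to the origin value, so decay functions are used for scoring. Decay functions are described in detail later.

For the hotel name, we want to assign different weights according to the timing of the user's past orders, using a query_string query. Note that the filter of a function only decides whether that function applies to a document, so the ^ boosts inside the filter do not change the score; the function's weight does.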
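
If each keyword should really contribute its own weight, one option is to emit one function per keyword, each with its own filter and weight. This is a sketch of that alternative, not the query used in this article: the helper name `keyword_weight_functions` is hypothetical, and `match_phrase` is swapped in for `query_string` to keep each filter simple.

```python
def keyword_weight_functions(keyword_weights, field="hotelName"):
    """One function_score function per keyword: a phrase filter plus its own weight."""
    return [
        {"filter": {"match_phrase": {field: keyword}}, "weight": weight}
        for keyword, weight in keyword_weights.items()
    ]


# Keywords and weights taken from the article's example.
functions = keyword_weight_functions({"青年旅舍": 1, "青年公寓": 2, "酒店公寓": 3})
```

With `score_mode` set to `sum`, a hotel whose name matches several keywords would receive the sum of the matching weights.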

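Note that boost_mode is set to replace, so the final score is the one computed by the functions and Elasticsearch's own relevance score for the query does not interfere.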
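
To make the roles of `score_mode` and `boost_mode` concrete, here is a simplified model of how the scores are combined. It is a sketch under stated assumptions: each matching function is given as a `(raw_score, weight)` pair, at least one function matches, and `avg` is the weighted average described in the Elasticsearch documentation. The function names and the sample numbers are illustrative only.

```python
from math import prod


def combine_functions(scored, score_mode="multiply"):
    """Combine (raw_score, weight) pairs of the functions that matched a document."""
    if not scored:
        raise ValueError("this sketch assumes at least one function matched")
    weighted = [score * weight for score, weight in scored]
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "multiply":
        return prod(weighted)
    if score_mode == "avg":
        # Weighted average: divide by the sum of weights, not by the count.
        return sum(weighted) / sum(weight for _, weight in scored)
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unsupported score_mode: {score_mode}")


def final_score(query_score, function_score, boost_mode="multiply", max_boost=float("inf")):
    """Combine the query score with the (capped) function score."""
    function_score = min(function_score, max_boost)
    combined = {
        "multiply": query_score * function_score,
        "replace": function_score,
        "sum": query_score + function_score,
        "avg": (query_score + function_score) / 2,
        "max": max(query_score, function_score),
        "min": min(query_score, function_score),
    }
    return combined[boost_mode]


# Illustrative values: rating matched (weight 10), price decay 1.0 (weight 10),
# random_score 0.5 (weight 5).
matched = [(1.0, 10), (1.0, 10), (0.5, 5)]
function_part = combine_functions(matched, score_mode="sum")
score = final_score(query_score=5.0, function_score=function_part, boost_mode="replace")
```

With `replace`, the query score passed in has no effect on the result, which is the behavior the paragraph above relies on.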

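{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "match_all": {}
          },
          // Filter the data by distance
          "filter": {
            "geo_distance": {
              "distance": "5km",
              "gdLocation": {
                "lat": "23.150261",
                "lon": "113.324994"
              }
            }
          }
        }
      },
      "boost": 5,
      // max_boost caps the score computed by the functions
      "max_boost": 100,
      "functions": [
        // Hotel type (depends on the user's history)
        {
          "filter": {
            // Keywords from the history with their intended weights: "青年旅舍" 1, "青年公寓" 2, "酒店公寓" 3.
            // query_string operators must be upper case (OR). The filter only selects documents;
            // the score of this function comes from random_score and weight below.
            "query_string": {
              "query": "hotelName:(\"青年旅舍\"^1 OR \"青年公寓\"^2 OR \"酒店公寓\"^3)"
            }
          },
          // Optional: generates uniformly distributed scores from 0 up to but not including 1.
          // By default it uses the internal Lucene document ID as the source of randomness.
          "random_score": {
            // Use the _seq_no field as the source of randomness; the only drawback is that
            // the score changes when the document is updated
            "field": "_seq_no",
            "seed": 10
          },
          "weight": 5
        },
        // Hotel rating
        {
          "filter": {
            "range": {
              // Overall hotel review score
              "commentPoint": {
                "gte": 3.5,
                "lte": 5
              }
            }
          },
          "weight": 10
        },
        // Decay function: geo-location score
        {
          // gauss: normal (Gaussian) decay
          "gauss": {
            // Beyond offset from origin, the score decays according to scale
            "gdLocation": {
              // Origin used to compute the distance. A geo_point written as a string is "lat,lon"
              // (only the array form is [lon, lat])
              "origin": "23.150261,113.324994",
              // Distance from origin + offset at which the computed score equals the decay parameter
              "scale": "5km",
              // If offset is defined, decay is only computed for documents farther away than offset. Default: 0.
              "offset": "1km",
              // decay defines the score of a document at distance scale. If not defined, such documents score 0.5.
              "decay": "0.33"
            }
          },
          "weight": 15
        },
        // Price score (depends on the user's history; default origin 150)
        {
          "gauss": {
            // Prices within 30 yuan of 150 yuan get the full score; beyond that the score decays with a scale of 100 yuan
            "minPrice": {
              "origin": 150,
              "offset": 30,
              "scale": 100
            }
          },
          "weight": 10
        }
      ],
      // boost_mode: how the combined function score is merged with the query score.
      // multiply (default): query score times function score; replace: only the function score, the query score is ignored;
      // sum: query score plus function score; avg: average; max: the larger of the two; min: the smaller of the two
      "boost_mode": "replace",
      // score_mode: how the scores of the individual functions are combined.
      // multiply (default): scores are multiplied; sum: added; avg: averaged; max: highest score; min: lowest score
      "score_mode": "sum",
      // By default, modifying the score does not change which documents match. To exclude documents
      // below a score threshold, set min_score to that threshold.
      "min_score": 0
    }
  },
  // Return the distance
  "script_fields": {
    "distance_in_m": {
      "script": "doc['gdLocation'].arcDistance(23.150261,113.324994)"
    }
  }
}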

Query result:

{"took": 10,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3244,"relation": "eq"},"max_score": 24.119629,"hits": [{"_index": "hotel_test","_type": "_doc","_id": "jiMk1I0BqMZKQzdg8UCl","_score": 24.119629,"_source": {"gdLocation": {"lon": "113.288340","lat": "23.132313"},"address": "青龙坊2号","cityName": "广州市","commentPoint": 4.5,"minPrice": 158.0,"hotelName": "亨富涞酒店(广州青龙坊店)"},"fields": {"distance_in_m": [4246.059620137545]}},{"_index": "hotel_test","_type": "_doc","_id": "YSIa1I0BqMZKQzdgDqVC","_score": 23.682613,"_source": {"gdLocation": {"lon": "113.357566","lat": "23.134140"},"address": "中山大道西138号广运楼3层","cityName": "广州市","commentPoint": 4.1,"minPrice": 193.0,"hotelName": "棠舍公寓(广州天河公园华景新城店)"},"fields": {"distance_in_m": [3782.1797227009683]}},{"_index": "hotel_test","_type": "_doc","_id": "JyIY1I0BqMZKQzdgnpWo","_score": 23.155634,"_source": {"gdLocation": {"lon": "113.346694","lat": "23.173795"},"address": "天源路134-140号201铺","cityName": "广州市","commentPoint": 4.2,"minPrice": 150.0,"hotelName": "广州友逸·青舍酒店(天河客运站地铁站店)"},"fields": {"distance_in_m": [3430.6572488915003]}},{"_index": "hotel_test","_type": "_doc","_id": "cCIY1I0BqMZKQzdg3pj1","_score": 22.291739,"_source": {"gdLocation": {"lon": "113.342061","lat": "23.172472"},"address": "元岗街元岗南路13-15号之6","cityName": "广州市","commentPoint": 3.8,"minPrice": 128.0,"hotelName": "华舍连锁酒店(广州天河客运站店)"},"fields": {"distance_in_m": [3023.907431757608]}},{"_index": "hotel_test","_type": "_doc","_id": "xCIc1I0BqMZKQzdgr8yX","_score": 22.093195,"_source": {"gdLocation": {"lon": "113.347124","lat": "23.143523"},"address": "天河北路719-721号东方之珠花园","cityName": "广州市","commentPoint": 4.9,"minPrice": 76.0,"hotelName": "小李家青旅(广州华师店)"},"fields": {"distance_in_m": [2383.4711465212176]}},{"_index": "hotel_test","_type": "_doc","_id": "pyMi1I0BqMZKQzdgMxy2","_score": 22.071991,"_source": {"gdLocation": {"lon": "113.329772","lat": "23.134002"},"address": "天河路365号天俊阁1802","cityName": 
"广州市","commentPoint": 4.1,"minPrice": 70.0,"hotelName": "迎寓制式青旅(石牌桥地铁站店)"},"fields": {"distance_in_m": [1872.7678276747463]}},{"_index": "hotel_test","_type": "_doc","_id": "1SIY1I0BqMZKQzdgu5b6","_score": 21.591082,"_source": {"gdLocation": {"lon": "113.313978","lat": "23.120444"},"address": "寺右新马路131号","cityName": "广州市","commentPoint": 4.1,"minPrice": 190.0,"hotelName": "智营·星旅精选酒店(广州五羊邨地铁站店)"},"fields": {"distance_in_m": [3501.629279766179]}},{"_index": "hotel_test","_type": "_doc","_id": "HiMj1I0BqMZKQzdguS-z","_score": 21.376797,"_source": {"gdLocation": {"lon": "113.310007","lat": "23.153069"},"address": "先烈东路159号四航局大院4栋601房","cityName": "广州市","commentPoint": 4.2,"minPrice": 76.0,"hotelName": "广州兰姐青年公寓"},"fields": {"distance_in_m": [1563.7710672937392]}},{"_index": "hotel_test","_type": "_doc","_id": "QSIb1I0BqMZKQzdg6cMJ","_score": 21.36859,"_source": {"gdLocation": {"lon": "113.340093","lat": "23.173880"},"address": "元岗路600号自编2号(智汇park对面)","cityName": "广州市","commentPoint": 4.7,"minPrice": 152.0,"hotelName": "素舍2.0酒店(广州天河客运站天羽店)"},"fields": {"distance_in_m": [3046.3470547262573]}},{"_index": "hotel_test","_type": "_doc","_id": "ByIe1I0BqMZKQzdgYeOU","_score": 21.1093,"_source": {"gdLocation": {"lon": "113.341442","lat": "23.172095"},"address": "慧通产业园101栋A区","cityName": "广州市","commentPoint": 4.6,"minPrice": 176.0,"hotelName": "素舍酒店(广州天河客运站地铁站店)"},"fields": {"distance_in_m": [2953.2864749434843]}}]}
}
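
The `distance_in_m` values can be sanity-checked outside Elasticsearch. The sketch below uses the standard haversine formula with an assumed mean Earth radius; it is an approximation of what `arcDistance` computes, not its actual implementation. The coordinates are the query origin and the `gdLocation` of the first hit above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Query origin vs. the first hit, 亨富涞酒店(广州青龙坊店).
first_hit_distance = haversine_m(23.150261, 113.324994, 23.132313, 113.288340)
```

The computed value should land close to the 4246.06 m reported in `distance_in_m` for that hit.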

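Decay Functions (DECAY_FUNCTION)

A decay function is a mathematical function that describes how a quantity decreases with time, distance, or some other factor. Decay functions usually take an exponential or polynomial form and are used to model phenomena such as the attenuation of electromagnetic waves, the decay of radioactive material, and the metabolism of drugs in the body.

In Geographic Information Systems (GIS) and geography, decay functions measure how the interaction or influence between locations diminishes with distance. For example, a city's economic influence on nearby towns may be large but much smaller on towns farther away; a decay function quantifies how quickly that influence weakens.

Some applications of decay functions in geography:

  1. Spatial interaction models: when modeling migration, trade, or commuting patterns between cities, a decay function expresses how the likelihood of these interactions drops as distance increases.

  2. Hotspot analysis: a decay function determines how strongly an event (such as a crime or a reported case of disease) affects the surrounding area, with the effect decreasing with distance.

  3. Accessibility assessment: when assessing how accessible a place is to residents, decay functions model the time or distance decay of different modes of travel (walking, driving, and so on).

  4. Geographically Weighted Regression (GWR): a decay function assigns each data point a weight based on the spatial distance between points, so that closer points have more influence.

In practice, the choice of decay function type and parameters has a large effect on the accuracy of the model. Common forms include:

  • Exponential decay: f(d) = e^(-λd), where d is the distance and λ is the decay coefficient.
  • Power-law decay: f(d) = d^(-β), where d is the distance and β is the decay coefficient.

The parameters of these functions usually have to be fitted and tuned against real data so that they reflect the decay observed in the real world as closely as possible.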
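
As a small illustration of fitting such a parameter, the sketch below estimates λ for the exponential form f(d) = e^(-λd) by least squares on ln f, which reduces to a closed-form expression. The function name is hypothetical and the sample data are synthetic, generated from a known λ purely to show the mechanics.

```python
from math import exp, log


def fit_exponential_decay(distances, values):
    """Least-squares estimate of lambda for f(d) = exp(-lambda * d), fitted on ln(f).

    Minimizing sum((ln(v_i) + lambda * d_i)^2) gives
    lambda = -sum(d_i * ln(v_i)) / sum(d_i^2).
    Requires all values > 0 and at least one non-zero distance.
    """
    numerator = sum(d * log(v) for d, v in zip(distances, values))
    denominator = sum(d * d for d in distances)
    return -numerator / denominator


# Synthetic observations generated with lambda = 0.5.
sample_distances = [1.0, 2.0, 3.0, 4.0]
sample_values = [exp(-0.5 * d) for d in sample_distances]
estimated_lambda = fit_exponential_decay(sample_distances, sample_values)
```

With noisy real data the estimate would not recover λ exactly, and a fit in log space weights small values more heavily than a direct fit would.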

Elasticsearch provides three decay functions, gauss, linear, and exp, compared below:

[Figure: comparison of the gauss, linear, and exp decay curves]
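
The three curves can also be compared numerically. The sketch below follows the formulas given in the Elasticsearch function_score documentation for numeric fields; the Python function names are my own, and for geo fields the same formulas would apply to the distance in meters. The sample values reuse the price function from the query (origin 150, offset 30, scale 100, default decay 0.5).

```python
from math import exp, log


def _effective_distance(value, origin, offset):
    """Distance from origin, with everything inside `offset` treated as distance 0."""
    return max(0.0, abs(value - origin) - offset)


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    sigma_sq = -(scale ** 2) / (2 * log(decay))
    return exp(-(_effective_distance(value, origin, offset) ** 2) / (2 * sigma_sq))


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    lam = log(decay) / scale  # negative, because 0 < decay < 1
    return exp(lam * _effective_distance(value, origin, offset))


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)
    return max((s - _effective_distance(value, origin, offset)) / s, 0.0)


# Price function from the query: origin 150, offset 30, scale 100.
inside_offset = gauss_decay(158.0, origin=150, scale=100, offset=30)
at_scale = gauss_decay(280.0, origin=150, scale=100, offset=30)
cheap_hotel = gauss_decay(76.0, origin=150, scale=100, offset=30)
```

All three functions return 1 inside the offset and exactly `decay` at a distance of offset + scale from the origin; they differ in how quickly they fall in between and beyond, and only linear reaches 0.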
