Building a Geolocation and City Search Service with Elasticsearch

news / 2025/2/19 6:47:23 / Source: https://www.cnblogs.com/hbuuid/p/18327500

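I recently needed a few simple geolocation lookup endpoints: take the user's current coordinates, resolve them to an administrative location (province, city, district), and then use that location to query ... for the current area and serve the results.

So I looked into GIS myself. As a programmer, could I build this on my own? Of course. Let's get started.

Approach: find the data, load it into a datastore, and rely on Elasticsearch's search capabilities and its rich support for GIS data.

GIS background reading (found via Baidu, reasonably professional): 程序员GIS入门|前后端都要懂一点的GIS知识 ("GIS for Programmers: GIS Basics Every Front-end and Back-end Developer Should Know")

After some searching, I found a fairly complete dataset published by 锐多宝 (Ruiduobao). The download is in Shapefile format.

Step 1: Download the data from 锐多宝.

Step 2: Write a Python script to preprocess the data by converting the Shapefile to GeoJSON, which Elasticsearch handles well.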

import geopandas as gpd

# Read the Shapefile
shapefile_path = 'D:/data/gis/2023年_CTAmap_1.12版/2023年省级/2023年省级.shp'
gdf = gpd.read_file(shapefile_path)

# Inspect the GeoDataFrame
print(gdf.head())

# Elasticsearch geo_shape expects WGS84 longitude/latitude.
# If gdf.crs is not EPSG:4326, reproject before exporting:
# gdf = gdf.to_crs(epsg=4326)

# Optional preprocessing, e.g. filtering rows or selecting columns
# gdf = gdf[['column1', 'column2', 'geometry']]

# Optional: convert to a plain pandas DataFrame whose geometry column holds GeoJSON strings
df = gdf.drop('geometry', axis=1).join(
    gdf['geometry'].apply(lambda geom: gpd.GeoSeries([geom]).to_json())
)

# Output path (the later steps read this same file)
output_json_path = 'D:/data/gis/2023-province-GeoJSON.gesjson'
# df.to_json(output_json_path, orient='records')

# To keep real GeoJSON, save the GeoDataFrame directly
gdf.to_file(output_json_path, driver='GeoJSON')
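Before indexing, it can help to confirm that the exported coordinates really are longitude/latitude values, because a Shapefile in a projected coordinate system would produce numbers far outside that range. The following is a minimal sketch using only the standard library; the helper names `iter_positions` and `out_of_range_features` and the sample data are illustrative assumptions, not part of the original script.

```python
def iter_positions(coords):
    """Yield every [lon, lat] position from a nested GeoJSON coordinates array."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position such as [116.4, 39.9]
        yield coords
    else:
        # A ring, polygon or multi-polygon: recurse one level down
        for part in coords:
            yield from iter_positions(part)


def out_of_range_features(feature_collection):
    """Return the indexes of features that contain a position outside lon/lat bounds."""
    bad = []
    for idx, feature in enumerate(feature_collection.get("features", [])):
        coords = feature["geometry"]["coordinates"]
        for lon, lat, *_ in iter_positions(coords):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                bad.append(idx)
                break
    return bad


# Hypothetical sample: feature 0 is in degrees, feature 1 looks like projected metres
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[116.0, 39.0], [117.0, 39.0], [117.0, 40.0], [116.0, 39.0]]]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[500000.0, 4000000.0], [500100.0, 4000000.0],
                                       [500100.0, 4000100.0], [500000.0, 4000000.0]]]}},
    ],
}

print(out_of_range_features(sample))
```

In practice the same function could be run on the dictionary returned by `json.load` for the exported file; any reported index suggests the data should be reprojected before export.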

Step 3: Use a Python script to write the GeoJSON into Elasticsearch.

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import json

INDEX_NAME = "province2023_geoshape_index_001"

# Connect to Elasticsearch
es = Elasticsearch("http://localhost:9200")

# Check the connection
if not es.ping():
    raise ValueError("Connection failed")

# Delete the old index if it exists
if es.indices.exists(index=INDEX_NAME):
    es.indices.delete(index=INDEX_NAME)

# Create the index with an explicit mapping
mapping = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_shape"},
            "name": {"type": "text"}
        }
    }
}
es.indices.create(index=INDEX_NAME, body=mapping)

# Read the GeoJSON file
with open("D:/data/gis/2023-province-GeoJSON.gesjson", "r", encoding="utf-8") as file:
    geojson_data = json.load(file)

# Extract the GeoJSON feature collection
features = geojson_data.get("features", [])

# Build the documents to index: geometry goes into "location",
# and the feature properties become top-level fields
documents = []
for feature in features:
    doc = {
        "location": {
            "type": feature["geometry"]["type"],
            "coordinates": feature["geometry"]["coordinates"]
        }
    }
    if "properties" in feature:
        doc.update(feature["properties"])
    documents.append(doc)

# Number of documents per bulk request
batch_size = 100


# Generate bulk actions
def generate_actions(documents):
    for doc in documents:
        yield {"_index": INDEX_NAME, "_source": doc}


# Run the bulk import batch by batch
for i in range(0, len(documents), batch_size):
    end = min(i + batch_size, len(documents))
    success, _ = bulk(es, generate_actions(documents[i:end]))
    print(f"Bulk {i}-{end} completed, {success} documents indexed.")

print("All data indexed.")
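With the polygons stored in a `geo_shape` field, the lookup described at the start (which province contains the user's current position) can be expressed as a `geo_shape` query with a point shape. The sketch below only builds the request body; the function name `build_point_lookup_query` is an assumption made for illustration, and the commented call at the end assumes the same client and index as the script above.

```python
def build_point_lookup_query(lon, lat):
    """Build a search body that matches shapes intersecting the given point."""
    return {
        "size": 1,
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        "location": {
                            # GeoJSON order is [longitude, latitude]
                            "shape": {"type": "Point", "coordinates": [lon, lat]},
                            "relation": "intersects",
                        }
                    }
                }
            }
        },
    }


body = build_point_lookup_query(116.41960604340115, 40.09937484066758)
print(body)

# Assumed usage against a running cluster (not executed here):
# result = es.search(index="province2023_geoshape_index_001", body=body)
```

Using a `filter` clause rather than `must` skips scoring, which seems reasonable here because a point is either inside a province or not.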

Step 4: Compute the center point of each region (an extension; the source data only contains polygons).

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
import json

INDEX_NAME = "province2023_centroid_geoshape_index_001"

# Connect to Elasticsearch
es = Elasticsearch("http://localhost:9200")

# Check the connection
if not es.ping():
    raise ValueError("Connection failed")

# Delete the old index if it exists
if es.indices.exists(index=INDEX_NAME):
    es.indices.delete(index=INDEX_NAME)

# Create the index with an explicit mapping
mapping = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_shape"},
            "centroid": {"type": "geo_point"},  # new field
            "name": {"type": "text"}
        }
    }
}
es.indices.create(index=INDEX_NAME, body=mapping)

# Read the GeoJSON file
with open("D:/data/gis/2023-province-GeoJSON.gesjson", "r", encoding="utf-8") as file:
    geojson_data = json.load(file)

# Extract the GeoJSON feature collection
features = geojson_data.get("features", [])


def calculate_centroid(polygons):
    # Area-weighted average of the per-part centers
    total_area = 0.0
    total_x = 0.0
    total_y = 0.0
    for polygon in polygons:
        centroid = calculate_simple_polygon_centroid(polygon)
        area = calculate_polygon_area(polygon)
        total_area += area
        total_x += centroid[0] * area
        total_y += centroid[1] * area
    if total_area == 0:
        # If the total area is zero, fall back to the origin
        return [0, 0]
    return [total_x / total_area, total_y / total_area]


# is_coordinates_list
# Returns True for a list of coordinate lists (a polygon with rings):
# [
#     [[x1, y1], [x2, y2], [x3, y3], ...],
#     [[x1, y1], [x2, y2], [x3, y3], ...]  # inner holes, if any
# ]
# Returns False for a flat list of coordinates (a single ring):
# [
#     [x1, y1],
#     [x2, y2],
#     [x3, y3],
#     ...
# ]
def is_coordinate(coord):
    return (isinstance(coord, (list, tuple))
            and len(coord) == 2
            and all(isinstance(c, (int, float)) for c in coord))


def is_coordinates_list(coords):
    # Check whether coords is a list of coordinate lists
    if isinstance(coords, list):
        if all(isinstance(c, list) and all(is_coordinate(coord) for coord in c) for c in coords):
            return True
    return False


def calculate_simple_polygon_centroid(polygon):
    # Use the outer ring when polygon is a list of rings, otherwise the ring itself
    ring = polygon[0] if is_coordinates_list(polygon) else polygon
    # Average of the vertex coordinates
    centroid_x = sum(coord[0] for coord in ring) / len(ring)
    centroid_y = sum(coord[1] for coord in ring) / len(ring)
    return [centroid_x, centroid_y]


def calculate_polygon_area(polygon):
    # Area of a simple polygon via the shoelace formula
    ring = polygon[0] if is_coordinates_list(polygon) else polygon
    area = 0.0
    num_points = len(ring)
    for i in range(num_points):
        j = (i + 1) % num_points
        area += ring[i][0] * ring[j][1]
        area -= ring[j][0] * ring[i][1]
    return abs(area) / 2.0


# Build the documents to index
documents = []
for feature in features:
    # Compute the center from the raw coordinates (no range validation is done here)
    coordinates = feature["geometry"]["coordinates"]
    centroid = calculate_centroid(coordinates)
    doc = {
        "location": {
            "type": feature["geometry"]["type"],
            "coordinates": coordinates
        },
        "centroid": centroid,  # [lon, lat] center point
    }
    if "properties" in feature:
        doc.update(feature["properties"])
    documents.append(doc)

# Number of documents per bulk request
batch_size = 100


# Generate bulk actions (must target the index created above)
def generate_actions(documents):
    for doc in documents:
        yield {"_index": INDEX_NAME, "_source": doc}


# Run the bulk import batch by batch
for i in range(0, len(documents), batch_size):
    end = min(i + batch_size, len(documents))
    success, errors = bulk(es, generate_actions(documents[i:end]))
    if errors:
        print(f"Bulk {i}-{end} completed, {success} documents indexed, but {len(errors)} documents failed.")
        for error in errors:
            print(error)
    else:
        print(f"Bulk {i}-{end} completed, {success} documents indexed.")

print("All data indexed.")
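The script above takes the plain average of a ring's vertices as its center, which drifts toward stretches of boundary that happen to have many vertices. As an alternative, the sketch below uses the standard shoelace-based formula for the geometric centroid of a simple ring. It is a different technique from the one in the script, treats longitude/latitude as planar coordinates, and the function name `ring_centroid` is an assumption.

```python
def ring_centroid(ring):
    """Geometric centroid of a simple ring [[x, y], ...] using the shoelace formula."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i][0], ring[i][1]
        x1, y1 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        # Degenerate ring (a point or a line): fall back to the vertex average
        return [sum(p[0] for p in ring) / n, sum(p[1] for p in ring) / n]
    # Cx = sum((x0 + x1) * cross) / (6 * A), with A = twice_area / 2
    return [cx / (3.0 * twice_area), cy / (3.0 * twice_area)]


# A closed 4 x 4 square: the repeated closing vertex skews a vertex average,
# but contributes nothing to the shoelace sums
square = [[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]]
vertex_mean = [sum(p[0] for p in square) / len(square),
               sum(p[1] for p in square) / len(square)]
print(ring_centroid(square), vertex_mean)
```

Because the sign of `twice_area` cancels in the division, the result does not depend on whether the ring is wound clockwise or counter-clockwise.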

Step 5: Use Elasticsearch ingest pipelines and reindex to post-process the data.

# The geo_centroid aggregation computes the center point of a set of geo values.
# On a geo_shape field it is a licensed feature (originally shipped as part of X-Pack)
# and typically requires a paid subscription. A 30-day trial license lets you try it.
POST /province2023_geoshape_index_001/_search
{
  "size": 0,
  "aggs": {
    "centroid": {
      "geo_centroid": {
        "field": "location"
      }
    }
  }
}

POST province2023_centroid_geoshape_index_001/_search
{
  "query": {
    "term": {
      "省.keyword": {
        "value": "陕西省"
      }
    }
  }
}

# There is no "copy" ingest processor; the set processor with copy_from copies a field
PUT _ingest/pipeline/copy_field_pipeline
{
  "description": "Copy the value of one field to another",
  "processors": [
    {
      "set": {
        "field": "province_name",
        "copy_from": "省"
      }
    }
  ]
}

GET province2023_centroid_geoshape_index_001/_mapping

PUT _ingest/pipeline/province_multiple_copy_fields_pipeline
{
  "description": "Copy multiple fields to new fields and rename fields to new fields",
  "processors": [
    {"set": {"field": "province_name", "value": "{{{省}}}"}},
    {"remove": {"field": "省"}},
    {"rename": {"field": "省级码", "target_field": "province_code"}},
    {"rename": {"field": "省类型", "target_field": "province_type"}},
    {"rename": {"field": "VAR_NAME", "target_field": "var_name"}},
    {"rename": {"field": "ENG_NAME", "target_field": "eng_name"}},
    {"rename": {"field": "FIRST_GID", "target_field": "first_gid"}},
    {"rename": {"field": "FIRST_TYPE", "target_field": "first_type"}}
  ]
}

GET province2023_centroid_geoshape_index_002/_count

GET province2023_centroid_geoshape_index_002/_mapping

DELETE province2023_centroid_geoshape_index_002

PUT province2023_centroid_geoshape_index_002
{
  "mappings": {
    "properties": {
      "eng_name": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
      "first_gid": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
      "first_type": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
      "var_name": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
      "centroid": {"type": "geo_point"},
      "location": {"type": "geo_shape"},
      "name": {"type": "text"},
      "year": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    }
  }
}

POST _reindex
{
  "source": {
    "index": "province2023_centroid_geoshape_index_001"
  },
  "dest": {
    "index": "province2023_centroid_geoshape_index_002",
    "pipeline": "province_multiple_copy_fields_pipeline"
  }
}

GET province2023_centroid_geoshape_index_002/_search
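To preview what the renaming pipeline does to a document before running the reindex, its effect can be mimicked locally. The sketch below is plain Python, not an Elasticsearch API; the function name `preview_pipeline` and the sample values are assumptions. It also mirrors one behavior worth knowing: by default the `rename` and `remove` processors fail when the source field is missing, so a document lacking any of these fields would fail the pipeline.

```python
# Old field name -> new field name, taken from the pipeline definition
RENAMES = {
    "省": "province_name",
    "省级码": "province_code",
    "省类型": "province_type",
    "VAR_NAME": "var_name",
    "ENG_NAME": "eng_name",
    "FIRST_GID": "first_gid",
    "FIRST_TYPE": "first_type",
}


def preview_pipeline(doc):
    """Return a copy of doc with the pipeline's renames applied."""
    out = dict(doc)
    for old, new in RENAMES.items():
        if old not in out:
            # Mirrors the default behavior of rename/remove on a missing field
            raise KeyError(f"field [{old}] is missing")
        out[new] = out.pop(old)
    return out


# Hypothetical document; the values are illustrative only
sample_doc = {
    "省": "陕西省", "省级码": "610000", "省类型": "省",
    "VAR_NAME": "Shaanxi", "ENG_NAME": "Shaanxi",
    "FIRST_GID": "x", "FIRST_TYPE": "y",
    "centroid": [108.9, 34.3],
}
renamed = preview_pipeline(sample_doc)
print(sorted(renamed))
```

If some documents may lack a field, adding `"ignore_missing": true` to the corresponding processor is one way to keep the reindex from failing on them.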

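Step 6: Query the data with geo_bounding_box and geo_distance.

# The centroid field is a geo_point stored in array form, i.e. [lon, lat].
# geo_bounding_box finds all geo points inside a box. top_left must be the north-west
# corner and bottom_right the south-east corner. The sample corners below lie near
# New York and will not match any province centroid; substitute a box covering your area.
POST province2023_centroid_geoshape_index_002/_search
{
  "query": {
    "geo_bounding_box": {
      "centroid": {
        "top_left": {"lat": 42, "lon": -74},
        "bottom_right": {"lat": 40, "lon": -72}
      }
    }
  }
}

# A distance given without a unit is interpreted as meters, so state the unit explicitly
POST province2023_centroid_geoshape_index_002/_search
{
  "query": {
    "geo_distance": {
      "distance": "100km",
      "centroid": {"lat": 40.09937484066758, "lon": 116.41960604340115}
    }
  }
}

POST province2023_centroid_geoshape_index_002/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {"province_name": "xx市"}
      },
      "filter": {
        "geo_distance": {
          "distance": "2km",
          "centroid": {"lat": 40.09937484066758, "lon": 116.41960604340115}
        }
      }
    }
  }
}

# The same filter against the geo_shape field (location); this relies on geo_distance
# accepting geo_shape fields, which recent Elasticsearch versions support
POST province2023_centroid_geoshape_index_002/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {"province_name": "xx市"}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "location": {"lat": 40.09937484066758, "lon": 116.41960604340115}
        }
      }
    }
  }
}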
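To build intuition for what a `geo_distance` filter on the `centroid` field selects, the great-circle distance can be computed locally with the haversine formula. This is a rough sketch, not Elasticsearch's exact implementation; the function names and the mean Earth radius constant are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(center_lat, center_lon, centroid, max_km):
    """centroid uses the stored array form [lon, lat]."""
    lon, lat = centroid
    return haversine_km(center_lat, center_lon, lat, lon) <= max_km


# One degree of latitude is a little over 111 km
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
print(round(one_degree, 3))

# A hypothetical centroid one degree north of the search center
print(within_distance(40.0, 116.0, [116.0, 41.0], 200))
```

This also shows why a 2 km radius around a single point is unlikely to match any province centroid: the centers of neighboring provinces are typically hundreds of kilometers apart.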
