Requirements
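Produce a simple, non-personalized ranking of hotels based on the user's geographic location. The hotel score is made up of the following parts: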
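- Hotel type (relies on the user's historical orders): prefer hotel types that better match what the user usually books
- Hotel rating: highly rated hotels give users a better experience
- Geo location score: for a business traveler, for example, nearby hotels are more convenient
- Price score (relies on the user's historical orders): match the user's spending habits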
Implementation
Based on Elasticsearch 7.4, running on CentOS 7.
Index mapping
{"properties": {"address": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"addressEn": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"boardRoom": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"brandCode": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"businessZone": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"cityCode": {"type": "keyword"},"cityId": {"type": "long"},"cityName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"commentFacilityPoint": {"type": "float"},"commentHygienePoint": {"type": "float"},"commentPoint": {"type": "float"},"commentPositionPoint": {"type": "float"},"commentRecommendPercent": {"type": "float"},"commentServicePoint": {"type": "float"},"diningRoom": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"email": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"factories": {"properties": {"facilityName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"facilityType": {"type": "long"},"facilityValue": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}},"fixTime": {"type": "date","format": "yyyy-MM-dd"},"gdLocation": {"type": "geo_point"},"govStar": {"type": "long"},"govZone": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"gymnasium": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelCode": {"type": "keyword"},"hotelDesc": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelFacility": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelGroup": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelName": {"type": "text","fields": {"keyword": {"type": 
"keyword","ignore_above": 256}}},"hotelNameEn": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelService": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelShortDesc": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"hotelStatus": {"type": "long"},"hotelTips": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"location": {"type": "geo_point"},"mainPicture": {"type": "keyword"},"minPrice": {"type": "float"},"openingTime": {"type": "date","format": "yyyy-MM-dd"},"parking": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"phoneNum": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"pickUpService": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"pictures": {"properties": {"pictureType": {"type": "long"},"pictureUrl": {"type": "keyword"}}},"postNumber": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomCount": {"type": "long"},"rooms": {"properties": {"bedNumber": {"type": "long"},"bedWidth": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"checkNumber": {"type": "long"},"facilities": {"properties": {"facilityValue": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomCode": {"type": "keyword"}}},"pictures": {"properties": {"pictureUrl": {"type": "keyword"},"roomCode": {"type": "keyword"}}},"roomArea": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomBedType": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomCigaretteInfo": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomCode": {"type": "keyword"},"roomCount": {"type": "long"},"roomFloor": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomMainPicture": 
{"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"roomName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"windowType": {"type": "long"},"wrapRoomName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}},"starCode": {"type": "long"},"starName": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"swimmingPool": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"trafficInfo": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"type": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"wifi": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}
}
Field descriptions:
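{
  "hotelCode": "Hotel code",
  "hotelName": "Hotel name in Chinese",
  "hotelNameEn": "Hotel name in English",
  "hotelStatus": "Hotel status: 1 active, 2 suspended",
  "cityId": "System city ID",
  "cityCode": "City code",
  "cityName": "City name",
  "openingTime": "Opening date",
  "fixTime": "Renovation date",
  "starCode": "Star rating code (1, 2, 3, 4, 5)",
  "starName": "Star rating description",
  "govStar": "Officially rated star hotel: 1 yes; 0 no",
  "phoneNum": "Phone number",
  "email": "Email",
  "postNumber": "Postal code",
  "location": "Baidu coordinates",
  "gdLocation": "Amap (Gaode) coordinates",
  "address": "Address",
  "addressEn": "Address in English",
  "brandCode": "Hotel brand, e.g. “麗枫”",
  "hotelGroup": "Name of the hotel group, e.g. “7天(铂涛)”",
  "roomCount": "Number of rooms",
  "mainPicture": "Main picture URL",
  "hotelTips": "Tips and notices for guests",
  "hotelFacility": "Hotel facilities",
  "hotelService": "Hotel services",
  "hotelShortDesc": "Short description of the hotel",
  "hotelDesc": "Detailed description of the hotel",
  "trafficInfo": "Transport information",
  "wifi": "Free Wi-Fi; a non-empty value means the service is available",
  "boardRoom": "Meeting room; a non-empty value means the service is available",
  "diningRoom": "Restaurant; a non-empty value means the service is available",
  "parking": "Parking lot; a non-empty value means the service is available",
  "pickUpService": "Airport pick-up; a non-empty value means the service is available",
  "swimmingPool": "Swimming pool; a non-empty value means the service is available",
  "gymnasium": "Gym; a non-empty value means the service is available",
  "govZone": "Administrative district, taken from the “query county-level administrative districts by city” API",
  "businessZone": "Business district",
  "minPrice": "Lowest price",
  "commentPoint": "Overall hotel review score (out of 5)",
  "commentRecommendPercent": "Percentage of users who recommend the hotel; 90% is stored as 90.0",
  "commentPositionPoint": "Review score for location (out of 5)",
  "commentFacilityPoint": "Review score for facilities (out of 5)",
  "commentServicePoint": "Review score for service (out of 5)",
  "commentHygienePoint": "Review score for cleanliness (out of 5)"
}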
Querying and ranking hotels
The dataset is too large to upload; send me a private message if you need the demo hotel data.
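The available sort orders are Recommended, Distance, Rating, Lowest price and Highest price; here we implement the Recommended order.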
There are many kinds of filters, listed below; here we use the distance filter:
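- Rating: 4.8+, 4.5+, 4.0+, 3.5+
- Hotel type: homestay, serviced apartment, youth apartment, specialty lodging, villa, inn, farmstay, e-sports hotel, couples' hotel
- Guest type: suitable for foreign guests, suitable for guests from Hong Kong, Macao and Taiwan
- Themes: near the metro, family picks, business trips, leisure and vacation, lakeside stays, night views, between mountains and water, landmark views, siheyuan (courtyard houses)
- Hotel facilities: free parking, laundry, 24-hour hot water, air conditioning, parking lot, mahjong and card room, gym, airport transfer, laundry service
- Room type: double room, twin room, dorm bed, single room, e-sports room, couples' room, media room, private hot-spring room, family room
- Meals: breakfast included
- Distance: within 1 km, 1-3 km, 3-5 km, 5-10 km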
Hotels within 5 km of the user's location (other conditions can be added as well) are ranked with function_score.
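For price and location, we prefer hotels whose values are close to the origin, so both are scored with decay functions, which are explained in detail in a later section.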
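The gauss decay used for price and location can be reproduced outside Elasticsearch to check how the parameters behave. A minimal sketch, assuming the Gaussian formula from the Elasticsearch decay function documentation: the score is 1 within offset of origin and equals decay at offset + scale. The function name gauss_decay is illustrative, and for the geo case the distance in km is passed as the value with origin 0:

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay, following the formula in the Elasticsearch decay function docs."""
    # Distance beyond the offset band; documents inside the band score 1.0
    d = max(0.0, abs(value - origin) - offset)
    # sigma^2 is chosen so that the score equals `decay` when d == scale
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-d ** 2 / (2.0 * sigma_sq))


# Price function from the query: origin 150, offset 30, scale 100, default decay 0.5
for price in (150, 180, 193, 280):
    print(price, round(gauss_decay(price, origin=150, scale=100, offset=30), 4))

# Geo function from the query, distances in km: scale 5 km, offset 1 km, decay 0.33
for km in (0.5, 3.0, 6.0):
    print(km, round(gauss_decay(km, origin=0, scale=5, offset=1, decay=0.33), 4))
```

A hotel priced 280 lies exactly 100 beyond the 30-yuan band, so it scores the default decay of 0.5; a hotel 6 km away (offset 1 km + scale 5 km) scores 0.33.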
For the hotel name, we want to assign different weights based on the user's historical orders, using a query_string query.
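In Elasticsearch, a function's filter inside function_score only decides which documents the function applies to, so ^ boosts written inside that filter do not change the score. One way to give each keyword its own weight is one function per keyword. A minimal sketch, assuming a hypothetical list of hotel types taken from the user's past orders (booked_types) and using booking counts as weights; keyword_functions is an illustrative helper, not part of any library:

```python
import json
from collections import Counter


def keyword_functions(booked_types: list[str], field: str = "hotelName") -> list[dict]:
    """Build one function_score function per hotel-type keyword.

    booked_types is a hypothetical list of hotel types from the user's past
    orders; each keyword's weight is how many times it was booked.
    """
    counts = Counter(booked_types)
    functions = []
    for keyword, count in counts.most_common():
        functions.append({
            # The filter only selects documents; the score comes from weight
            "filter": {"match_phrase": {field: keyword}},
            "weight": count,
        })
    return functions


# Hypothetical order history that yields the weights 3, 2, 1 used in the query
history = ["酒店公寓", "青年公寓", "酒店公寓", "青年旅舍", "酒店公寓", "青年公寓"]
print(json.dumps(keyword_functions(history), ensure_ascii=False, indent=2))
```

The generated objects can replace the single hotel-type entry in the functions array; with score_mode sum, a hotel whose name matches several keywords collects each matching weight.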
Note that boost_mode is set to replace, so only the score computed by function_score is used and Elasticsearch's own relevance score for the document does not interfere.
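To see how the weights add up, here is a minimal sketch of the combination the query below configures, assuming the documented semantics: with score_mode sum, each function whose filter matches contributes value × weight, max_boost caps the sum, and boost_mode replace discards the query score. The top-level boost is not modelled, the sketch assumes at least one function matches, and the function values in the example are illustrative rather than taken from real data:

```python
def combined_score(functions: list[tuple[bool, float, float]],
                   max_boost: float = float("inf")) -> float:
    """Sketch of score_mode "sum" followed by boost_mode "replace".

    Each entry is (filter_matches, function_value, weight). A filter-only
    function with just a weight is modelled as value 1.0. Assumes at least
    one function matches the document.
    """
    # score_mode "sum": add value * weight for every function whose filter matches
    total = sum(value * weight for matches, value, weight in functions if matches)
    # max_boost caps the combined function score
    total = min(total, max_boost)
    # boost_mode "replace": the query score is ignored, so the capped sum is final
    return total


example = [
    (False, 0.0, 5),     # hotel-type random_score: name filter did not match
    (True, 1.0, 10),     # rating filter matched (weight only)
    (True, 0.6267, 15),  # geo gauss value, illustrative
    (True, 1.0, 10),     # price gauss value inside the offset band
]
print(round(combined_score(example, max_boost=100), 4))
```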
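{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "match_all": {}
          },
          // Filter hotels by distance
          "filter": {
            "geo_distance": {
              "distance": "5km",
              "gdLocation": {
                "lat": "23.150261",
                "lon": "113.324994"
              }
            }
          }
        }
      },
      "boost": 5,
      // max_boost caps the new score at the given limit
      "max_boost": 100,
      "functions": [
        // Hotel type (relies on the user's historical orders)
        {
          "filter": {
            // Weights from historical orders: "青年旅舍" 1, "青年公寓" 2, "酒店公寓" 3.
            // Note: a function's filter only selects the documents the function applies to,
            // so the ^ boosts inside it do not affect the score.
            "query_string": {
              "query": "hotelName:(\"青年旅舍\"^1 OR \"青年公寓\"^2 OR \"酒店公寓\"^3)"
            }
          },
          // Generates scores uniformly distributed from 0 up to but not including 1 (optional).
          // By default the internal Lucene doc ID is the source of randomness.
          "random_score": {
            // Use _seq_no as the source of randomness; the only drawback is that the score changes when the document is updated
            "field": "_seq_no",
            "seed": 10
          },
          "weight": 5
        },
        // Hotel rating
        {
          "filter": {
            "range": {
              // Overall hotel review score (out of 5)
              "commentPoint": {
                "gte": 3.5,
                "lte": 5
              }
            }
          },
          "weight": 10
        },
        // Decay function (DECAY_FUNCTION): geo location score
        {
          // gauss: Gaussian (normal) decay
          "gauss": {
            // Decay starts once the distance from origin exceeds offset; scale sets how fast it decays
            "gdLocation": {
              // Origin used to compute distances. The string form of a geo_point is "lat,lon",
              // so an object with explicit lat and lon avoids swapping the coordinates
              "origin": { "lat": 23.150261, "lon": 113.324994 },
              // Distance from origin + offset at which the score equals decay
              "scale": "5km",
              // If offset is set, the decay is only computed for documents farther than offset. Default 0
              "offset": "1km",
              // How documents are scored at distance scale; if decay is omitted, documents at scale score 0.5
              "decay": "0.33"
            }
          },
          "weight": 15
        },
        // Price score (relies on historical orders; defaults to 150)
        {
          "gauss": {
            // Full score within 150 ± 30 yuan; 100 yuan beyond that band the score falls to 0.5 (default decay)
            "minPrice": {
              "origin": 150,
              "offset": 30,
              "scale": 100
            }
          },
          "weight": 10
        }
      ],
      // How the combined function score is merged with the query score:
      // multiply (default): query score × function score; replace: function score only, query score ignored;
      // sum: query score + function score; avg: average; max: the larger of the two; min: the smaller of the two
      "boost_mode": "replace",
      // score_mode sets how the scores of the individual functions are combined:
      // multiply (default): multiplied; sum: added; avg: averaged; max: highest score; min: lowest score
      "score_mode": "sum",
      // By default, modifying the score does not change which documents match. To exclude documents
      // below a score threshold, set min_score to that threshold
      "min_score": 0
    }
  },
  // Return the distance in meters
  "script_fields": {
    "distance_in_m": {
      "script": "doc['gdLocation'].arcDistance(23.150261,113.324994)"
    }
  }
}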
Query result:
{"took": 10,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3244,"relation": "eq"},"max_score": 24.119629,"hits": [{"_index": "hotel_test","_type": "_doc","_id": "jiMk1I0BqMZKQzdg8UCl","_score": 24.119629,"_source": {"gdLocation": {"lon": "113.288340","lat": "23.132313"},"address": "青龙坊2号","cityName": "广州市","commentPoint": 4.5,"minPrice": 158.0,"hotelName": "亨富涞酒店(广州青龙坊店)"},"fields": {"distance_in_m": [4246.059620137545]}},{"_index": "hotel_test","_type": "_doc","_id": "YSIa1I0BqMZKQzdgDqVC","_score": 23.682613,"_source": {"gdLocation": {"lon": "113.357566","lat": "23.134140"},"address": "中山大道西138号广运楼3层","cityName": "广州市","commentPoint": 4.1,"minPrice": 193.0,"hotelName": "棠舍公寓(广州天河公园华景新城店)"},"fields": {"distance_in_m": [3782.1797227009683]}},{"_index": "hotel_test","_type": "_doc","_id": "JyIY1I0BqMZKQzdgnpWo","_score": 23.155634,"_source": {"gdLocation": {"lon": "113.346694","lat": "23.173795"},"address": "天源路134-140号201铺","cityName": "广州市","commentPoint": 4.2,"minPrice": 150.0,"hotelName": "广州友逸·青舍酒店(天河客运站地铁站店)"},"fields": {"distance_in_m": [3430.6572488915003]}},{"_index": "hotel_test","_type": "_doc","_id": "cCIY1I0BqMZKQzdg3pj1","_score": 22.291739,"_source": {"gdLocation": {"lon": "113.342061","lat": "23.172472"},"address": "元岗街元岗南路13-15号之6","cityName": "广州市","commentPoint": 3.8,"minPrice": 128.0,"hotelName": "华舍连锁酒店(广州天河客运站店)"},"fields": {"distance_in_m": [3023.907431757608]}},{"_index": "hotel_test","_type": "_doc","_id": "xCIc1I0BqMZKQzdgr8yX","_score": 22.093195,"_source": {"gdLocation": {"lon": "113.347124","lat": "23.143523"},"address": "天河北路719-721号东方之珠花园","cityName": "广州市","commentPoint": 4.9,"minPrice": 76.0,"hotelName": "小李家青旅(广州华师店)"},"fields": {"distance_in_m": [2383.4711465212176]}},{"_index": "hotel_test","_type": "_doc","_id": "pyMi1I0BqMZKQzdgMxy2","_score": 22.071991,"_source": {"gdLocation": {"lon": "113.329772","lat": "23.134002"},"address": "天河路365号天俊阁1802","cityName": 
"广州市","commentPoint": 4.1,"minPrice": 70.0,"hotelName": "迎寓制式青旅(石牌桥地铁站店)"},"fields": {"distance_in_m": [1872.7678276747463]}},{"_index": "hotel_test","_type": "_doc","_id": "1SIY1I0BqMZKQzdgu5b6","_score": 21.591082,"_source": {"gdLocation": {"lon": "113.313978","lat": "23.120444"},"address": "寺右新马路131号","cityName": "广州市","commentPoint": 4.1,"minPrice": 190.0,"hotelName": "智营·星旅精选酒店(广州五羊邨地铁站店)"},"fields": {"distance_in_m": [3501.629279766179]}},{"_index": "hotel_test","_type": "_doc","_id": "HiMj1I0BqMZKQzdguS-z","_score": 21.376797,"_source": {"gdLocation": {"lon": "113.310007","lat": "23.153069"},"address": "先烈东路159号四航局大院4栋601房","cityName": "广州市","commentPoint": 4.2,"minPrice": 76.0,"hotelName": "广州兰姐青年公寓"},"fields": {"distance_in_m": [1563.7710672937392]}},{"_index": "hotel_test","_type": "_doc","_id": "QSIb1I0BqMZKQzdg6cMJ","_score": 21.36859,"_source": {"gdLocation": {"lon": "113.340093","lat": "23.173880"},"address": "元岗路600号自编2号(智汇park对面)","cityName": "广州市","commentPoint": 4.7,"minPrice": 152.0,"hotelName": "素舍2.0酒店(广州天河客运站天羽店)"},"fields": {"distance_in_m": [3046.3470547262573]}},{"_index": "hotel_test","_type": "_doc","_id": "ByIe1I0BqMZKQzdgYeOU","_score": 21.1093,"_source": {"gdLocation": {"lon": "113.341442","lat": "23.172095"},"address": "慧通产业园101栋A区","cityName": "广州市","commentPoint": 4.6,"minPrice": 176.0,"hotelName": "素舍酒店(广州天河客运站地铁站店)"},"fields": {"distance_in_m": [2953.2864749434843]}}]}
}
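The distance_in_m values come from the Painless arcDistance call, which returns the great-circle distance in meters. A minimal sketch that reproduces them with the haversine formula, assuming the mean earth radius of 6371008.7714 m used by Lucene and Elasticsearch; Elasticsearch may use a faster approximation internally, so the last digits can differ slightly:

```python
import math

# Mean earth radius in meters (assumed to match the Lucene/Elasticsearch constant)
EARTH_MEAN_RADIUS_M = 6371008.7714


def arc_distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two lat/lon points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(h))


origin = (23.150261, 113.324994)   # user location used in the query (lat, lon)
top_hit = (23.132313, 113.288340)  # gdLocation of the first hit (lat, lon)
print(round(arc_distance_m(*origin, *top_hit), 1))
```

The printed value is close to the 4246.06 m that Elasticsearch returned for the first hit.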
DECAY_FUNCTION: decay functions
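A decay function is a mathematical function that describes how a quantity decreases with time, distance or some other factor. It usually takes an exponential or polynomial form and is used to model phenomena such as the attenuation of electromagnetic waves, the decay of radioactive material and the metabolism of drugs in the body.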
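In Geographic Information Systems (GIS) and in geography, decay functions measure how the interaction or influence between locations diminishes with distance. For example, a city's economic influence may be strong on nearby towns but much weaker on distant ones; a decay function quantifies how quickly that influence weakens.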
Some applications of decay functions in geography:
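- Spatial interaction models: when modeling migration, trade or commuting between cities, a decay function expresses how the likelihood of these interactions falls as distance grows.
- Hotspot analysis: a decay function describes how the influence of an event (such as crimes or reported cases) on the surrounding area decreases with distance.
- Accessibility assessment: when assessing how accessible a location is to residents, a decay function can model time or distance decay for different modes of transport (walking, driving and so on).
- Geographically Weighted Regression (GWR): a decay function gives each data point a weight based on its spatial distance, so nearer points have more influence.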
In practice, the choice of decay function type and parameters strongly affects how accurate the model is. Common forms include:
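- Exponential decay: f(d) = e^(-λd), where d is the distance and λ is the decay coefficient.
- Power-law decay: f(d) = d^(-β), where d is the distance and β is the decay exponent.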
The parameters of these functions usually have to be fitted to real data so that they best reflect the decay observed in the real world.
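The two forms above, and one simple way to fit their parameters, can be sketched as follows. This assumes noise-free synthetic observations and fits each parameter by least squares on log-transformed values (a line through the origin in ln f versus d for the exponential form, and in ln f versus ln d for the power law); that is one common choice, not the only one, and the helper names are illustrative:

```python
import math


def exponential_decay(d: float, lam: float) -> float:
    """f(d) = e^(-lambda * d)"""
    return math.exp(-lam * d)


def power_law_decay(d: float, beta: float) -> float:
    """f(d) = d^(-beta); undefined at d = 0, so d must be positive."""
    return d ** (-beta)


def fit_lambda(distances: list[float], values: list[float]) -> float:
    """Least-squares lambda for ln f(d) = -lambda * d."""
    num = sum(d * math.log(v) for d, v in zip(distances, values))
    den = sum(d * d for d in distances)
    return -num / den


def fit_beta(distances: list[float], values: list[float]) -> float:
    """Least-squares beta for ln f(d) = -beta * ln d."""
    xs = [math.log(d) for d in distances]
    num = sum(x * math.log(v) for x, v in zip(xs, values))
    den = sum(x * x for x in xs)
    return -num / den


# Synthetic observations generated with lambda = 0.5 (no noise), for illustration only
ds = [0.5, 1.0, 2.0, 4.0]
obs = [exponential_decay(d, 0.5) for d in ds]
print(round(fit_lambda(ds, obs), 6))
```

With real, noisy observations the fitted values only approximate the underlying parameters, and comparing the residuals of both fits is one way to choose between the two forms.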
Elasticsearch provides the gauss, lin and exp decay functions, compared below: