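Common Elasticsearch Queries and Searching with the Java API Client
1. Search requirements
The target is a listing page modeled on Douban Read.
Requirements:
- The search term is matched against the title, author, and abstract fields, and matches are highlighted in red
- Results can be sorted by relevance (the default "comprehensive" order), highest popularity, most recently updated, best-selling, or best-rated
- Each page holds 10 results, and the response includes the total number of hits
2. Setting up the test environment
2.1 Defining the ES fields from the requirements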
mapping.json
{
  "mappings": {
    "properties": {
      "title": { "analyzer": "standard", "type": "text" },
      "author": {
        "analyzer": "standard",
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "contentDesc": { "analyzer": "standard", "type": "text" },
      "wordCount": { "type": "double" },
      "price": { "type": "double" },
      "cover": { "type": "keyword" },
      "heatCount": { "type": "integer" },
      "updateTime": { "type": "date" }
    }
  }
}
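Field descriptions:
- id (long): the unique identifier, type long. It is not declared in mapping.json above; the requests below address documents by _id.
- title (text): the document title, type text, analyzed with the standard analyzer.
- author (text): the document author, type text with the standard analyzer. It also defines a keyword sub-field (author.keyword), which is used for exact matching and aggregations.
- contentDesc (text): the content description (abstract), type text with the standard analyzer.
- wordCount (double): the word count, type double.
- price (double): the price of the book, type double.
- cover (keyword): the cover image URL, type keyword. Keyword fields are matched exactly and are not analyzed.
- heatCount (integer): the popularity counter, type integer. Used for the popularity sort.
- updateTime (date): the last update time, type date. Used for the most-recently-updated sort.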
2.2 Creating the index and mapping
2.3 Adding test data
POST /douban/_doc/1001
{"title":"诗云","author":"刘慈欣","contentDesc":"伊依一行三人乘坐一艘游艇在南太平洋上做吟诗航行,平时难得一见的美洲大陆清晰地显示在天空中,在东半球构成的覆盖世界的巨大穹顶上,大陆好像是墙皮脱落的区域…","wordCount":18707,"price":6.99,"cover":"https://pic.arkread.com/cover/ebook/f/19534800.1653698501.jpg!cover_default.jpg","heatCount":201,"updateTime":"2023-12-20"}

POST /douban/_doc/1002
{"title":"三体2·黑暗森林","author":"刘慈欣","contentDesc":"征服世界的中国科幻神作!包揽九项世界顶级科幻大奖!《三体》获得第73届“雨果奖”最佳长篇奖!","wordCount":318901,"price":32.00,"cover":"https://pic.arkread.com/cover/ebook/f/110344476.1653700299.jpg!cover_default.jpg","heatCount":545,"updateTime":"2023-12-25"}

POST /douban/_doc/1003
{"title":"三体前传:球状闪电","author":"刘慈欣","contentDesc":"征服世界的中国科幻神作!包揽九项世界顶级科幻大奖!《三体》获得第73届“雨果奖”最佳长篇奖!","wordCount":181119,"price":35.00,"cover":"https://pic.arkread.com/cover/ebook/f/116984494.1653699856.jpg!cover_default.jpg","heatCount":765,"updateTime":"2022-11-12"}

POST /douban/_doc/1004
{"title":"全频带阻塞干扰","author":"刘慈欣","contentDesc":"这是一个场面浩大而惨烈的故事。21世纪的某年,以美国为首的北约发起了对俄罗斯的全面攻击。在残酷的保卫战中,俄国的电子战设备无力抵挡美国的进攻","wordCount":28382,"price":6.99,"cover":"https://pic.arkread.com/cover/ebook/f/19532617.1653698557.jpg!cover_default.jpg","heatCount":153,"updateTime":"2021-03-23"}
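3. Running queries
3.1 Lookup by primary key
# Get API: fetches a single document directly by its ID
GET /douban/_doc/1001
# The same lookup through the search API, which returns the document as a search hit
POST /douban/_search
{
  "query": {
    "match": { "_id": 1001 }
  }
}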
3.2 Match-all query
POST /douban/_search
{
  "query": {
    "match_all": {}
  }
}
3.3 Paginated query
POST /douban/_search
{
  "query": {
    "match_all": {}
  },
  "from": 1,
  "size": 2
}
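Note that from is a zero-based offset into the hit list, not a page number, so "from": 1 skips the first hit. The conversion from a 1-based page number can be sketched as follows. PageParams is a hypothetical helper introduced here for illustration (it is not part of any Elasticsearch library), and the default size of 10 comes from the requirements in section 1.

```java
/** Hypothetical helper (not an Elasticsearch API): converts a 1-based page number into from/size values. */
final class PageParams {
    /** Page size taken from the requirements in section 1. */
    static final int DEFAULT_SIZE = 10;

    private PageParams() {
    }

    /** Returns the zero-based offset to send as "from" for the given 1-based page. */
    static int from(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (pageNumber - 1) * pageSize;
    }

    /** Returns how many pages are needed to show totalHits results. */
    static long totalPages(long totalHits, int pageSize) {
        if (totalHits < 0 || pageSize < 1) {
            throw new IllegalArgumentException("totalHits must be >= 0 and pageSize >= 1");
        }
        // Ceiling division without floating point
        return (totalHits + pageSize - 1) / pageSize;
    }
}
```

For example, page 3 with 10 results per page gives from = 20. By default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting), so deep pages need a different approach such as search_after.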
3.4 Sorted query
POST /douban/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "price": { "order": "desc" } }
  ]
}
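The sort options from the requirements can be mapped onto this sort clause. The sketch below covers only the options that have a backing field in mapping.json: relevance (_score), popularity (heatCount), and update time (updateTime). Sales and ratings have no field in the mapping, so they are left out. SortOption and its option names are hypothetical, chosen for illustration only.

```java
import java.util.Locale;

/** Hypothetical enum (not part of any library): maps a UI sort option to a sortable field. */
enum SortOption {
    COMPREHENSIVE("_score"),
    HOTTEST("heatCount"),
    RECENTLY_UPDATED("updateTime");

    private final String field;

    SortOption(String field) {
        this.field = field;
    }

    String field() {
        return field;
    }

    /** Renders the "sort" array of the request body, always in descending order. */
    String toSortJson() {
        return "[{\"" + field + "\":{\"order\":\"desc\"}}]";
    }

    /** Parses a request parameter such as "hottest"; null or unknown values fall back to COMPREHENSIVE. */
    static SortOption parse(String raw) {
        if (raw == null) {
            return COMPREHENSIVE;
        }
        try {
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return COMPREHENSIVE;
        }
    }
}
```

Falling back to relevance for unknown values keeps a mistyped parameter from failing the whole search; rejecting it with an error would be an equally reasonable choice.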
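3.5 Full-text search
The query string 三体球闪 combines characters from the titles 三体 and 球状闪电.
POST /douban/_search
{
  "query": {
    "match": { "title": "三体球闪" }
  }
}
Search results: the standard analyzer splits Chinese text into single characters and match combines the resulting terms with OR, so every title that contains at least one of the four characters is returned. In the test data these are 三体2·黑暗森林 and 三体前传:球状闪电.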
3.6 Highlighted search
POST /douban/_search
{
  "query": {
    "match": { "title": "三体球闪" }
  },
  "highlight": {
    "fields": {
      "title": {
        "pre_tags": ["<font style='color:red'>"],
        "post_tags": ["</font>"]
      }
    }
  }
}
3.7 Bool query
Full-text search on the title for 三体球闪, restricted to documents whose price is exactly 35.
POST /douban/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "三体球闪" } },
        { "term": { "price": 35 } }
      ]
    }
  }
}
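3.8 Multi-field full-text search
Full-text match against the title, author, and abstract, with highlighting on all three fields.
POST /douban/_search
{
  "query": {
    "multi_match": {
      "query": "三体球闪",
      "fields": ["title", "author", "contentDesc"]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "author": {},
      "contentDesc": {}
    }
  }
}
3.9 Combined search
Full-text match against the title, author, and abstract, with highlighting on all three fields.
In addition, the request paginates the results, sorts them by update time in descending order, and returns only the fields that are needed.
POST /douban/_search
{
  "query": {
    "multi_match": {
      "query": "三体球闪",
      "fields": ["title", "author", "contentDesc"]
    }
  },
  "from": 0,
  "size": 2,
  "_source": ["title", "author", "price", "wordCount"],
  "sort": [
    { "updateTime": { "order": "desc" } }
  ],
  "highlight": {
    "fields": {
      "title": {},
      "author": {},
      "contentDesc": {}
    }
  }
}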
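In the response, each hit carries a highlight section keyed by field name, whose values are lists of highlighted fragments. A field with no match is simply absent from that section. When rendering a result list, a common approach is to show the highlighted fragments where they exist and fall back to the plain _source value otherwise. A minimal sketch of that fallback, using a hypothetical HighlightMerger helper and plain Java collections in place of the client's response types:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not an Elasticsearch API): picks the text to display for one field of one hit. */
final class HighlightMerger {
    private HighlightMerger() {
    }

    /**
     * Returns the highlighted fragments for the field joined with " ... ",
     * or the plain source value when the field has no highlight entry.
     */
    static String displayValue(String field, String sourceValue, Map<String, List<String>> highlight) {
        List<String> fragments = highlight == null ? null : highlight.get(field);
        if (fragments == null || fragments.isEmpty()) {
            return sourceValue;
        }
        return String.join(" ... ", fragments);
    }
}
```

Because no pre_tags or post_tags are set in the two requests above, the fragments are wrapped in the default <em> tags.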
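4. Integrating Elasticsearch into a Spring project
Reference: [Installation | Elasticsearch Java API Client 7.17] | Elastic
4.1 Creating the Spring project and adding the ES dependencies
To build with Java 8, open pom.xml, change the parent version and the java.version property, and then reload the Maven project.
As of Elasticsearch 7.15, the official high-level client, RestHighLevelClient, is marked as deprecated. At the same time Elastic introduced a new client, the Elasticsearch Java API Client, which is the officially recommended client for Elasticsearch 8.0 and later.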
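API name | Description |
---|---|
TransportClient | TCP-based, Java only. Deprecated in 7.x and removed in 8.x. |
Low Level REST Client (RestClient) | Low-level REST API with minimal dependencies. |
High Level REST Client (RestHighLevelClient) | High-level REST API built on the low-level client. Deprecated since 7.15, with no removal date announced; superseded by the Java API Client. |
Java API Client (ElasticsearchClient) | HTTP-based API over a language-neutral protocol, recommended. Built on the low-level client and available since 7.15. |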
<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>7.17.11</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.12.3</version>
</dependency>
<!-- Fixes ClassNotFoundException: jakarta.json.spi.JsonProvider.
     See https://github.com/elastic/elasticsearch-java/issues/311 -->
<dependency>
    <groupId>jakarta.json</groupId>
    <artifactId>jakarta.json-api</artifactId>
    <version>2.0.1</version>
</dependency>
The complete pom.xml follows. Note that properties must include <elasticsearch.version>7.17.11</elasticsearch.version>; without it, the Elasticsearch versions managed by the parent POM are not overridden.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.5.15</version>
        <relativePath/>
    </parent>
    <groupId>com.zhouquan</groupId>
    <artifactId>client</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>client</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>8</java.version>
        <lombok.version>1.18.22</lombok.version>
        <elasticsearch.version>7.17.11</elasticsearch.version>
        <jakarta.version>2.0.1</jakarta.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>co.elastic.clients</groupId>
            <artifactId>elasticsearch-java</artifactId>
            <version>7.17.11</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.12.3</version>
        </dependency>
        <!-- Fixes ClassNotFoundException: jakarta.json.spi.JsonProvider.
             See https://github.com/elastic/elasticsearch-java/issues/311 -->
        <dependency>
            <groupId>org.glassfish</groupId>
            <artifactId>jakarta.json</artifactId>
            <version>${jakarta.version}</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>${lombok.version}</version>
        </dependency>
        <!-- Apache Commons IO -->
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.11.0</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
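4.2 Adding the ES client configuration class
The client is registered as a Spring bean, so it can be injected wherever it is needed with @Resource private ElasticsearchClient client;
import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.annotation.Resource;
import java.util.Arrays;
import java.util.List;

@Configuration
@Slf4j
public class EsClient {

    @Resource
    private EsConfig esConfig;

    /**
     * Bean definition that creates the ElasticsearchClient instance.
     *
     * @return an ElasticsearchClient configured with a RestClient and a transport
     */
    @Bean
    public ElasticsearchClient elasticsearchClient() {
        // Configure the RestClient with the hosts and ports of the Elasticsearch cluster
        List<String> clusterNodes = esConfig.getClusterNodes();
        HttpHost[] httpHosts = clusterNodes.stream().map(HttpHost::create).toArray(HttpHost[]::new);

        // Create the low-level client
        RestClient restClient = RestClient.builder(httpHosts).build();

        // The transport handles JSON serialization through Jackson
        ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
        ElasticsearchClient client = new ElasticsearchClient(transport);

        // Log the nodes the client connects to
        log.info("Elasticsearch client nodes: {}", Arrays.toString(httpHosts));
        return client;
    }
}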
4.3 Creating an index with the Java API Client
Reference: Using the Java API Client
/**
 * Creates the index from the mapping file on the classpath.
 */
@Test
void createIndex() throws IOException {
    ClassLoader classLoader = ResourceLoader.class.getClassLoader();
    // try-with-resources closes the mapping stream after the request has been sent
    try (InputStream input = classLoader.getResourceAsStream("mapping/douban.json")) {
        CreateIndexRequest req = CreateIndexRequest.of(b -> b.index("douban_v1").withJson(input));
        boolean created = client.indices().create(req).acknowledged();
        log.info("Index created: {}", created);
    }
}
4.4 Saving documents
Entity class DouBan.java
package com.zhouquan.client.entity;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.Date;

/**
 * @author ZhouQuan
 * @description todo
 * @date 2024-01-09 15:54
 **/
@Data
@AllArgsConstructor
@NoArgsConstructor
public class DouBan {
    private String id;
    private String title;
    private String author;
    private String contentDesc;
    private Integer wordCount;
    private Double price;
    private String cover;
    private Integer heatCount;
    private Date updateTime;
}
4.4.1 Indexing a single document
The method below indexes the same document several times to show the available request styles side by side.
public String indexSingleDoc() {
    IndexResponse indexResponse;
    DouBan douBan = new DouBan("1211", "河边的错误", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    try {
        // Variant 1: fluent lambda DSL
        indexResponse = client.index(i -> i.index(indexName).id(douBan.getId()).document(douBan));

        // Variant 2: the static of() factory method
        IndexRequest<DouBan> objectIndexRequest =
                IndexRequest.of(i -> i.index(indexName).id(douBan.getId()).document(douBan));
        IndexResponse ofIndexResponse = client.index(objectIndexRequest);

        // Variant 3: the classic builder
        IndexRequest.Builder<DouBan> objectBuilder = new IndexRequest.Builder<>();
        objectBuilder.index(indexName);
        objectBuilder.id(douBan.getId());
        objectBuilder.document(douBan);
        IndexResponse classicIndexResponse = client.index(objectBuilder.build());

        // Variant 4: asynchronous indexing
        asyncClient.index(i -> i.index("douban").id(douBan.getId()).document(douBan))
                .whenComplete((response, exception) -> {
                    if (exception != null) {
                        log.error("Failed to index", exception);
                    } else {
                        log.info("Indexed with version " + response.version());
                    }
                });

        // Variant 5: indexing raw JSON
        String jsonData = " {\"id\":\"1741\",\"title\":\"三体\",\"author\":\"刘慈欣\",\"contentDesc\":\"内容简介\",\"wordCount\":50000,\"price\":52.5}";
        Reader input = new StringReader(jsonData);
        IndexRequest<JsonData> request = IndexRequest.of(i -> i.index("douban_v1").withJson(input));
        IndexResponse response = client.index(request);
        log.info("Indexed with version " + response.version());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    // "true" only if the first request created the document rather than updating it
    return String.valueOf(Result.Created.equals(indexResponse.result()));
}
4.4.2 Bulk indexing documents
/**
 * Bulk save.
 *
 * @throws IOException if the request fails
 */
@Test
void bulkSave() throws IOException {
    DouBan douBan1 = new DouBan("1002", "题名1", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    DouBan douBan2 = new DouBan("1003", "题名2", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    DouBan douBan3 = new DouBan("1004", "题名3", "余华", "内容简介", 50000, 52.5, "封面1", 74, new Date());
    List<DouBan> douBanList = new ArrayList<>();
    douBanList.add(douBan1);
    douBanList.add(douBan2);
    douBanList.add(douBan3);

    // Add one index operation per document to a single bulk request
    BulkRequest.Builder br = new BulkRequest.Builder();
    for (DouBan douBan : douBanList) {
        br.operations(op -> op.index(idx -> idx.index("douban_v1").id(douBan.getId()).document(douBan)));
    }
    BulkResponse result = client.bulk(br.build());

    // A bulk request can partially fail, so check every item
    if (result.errors()) {
        log.error("Bulk had errors");
        for (BulkResponseItem item : result.items()) {
            if (item.error() != null) {
                log.error(item.error().reason());
            }
        }
    }
}
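4.4.3 Bulk indexing raw JSON data
/**
 * Bulk save of raw JSON files.
 *
 * @throws IOException if a file cannot be read or the request fails
 */
@Test
void rawDataBulkSave() throws IOException {
    File logDir = new File("D:\\IdeaProjects\\client\\src\\main\\resources\\data");
    // Select files whose names start with "bulk" and end with ".json"
    File[] logFiles = logDir.listFiles(file -> file.getName().matches("bulk.*\\.json"));
    // listFiles returns null when the directory does not exist or cannot be read
    if (logFiles == null || logFiles.length == 0) {
        log.warn("No bulk files found in {}", logDir);
        return;
    }
    BulkRequest.Builder br = new BulkRequest.Builder();
    for (File file : logFiles) {
        // try-with-resources closes each file once it has been read into memory
        try (FileInputStream input = new FileInputStream(file)) {
            BinaryData data = BinaryData.of(IOUtils.toByteArray(input), ContentType.APPLICATION_JSON);
            br.operations(op -> op.index(idx -> idx.index("douban_v1").document(data)));
        }
    }
    BulkResponse result = client.bulk(br.build());
    if (result.errors()) {
        List<BulkResponseItem> items = result.items();
        items.forEach(x -> System.out.println(x.error()));
    }
    log.info("Bulk save succeeded: " + !result.errors());
}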
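The file filter above depends on String.matches, which succeeds only when the whole file name matches the regular expression. In a regular expression, bulk* means "bul" followed by zero or more "k" characters, so a pattern meant to select names that start with bulk and end with .json is usually written bulk.*\.json. A small standalone sketch of that filter, using a hypothetical BulkFileFilter class so that it can be tried without a file system:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the project): selects bulk data files by name. */
final class BulkFileFilter {
    /** Matches names that start with "bulk" and end with ".json". */
    private static final Pattern BULK_JSON = Pattern.compile("bulk.*\\.json");

    private BulkFileFilter() {
    }

    /** Returns true only when the entire file name matches the pattern. */
    static boolean isBulkFile(String fileName) {
        return BULK_JSON.matcher(fileName).matches();
    }

    /** Returns the names that qualify as bulk files, in their original order. */
    static List<String> select(List<String> fileNames) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (isBulkFile(name)) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

Precompiling the Pattern avoids recompiling the expression for every file, which String.matches does on each call.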
4.5 Getting a single document
// Fetch by id and map the source onto a Java object
GetRequest getRequest = GetRequest.of(x -> x.index("douban_v1").id("1002"));
GetResponse<DouBan> douBanGetResponse = client.get(getRequest, DouBan.class);
DouBan source = douBanGetResponse.source();

// The same call in lambda form, with a check that the document exists
GetResponse<DouBan> response = client.get(g -> g.index(indexName).id(id), DouBan.class);
if (!response.found()) {
    throw new BusinessException("No document found for the given id");
}
DouBan douBan = response.source();
log.info("Title: " + douBan.getTitle());
return douBan;

// Fetch by id as raw JSON (body of a separate method)
GetResponse<ObjectNode> response1 = client.get(g -> g.index(indexName).id(id), ObjectNode.class);
if (response1.found()) {
    ObjectNode json = response1.source();
    String name = json.get("title").asText();
    log.info("title " + name);
} else {
    log.info("data not found");
}
return null;
4.6 Searching documents
4.6.1 Basic search query
public List<DouBan> search(String searchText) {
    SearchResponse<DouBan> response;
    try {
        response = client.search(s -> s
                        .index(indexName)
                        .query(q -> q.match(t -> t.field("title").query(searchText))),
                DouBan.class);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }

    // The total is exact only when the relation is Eq; otherwise it is a lower bound
    TotalHits total = response.hits().total();
    boolean isExactResult = total.relation() == TotalHitsRelation.Eq;
    if (isExactResult) {
        log.info("There are " + total.value() + " results");
    } else {
        log.info("There are more than " + total.value() + " results");
    }

    List<Hit<DouBan>> hits = response.hits().hits();
    List<DouBan> list = new ArrayList<>();
    for (Hit<DouBan> hit : hits) {
        DouBan douBan = hit.source();
        list.add(douBan);
        log.info("Found DouBan " + douBan.getTitle() + ", score " + hit.score());
    }
    return list;
}
4.6.2 Combining queries in a bool query
public List<DouBan> search2(String searchText, Double price) {
    // Two independent queries: a full-text match on the title and a lower bound on the price
    Query titleQuery = MatchQuery.of(m -> m.field("title").query(searchText))._toQuery();
    Query rangeQuery = RangeQuery.of(r -> r.field("price").gte(JsonData.of(price)))._toQuery();
    try {
        // Both queries must match
        SearchResponse<DouBan> search = client.search(s -> s
                        .index(indexName)
                        .query(q -> q.bool(b -> b.must(titleQuery).must(rangeQuery))),
                DouBan.class);

        // Collect the hits
        List<DouBan> douBanList = new ArrayList<>();
        List<Hit<DouBan>> hits = search.hits().hits();
        for (Hit<DouBan> hit : hits) {
            DouBan douBan = hit.source();
            douBanList.add(douBan);
        }
        return douBanList;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
4.6.3 Template search
// Create the template: a stored script that renders the body of a search request
client.putScript(r -> r
        .id("query-script")
        .script(s -> s
                .lang("mustache")
                .source("{\"query\":{\"match\":{\"{{field}}\":\"{{value}}\"}}}")));

// Run the search with the template parameters filled in
SearchTemplateResponse<DouBan> response = client.searchTemplate(r -> r
                .index("douban_v1")
                .id("query-script")
                .params("field", JsonData.of("title"))
                .params("value", JsonData.of("题名")),
        DouBan.class
);

// Read the hits
List<Hit<DouBan>> hits = response.hits().hits();
for (Hit<DouBan> hit : hits) {
    DouBan douBan = hit.source();
    log.info("Found DouBan " + douBan.getTitle() + ", score " + hit.score());
}
4.7 Aggregations
Query query = MatchQuery.of(t -> t.field("title").query(searchText))._toQuery();
// A terms aggregation needs a keyword field: author is analyzed text, so use its author.keyword sub-field
Aggregation authorAgg = AggregationBuilders.terms().field("author.keyword").build()._toAggregation();
SearchResponse<DouBan> response = client.search(s -> s
                .index(indexName)
                .query(query)
                .aggregations("author", authorAgg),
        DouBan.class
);