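We need full-text search over the posts in a community app. This post evaluates implementing it with Elasticsearch (ES).
Installing ES itself is not covered here.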
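- Integrate a Chinese analyzer into ES (pick the plugin version that matches your ES version)
Download the source: https://github.com/medcl/elasticsearch-analysis-ik
Build it with Maven to get elasticsearch-analysis-ik-1.9.5.zip
Create an ik directory under plugins and unzip elasticsearch-analysis-ik-1.9.5.zip into it. Restart the node afterwards, because ES loads plugins only at startup.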
- Create the index (settings, mapping)
Configuration (saved as post.json):
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "post": {
      "dynamic": "strict",
      "properties": {
        "id": {"type": "integer", "store": "yes"},
        "title": {
          "type": "string",
          "store": "yes",
          "index": "analyzed",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "content": {
          "type": "string",
          "store": "yes",
          "index": "analyzed",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "author": {"type": "string", "store": "yes", "index": "no"},
        "time": {"type": "date", "store": "yes", "index": "no"}
      }
    }
  }
}
Run this command to create the index:
curl -XPOST 'spark2:9200/community' -d @post.json
- Insert data
Jar dependencies of the project
pom.xml
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.3.3</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.7</version>
</dependency>
ES client utility class
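import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Holds a single TransportClient shared by the whole application.
public class EsClient {
    private static TransportClient transportClient;

    static {
        // cluster.name must match the name configured on the ES nodes.
        Settings settings = Settings.builder().put("cluster.name", "es_cluster").build();
        try {
            // 9300 is the transport port used by the Java client (9200 is HTTP).
            transportClient = new TransportClient.Builder().settings(settings).build()
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark2"), 9300))
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("spark3"), 9300));
        } catch (UnknownHostException e) {
            throw new RuntimeException(e);
        }
    }

    public static TransportClient getInstance() {
        return transportClient;
    }
}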
Inserting the data
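import java.util.Date;

import com.alibaba.fastjson.JSON;
import org.elasticsearch.client.transport.TransportClient;

TransportClient client = EsClient.getInstance();
// Index 10000 sample posts. Post is the application's POJO for a forum post;
// fastjson serializes it to the JSON document source through its getters.
for (int i = 0; i < 10000; i++) {
    Post post = new Post(i + "", "hll", "百度百科",
            "ES即etamsports ,全名上海英模特制衣有限公司,是法国Etam集团在中国的分支企业,创立于1994年底。ES的服装适合出游、朋友聚会、晚间娱乐、校园生活等各种轻松",
            new Date());
    // Index name, type, and document id, then send the request and wait for the response.
    client.prepareIndex("community", "post", post.getId())
            .setSource(JSON.toJSONString(post))
            .execute()
            .actionGet();
}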
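The Post class itself is not shown above. A minimal sketch of what it might look like follows, assuming the constructor order seen in the call (id, author, title, content, time) and property names that match the mapping. The class body is a guess for illustration, not the original code. One thing worth noting: because the mapping uses "dynamic": "strict", fastjson must not emit any property that is missing from the mapping, so the class should expose exactly these five getters.

```java
import java.util.Date;

// Hypothetical POJO for one forum post (not from the original project).
// In a real project this would normally be a public class in Post.java.
// fastjson serializes through the public getters, so the JSON keys become
// id, author, title, content and time, the same names as in the mapping.
class Post {
    private final String id;
    private final String author;
    private final String title;
    private final String content;
    private final Date time;

    public Post(String id, String author, String title, String content, Date time) {
        this.id = id;
        this.author = author;
        this.title = title;
        this.content = content;
        this.time = time;
    }

    public String getId() { return id; }

    public String getAuthor() { return author; }

    public String getTitle() { return title; }

    public String getContent() { return content; }

    public Date getTime() { return time; }
}
```

The id is kept as a String here because prepareIndex takes the document id as a String; ES coerces the numeric string in the source to the integer type declared in the mapping by default.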
- Query with highlighting
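import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.highlight.HighlightField;

TransportClient client = EsClient.getInstance();
// Match "上海" against both title and content; highlight only content.
SearchResponse response = client.prepareSearch("community")
        .setTypes("post")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.multiMatchQuery("上海", "title", "content"))
        .setFrom(0)
        .setSize(10)
        .addHighlightedField("content")
        .setHighlighterPreTags("<red>")
        .setHighlighterPostTags("</red>")
        .execute()
        .actionGet();

SearchHits hits = response.getHits();
for (SearchHit hit : hits) {
    System.out.println(hit.getHighlightFields());
    Map<String, Object> source = hit.getSource();
    // A hit that matched only on title has no content highlight, so guard against null.
    HighlightField highlight = hit.getHighlightFields().get("content");
    if (highlight != null) {
        StringBuilder s = new StringBuilder();
        for (Text text : highlight.getFragments()) {
            s.append(text.string());
        }
        // Replace the raw content with the highlighted fragments.
        source.put("content", s.toString());
    }
    System.out.println(source);
}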
Query result
{author=hll, id=782, time=1490165237878, title=百度百科, content=ES即etamsports ,全名<red>上海</red>英模特制衣有限公司,是法国Etam集团在中国的分支企业,创立于1994年底。ES的服装适合出游、朋友聚会、晚间娱乐、校园生活等各种轻松}
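The time value in the result is a number because fastjson writes java.util.Date as epoch milliseconds by default, and the source map returns it unchanged. If a readable timestamp is wanted for display, it could be converted along these lines. This is only a sketch: the PostTime helper, its pattern, and the choice of UTC are assumptions for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of the original code): formats the epoch
// millisecond value found under "time" in a hit's source map.
class PostTime {
    // UTC is an assumption; use the zone your application displays times in.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // The source map holds JSON numbers as Integer or Long, so accept any Number.
    static String format(Object time) {
        long millis = ((Number) time).longValue();
        return FORMAT.format(Instant.ofEpochMilli(millis));
    }
}
```

For the hit shown above, PostTime.format(source.get("time")) gives 2017-03-22 06:47:17 in UTC.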