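Searchable documents? Yes You Can. Another reason to choose AsciiDoc

Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine for the cloud, built on Apache Lucene, that provides full-text search capabilities. It is document oriented and schema free.

Asciidoctor is a pure Ruby processor for converting AsciiDoc source files and strings into HTML 5, DocBook 4.5 and other formats. Apart from the Asciidoctor Ruby part, there is the Asciidoctor-java-integration project, which lets us call Asciidoctor functions from Java without noticing that Ruby code is being executed.

In this post we are going to see how to use Elasticsearch over AsciiDoc documents to make them searchable by their header information or by their content.

Let's add the required dependencies: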

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.googlecode.lambdaj</groupId>
        <artifactId>lambdaj</artifactId>
        <version>2.3.3</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>0.90.1</version>
    </dependency>
    <dependency>
        <groupId>org.asciidoctor</groupId>
        <artifactId>asciidoctor-java-integration</artifactId>
        <version>0.1.3</version>
    </dependency>
</dependencies>

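The Lambdaj library is used to convert the AsciiDoc files into json documents.

Now we can start an Elasticsearch instance, which in our case is going to be an embedded instance.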

node = nodeBuilder().local(true).node();

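The next step is to parse the AsciiDoc document header, read its content and convert both into a json document.

An example of a json document stored in Elasticsearch could be: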

{
  "title": "Asciidoctor Maven plugin 0.1.2 released!",
  "authors": [
    {
      "author": "Jason Porter",
      "email": "example@mail.com"
    }
  ],
  "version": null,
  "content": "= Asciidoctor Maven plugin 0.1.2 released!.....",
  "tags": ["release", "plugin"]
}

To convert an AsciiDoc file into a json document, we are going to use the XContentBuilder class, which the Elasticsearch Java API provides for creating json documents programmatically.

package com.lordofthejars.asciidoctor;

import static org.elasticsearch.common.xcontent.XContentFactory.*;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;

import org.asciidoctor.Asciidoctor;
import org.asciidoctor.Author;
import org.asciidoctor.DocumentHeader;
import org.asciidoctor.internal.IOUtils;
import org.elasticsearch.common.xcontent.XContentBuilder;

import ch.lambdaj.function.convert.Converter;

public class AsciidoctorFileJsonConverter implements Converter<File, XContentBuilder> {

    private Asciidoctor asciidoctor;

    public AsciidoctorFileJsonConverter() {
        this.asciidoctor = Asciidoctor.Factory.create();
    }

    public XContentBuilder convert(File asciidoctor) {

        // Reads only the header; the document body is not rendered.
        DocumentHeader documentHeader = this.asciidoctor.readDocumentHeader(asciidoctor);

        XContentBuilder jsonContent = null;
        try {
            jsonContent = jsonBuilder()
                    .startObject()
                    .field("title", documentHeader.getDocumentTitle())
                    .startArray("authors");

            // Main author first, then the list returned by getAuthors().
            Author mainAuthor = documentHeader.getAuthor();

            jsonContent.startObject()
                    .field("author", mainAuthor.getFullName())
                    .field("email", mainAuthor.getEmail())
                    .endObject();

            List<Author> authors = documentHeader.getAuthors();

            for (Author author : authors) {
                jsonContent.startObject()
                        .field("author", author.getFullName())
                        .field("email", author.getEmail())
                        .endObject();
            }

            jsonContent.endArray()
                    .field("version", documentHeader.getRevisionInfo().getNumber())
                    .field("content", readContent(asciidoctor))
                    .array("tags", parseTags((String) documentHeader.getAttributes().get("tags")))
                    .endObject();

        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }

        return jsonContent;
    }

    // Expects a value such as "[release, plugin]" and strips the surrounding brackets.
    private String[] parseTags(String tags) {
        tags = tags.substring(1, tags.length() - 1);
        return tags.split(", ");
    }

    // Returns the whole AsciiDoc source as a single string.
    private String readContent(File content) throws FileNotFoundException {
        return IOUtils.readFull(new FileInputStream(content));
    }
}

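Basically we build the json document by calling the startObject method to start a new object, the field method to add a new field, and startArray to start an array. This builder is then used to render the equivalent object in json format. Note that we use the readDocumentHeader method of the Asciidoctor class, which returns the header attributes of an AsciiDoc file without reading and rendering the whole document. Finally, the content field is set to the full content of the document.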
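The converter's parseTags method assumes that every document declares a tags attribute written as a bracketed list such as [release, plugin]. The following is a minimal sketch, in plain Java with no Elasticsearch or Asciidoctor dependency, of a more forgiving variant that also copes with a missing attribute or irregular spacing. The class name TagParser and the decision to return a List are assumptions made for illustration; they are not part of the original converter.

```java
import java.util.ArrayList;
import java.util.List;

class TagParser {

    // Parses a header attribute value such as "[release, plugin]" into its tags.
    // Returns an empty list when the attribute is missing or contains no tags.
    static List<String> parseTags(String raw) {
        List<String> tags = new ArrayList<String>();
        if (raw == null) {
            return tags;
        }
        String value = raw.trim();
        // Strip the surrounding brackets only when both are present.
        if (value.length() >= 2 && value.startsWith("[") && value.endsWith("]")) {
            value = value.substring(1, value.length() - 1);
        }
        // Split on commas and trim each piece, so "a,b" and "a, b" behave the same.
        for (String tag : value.split(",")) {
            String trimmed = tag.trim();
            if (!trimmed.isEmpty()) {
                tags.add(trimmed);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(parseTags("[release, plugin]")); // prints [release, plugin]
        System.out.println(parseTags(null));                // prints []
    }
}
```

If something like this were used in the converter, the result could be turned back into an array with toArray(new String[0]) before being passed to the builder's array method.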

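Now we are ready to start indexing documents. Note that the populateData method receives a Client object as a parameter. This object comes from the Elasticsearch Java API and represents a connection to the Elasticsearch database.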

import static ch.lambdaj.Lambda.convert;
//....

private void populateData(Client client) throws IOException {

    List<File> asciidoctorFiles = new ArrayList<File>() {{
        add(new File("target/test-classes/java_release.adoc"));
        add(new File("target/test-classes/maven_release.adoc"));
    }};

    // Convert every AsciiDoc file into an XContentBuilder instance.
    List<XContentBuilder> jsonDocuments = convertAsciidoctorFilesToJson(asciidoctorFiles);

    // Index each document, using its position in the list as the document id.
    for (int i = 0; i < jsonDocuments.size(); i++) {
        client.prepareIndex("docs", "asciidoctor", Integer.toString(i))
                .setSource(jsonDocuments.get(i))
                .execute()
                .actionGet();
    }

    // Refresh the index so the documents are visible to searches.
    client.admin().indices().refresh(new RefreshRequest("docs")).actionGet();
}

private List<XContentBuilder> convertAsciidoctorFilesToJson(List<File> asciidoctorFiles) {
    return convert(asciidoctorFiles, new AsciidoctorFileJsonConverter());
}

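It is important to note that the first part of the algorithm converts all our AsciiDoc files (two in this case) into XContentBuilder instances, using the converter class shown earlier and the convert method of the Lambdaj project.

If you want, you can take a look at the two documents used in this example at https://github.com/asciidoctor/asciidoctor.github.com/blob/develop/news/asciidoctor-java-integration-0-1-3-release.adoc and https://github.com/asciidoctor/asciidoctor.github.com/blob/develop/news/asciidoctor-maven-plugin-0-1-2-released.adoc .

The next part inserts the documents into an index. This is done with the prepareIndex method, which requires an index name (docs), an index type (asciidoctor) and the id of the document being inserted. Then we call the setSource method, which transforms the XContentBuilder object into json, and finally we send the data to the database by calling execute().actionGet().

The last step, refreshing the index by calling the refresh method, is only required because we are using an embedded instance of Elasticsearch and query it right after indexing; in production this part is not needed.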

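After that we can start querying Elasticsearch to retrieve information from our AsciiDoc documents.

Let's start with a very simple example, which returns all the inserted documents: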

SearchResponse response = client.prepareSearch().execute().actionGet();

Next we are going to search for all documents written by Alex Soto, which in our case is one of them.

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
//....

QueryBuilder matchQuery = matchQuery("author", "Alex Soto");

// Alternative query, discussed below (use one declaration or the other):
// QueryBuilder matchQuery = matchQuery("author", "Alexander Soto");

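Note that I am searching the author field for the string Alex Soto, which returns only one document; the other document was written by Jason. Interestingly, if you search for Alexander Soto, the same document is returned. This is less a matter of Elasticsearch knowing that Alex and Alexander are similar names than of how a match query works: the query text is split into terms and, by default, a document matches when it contains any of them, so the shared term Soto is enough for the document to be returned.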
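One plausible way to read this result, assuming the default behaviour of a match query (the query text is analyzed into terms and a document matches when it contains at least one of them), is that the shared term Soto alone is enough for a hit. The sketch below is a toy model of that rule in plain Java; it is not Elasticsearch code, the class and method names are hypothetical, and lowercasing plus splitting on whitespace is only a rough stand-in for a real analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class MatchAnyTerm {

    // Rough stand-in for text analysis: lowercase and split on whitespace.
    static Set<String> terms(String text) {
        String normalized = text.toLowerCase(Locale.ROOT).trim();
        return new HashSet<String>(Arrays.asList(normalized.split("\\s+")));
    }

    // True when the field value and the query share at least one term (OR semantics).
    static boolean matchesAny(String fieldValue, String query) {
        Set<String> shared = terms(fieldValue);
        shared.retainAll(terms(query));
        return !shared.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("Alex Soto", "Alexander Soto"));    // prints true
        System.out.println(matchesAny("Jason Porter", "Alexander Soto")); // prints false
    }
}
```

Under this model the document by Alex Soto matches both queries, while the one by Jason Porter matches neither.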

More queries: how about finding documents written by someone called Alex, but not Soto?

import static org.elasticsearch.index.query.QueryBuilders.fieldQuery;
//....

QueryBuilder matchQuery = fieldQuery("author", "+Alex -Soto");

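In this case, of course, no results are returned. Note that here we are using a field query rather than the match query used above, and that the + and - symbols are used to include and exclude words respectively.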
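To make the + and - semantics concrete, here is a toy evaluator in plain Java for queries of the form "+Alex -Soto": every term prefixed with + must be present in the field, and no term prefixed with - may be present. It is a sketch of the rule only, not of how Elasticsearch or Lucene implement it; the class name is hypothetical, and the author Alex Moreno is invented purely for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class RequiredProhibitedTerms {

    // Evaluates a query such as "+Alex -Soto" against a field value.
    // "+term" clauses are required, "-term" clauses are prohibited;
    // clauses without a prefix are ignored by this simplified model.
    static boolean matches(String fieldValue, String query) {
        String normalized = fieldValue.toLowerCase(Locale.ROOT).trim();
        Set<String> fieldTerms = new HashSet<String>(Arrays.asList(normalized.split("\\s+")));

        for (String clause : query.trim().split("\\s+")) {
            if (clause.length() < 2) {
                continue;
            }
            char prefix = clause.charAt(0);
            String term = clause.substring(1).toLowerCase(Locale.ROOT);
            if (prefix == '+' && !fieldTerms.contains(term)) {
                return false; // a required term is missing
            }
            if (prefix == '-' && fieldTerms.contains(term)) {
                return false; // a prohibited term is present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("Alex Soto", "+Alex -Soto"));    // prints false
        System.out.println(matches("Jason Porter", "+Alex -Soto")); // prints false
        System.out.println(matches("Alex Moreno", "+Alex -Soto"));  // prints true
    }
}
```

Neither of the two indexed authors satisfies both clauses, which is consistent with the empty result described above.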

You can also find all documents that contain the word released in their title.

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
//....

QueryBuilder matchQuery = matchQuery("title", "released");

And finally, let's find all documents that talk about version 0.1.2. In this case only one document does; the other one talks about 0.1.3.

QueryBuilder matchQuery = matchQuery("content", "0.1.2");

Now we only have to send the query to the Elasticsearch database, which is done with the prepareSearch method.

SearchResponse response = client.prepareSearch("docs")
        .setTypes("asciidoctor")
        .setQuery(matchQuery)
        .execute()
        .actionGet();

SearchHits hits = response.getHits();

for (SearchHit searchHit : hits) {
    System.out.println(searchHit.getSource().get("content"));
}

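Note that in this case we print the AsciiDoc content to the console, but you could use the asciidoctor.render(String content, Options options) method to render the content into the required format.

So in this post we have seen how to index documents with Elasticsearch, how to get some important information from AsciiDoc files using the Asciidoctor-java-integration project, and finally how to run some queries over the inserted documents. Of course Elasticsearch offers many more kinds of queries, but the intent of this post was not to explore all of its possibilities.

Also, notice how much it matters that the documents are written in AsciiDoc: with very little effort you can build a search engine for your documentation. Imagine, on the other hand, all the code that would be needed to implement the same thing over a proprietary binary format such as Microsoft Word. So we have shown yet another reason to use AsciiDoc instead of other formats.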

Reference: Searchable documents? Yes You Can. Another reason to choose AsciiDoc, from our JCG partner Alex Soto at the One Jar To Rule Them All blog.