Getting started with Spring Data Solr

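Spring Data Solr is an extension to the Spring Data project that aims to simplify the usage of Apache Solr in Spring applications. Please note that this is not an introduction to Spring (Data) or Solr. I assume you have at least a basic understanding of both technologies. In this post I will show how you can use Spring Data repositories to access Solr features in Spring applications.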

Configuration

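First we need a running Solr server. For simplicity we will use the example configuration that ships with the current Solr release (4.5.1 at the time of writing) and is described in the official Solr tutorial. So we only need to download Solr, extract it to a directory of our choice and then run java -jar start.jar from the <solr home>/example directory.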

Now let's move on to our demo application and add the Spring Data Solr dependency using Maven:

<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-solr</artifactId>
    <version>1.0.0.RELEASE</version>
</dependency>

In this example I am using Spring Boot to set up a small example Spring application. I am using the following Spring Boot dependencies and the Spring Boot parent pom for this:

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>0.5.0.BUILD-SNAPSHOT</version>
</parent>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>

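Don't worry if you haven't used Spring Boot yet. These dependencies mainly act as a shortcut for common (Spring) dependencies and simplify the configuration a bit. If you want to integrate Spring Data Solr into an existing Spring application, you can skip the Spring Boot dependencies.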

The Spring bean configuration is quite simple; we only need to define two beans ourselves:

@ComponentScan
@EnableSolrRepositories("com.mscharhag.solr.repository")
public class Application {

    @Bean
    public SolrServer solrServer() {
        return new HttpSolrServer("http://localhost:8983/solr");
    }

    @Bean
    public SolrTemplate solrTemplate(SolrServer server) throws Exception {
        return new SolrTemplate(server);
    }
}

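The solrServer bean is used to connect to the running Solr instance. Since Spring Data Solr uses Solrj, we create a Solrj HttpSolrServer instance. It would also be possible to use an embedded Solr server by using EmbeddedSolrServer. The SolrTemplate provides common functionality to work with Solr (similar to Spring's JdbcTemplate). A solrTemplate bean is required for creating Solr repositories. Please also note the @EnableSolrRepositories annotation. With this annotation we tell Spring Data Solr to look for Solr repositories in the specified package.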

Creating a document

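Before we can query Solr we have to add documents to the index. To define a document we create a POJO and add Solrj annotations to it. In this example we will use a simple Book class as the document: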

public class Book {

    @Field
    private String id;

    @Field
    private String name;

    @Field
    private String description;

    @Field("categories_txt")
    private List<Category> categories;

    // getters/setters
}

public enum Category {
    EDUCATION, HISTORY, HUMOR, TECHNOLOGY, ROMANCE, ADVENTURE
}

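Each book has a unique id, a name, a description and belongs to one or more categories. Note that by default Solr requires a unique ID of type String for each document. Fields that should be added to the Solr index are annotated with the Solrj @Field annotation. By default Solrj tries to map document field names to Solr fields of the same name. The Solr example configuration already defines Solr fields named id, name and description, so it is not necessary to add these fields to the Solr configuration.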
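To make the field-name mapping described above more concrete, here is a small plain-Java sketch of the idea. It does not use Solrj: IndexField is a hypothetical stand-in for Solrj's @Field annotation, and SampleBook and FieldMappingSketch are made-up names for illustration only. The real Solrj binder certainly does more (type conversion, collections, setters), so treat this as a rough model rather than the actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: collects annotated fields of an object under their index names.
class FieldMappingSketch {

    /** Returns a map from index field name to value for every annotated field. */
    static Map<String, Object> toIndexFields(Object document) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (Field field : document.getClass().getDeclaredFields()) {
            IndexField annotation = field.getAnnotation(IndexField.class);
            if (annotation == null) {
                continue; // fields without the annotation are not indexed
            }
            field.setAccessible(true);
            // An explicit name wins; otherwise fall back to the Java field name.
            String indexName = annotation.value().isEmpty() ? field.getName() : annotation.value();
            try {
                fields.put(indexName, field.get(document));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(toIndexFields(new SampleBook()));
    }
}

// Hypothetical stand-in for Solrj's @Field, defined here to keep the sketch self-contained.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexField {
    String value() default ""; // empty means: use the Java field name
}

// Simplified document class; categories is a plain String here to keep the sketch short.
class SampleBook {
    @IndexField private String id = "1";
    @IndexField private String name = "Treasure Island";
    @IndexField("categories_txt") private String categories = "ADVENTURE";
    private String internalNote = "not indexed";
}
```

The point of the sketch is only the naming rule: an annotated field without an explicit value ends up under its own name, while categories ends up under categories_txt.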

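In case you want to change the Solr field definitions, you can find the example configuration file at <solr home>/example/solr/collection1/conf/schema.xml. Within this file you should find the following field definitions: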

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="description" type="text_general" indexed="true" stored="true"/>

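In general, title would be a better attribute name for a book than name. However, by using name we can stick with the default Solr field configuration, so for simplicity I chose name over title.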

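For categories we have to define the field name manually using the @Field annotation: categories_txt. This matches the dynamic field named *_txt from the Solr example. This field definition can also be found in schema.xml: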

<dynamicField name="*_txt" type="text_general"   indexed="true"  stored="true" multiValued="true"/>
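As a rough illustration of why categories_txt is picked up by the *_txt definition, the following plain-Java sketch mimics the matching of a dynamic field pattern that has a single wildcard at its start or end. DynamicFieldMatcher is a hypothetical helper written for this post; it is not part of Solr or Solrj, and Solr's own matching logic may differ in details.

```java
// Minimal sketch of dynamic field name matching (hypothetical helper, not a Solr class).
class DynamicFieldMatcher {

    /** Returns true if fieldName matches a pattern with one leading or trailing '*'. */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            // "*_txt" matches every name ending in "_txt"
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            // "attr_*" matches every name starting with "attr_"
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        // No wildcard: only an exact match counts
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        System.out.println(matches("*_txt", "categories_txt")); // prints "true"
        System.out.println(matches("*_txt", "name"));           // prints "false"
    }
}
```

So any field whose name ends in _txt can be indexed without touching schema.xml, which is exactly what we rely on for the categories.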

Creating a repository

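Spring Data uses repositories to simplify the usage of various data access technologies. A repository is basically an interface whose implementation is generated dynamically by Spring Data on application start. The generated implementation is based on naming conventions used in the repository interface. If this is new to you, I recommend reading Working with Spring Data Repositories.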

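Spring Data Solr uses the same approach. We use naming conventions and annotations inside interfaces to define the methods we need to access Solr features. We start with a simple repository that contains only one method (we will add more later):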

public interface BookRepository extends SolrCrudRepository<Book, String> {

    List<Book> findByName(String name);

}

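By extending SolrCrudRepository we get some common methods like save(), findAll(), delete() or count() in the repository. With the definition of the interface method findByName(String name) we tell Spring Data Solr to create a method implementation that queries Solr for a list of books. The book names in this list should match the passed parameter.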
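To give a feeling for what "deriving a query from a method name" means, here is a toy sketch in plain Java. It only understands findBy followed by property names joined with And or Or, and it produces query strings with positional placeholders such as name:?0. DerivedQuerySketch is a hypothetical class made up for this post; Spring Data's real parser supports many more keywords and is implemented differently.

```java
// Toy illustration of query derivation (hypothetical; not Spring Data's parser).
class DerivedQuerySketch {

    /** Turns e.g. "findByNameOrDescription" into "name:?0 OR description:?1". */
    static String toSolrQuery(String methodName) {
        if (!methodName.startsWith("findBy") || methodName.length() == "findBy".length()) {
            throw new IllegalArgumentException("Expected findBy<Property>...: " + methodName);
        }
        String criteria = methodName.substring("findBy".length());
        // Split in front of an "Or"/"And" keyword that is followed by an upper-case letter,
        // so a property such as "Order" is not mistaken for the keyword "Or".
        String[] tokens = criteria.split("(?=(?:Or|And)[A-Z])");
        StringBuilder query = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String property = tokens[i];
            if (i > 0) {
                boolean isOr = property.startsWith("Or");
                query.append(isOr ? " OR " : " AND ");
                property = property.substring(isOr ? 2 : 3); // drop the keyword
            }
            // Property "Name" becomes field "name"; the i-th method argument fills ?i
            query.append(Character.toLowerCase(property.charAt(0)))
                 .append(property.substring(1))
                 .append(":?")
                 .append(i);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSolrQuery("findByName"));
        System.out.println(toSolrQuery("findByNameOrDescription"));
    }
}
```

The sketch ignores everything that makes the real feature useful (paging parameters, keywords like Between or Like, nested properties), but it shows why a well-chosen method name is all the repository interface needs.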

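The repository implementation can be injected into other classes using Spring's DI functionality. In this example we inject the repository into a simple JUnit test: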

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = Application.class, loader = SpringApplicationContextLoader.class)
public class BookRepositoryTests {

    @Autowired
    private BookRepository bookRepository;

    ...
}

Adding documents to Solr

Now it is time to add some books to Solr. Using our repository this is a very easy job:

private void addBookToIndex(String name, String description, Category... categories) {
    Book book = new Book();
    book.setName(name);
    book.setDescription(description);
    book.setCategories(Arrays.asList(categories));
    book.setId(UUID.randomUUID().toString());
    bookRepository.save(book);
}

private void createSampleData() {
    addBookToIndex("Treasure Island", "Best seller by R.L.S.", Category.ADVENTURE);
    addBookToIndex("The Pirate Island", "Oh noes, the pirates are coming!", Category.ADVENTURE, Category.HUMOR);
    ...
}

Adding pagination and boosting

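Assume we have an application where users are able to search for books. We need to find books whose name or description matches the search query given by the user. For performance reasons we want to add some kind of pagination that shows only 10 search results at a time to the user.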

Let's create a new method in our repository interface for this:

Page<Book> findByNameOrDescription(@Boost(2) String name, String description, Pageable pageable);

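The method name findByNameOrDescription tells Spring Data Solr to query for book objects whose name or description matches the passed parameters. To support pagination we added the Pageable parameter and changed the return type from List<Book> to Page<Book>. By adding the @Boost annotation to the name parameter we boost books whose name matches the search parameter. This makes sense because those books are usually more interesting to the user.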

If we now want to query the first page containing 10 elements, we just have to do:

Page<Book> booksPage = bookRepository.findByNameOrDescription
(searchString, searchString, new PageRequest(0, 10));

Besides the first 10 search results, Page<Book> provides some useful methods for building pagination functionality:

booksPage.getContent()       // get a list of (max) 10 books
booksPage.getTotalElements() // total number of elements (can be >10)
booksPage.getTotalPages()    // total number of pages
booksPage.getNumber()        // current page number
booksPage.isFirstPage()      // true if this is the first page
booksPage.hasNextPage()      // true if another page is available
booksPage.nextPageable()     // the pageable for requesting the next page
...
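The numbers behind these methods are simple arithmetic. The sketch below shows one way to compute them in plain Java; PagingMath is a hypothetical helper and not a Spring Data class. The offset value corresponds to what Solr calls the start request parameter, with the page size playing the role of rows; how exactly Spring Data Solr fills those parameters is an assumption on my part.

```java
// Sketch of the arithmetic behind page-based access (hypothetical helper).
class PagingMath {

    /** Zero-based index of the first result on the given zero-based page. */
    static long offset(int pageNumber, int pageSize) {
        return (long) pageNumber * pageSize;
    }

    /** Number of pages needed to show totalElements results (rounded up). */
    static int totalPages(long totalElements, int pageSize) {
        return (int) ((totalElements + pageSize - 1) / pageSize);
    }

    /** True if there is at least one more page after the given zero-based page. */
    static boolean hasNextPage(int pageNumber, long totalElements, int pageSize) {
        return pageNumber + 1 < totalPages(totalElements, pageSize);
    }

    public static void main(String[] args) {
        // 25 matching books shown 10 per page
        System.out.println(totalPages(25, 10));     // prints "3"
        System.out.println(offset(2, 10));          // prints "20"
        System.out.println(hasNextPage(2, 25, 10)); // prints "false"
    }
}
```

With 25 matching books and a page size of 10, the third page (page number 2) starts at result 20 and contains the remaining five books.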

Faceting

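Whenever a user searches for a book name, we want to show them how many books matching the given query parameter are available in the different categories. This feature is called faceted search and is directly supported by Spring Data Solr. We just have to add another method to our repository interface: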

@Query("name:?0")
@Facet(fields = { "categories_txt" }, limit = 5)
FacetPage<Book> findByNameAndFacetOnCategories(String name, Pageable page);

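This time the query is derived from the @Query annotation (which contains the Solr query) instead of the method name. With the @Facet annotation we tell Spring Data Solr to facet books by categories and return the first five facets.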

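It would also be possible to remove the @Query annotation and change the method name to findByName for the same effect. The small disadvantage of this approach is that it is not obvious to the caller that this repository method performs faceting. Additionally, the method signature might collide with other methods that search books by name.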

Usage:

FacetPage<Book> booksFacetPage = bookRepository.findByNameAndFacetOnCategories(bookName, new PageRequest(0, 10));

booksFacetPage.getContent(); // the first 10 books

for (Page<? extends FacetEntry> page : booksFacetPage.getAllFacets()) {
    for (FacetEntry facetEntry : page.getContent()) {
        String categoryName = facetEntry.getValue();  // name of the category
        long count = facetEntry.getValueCount();      // number of books in this category

        // convert the category name back to an enum
        Category category = Category.valueOf(categoryName.toUpperCase());
    }
}

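Note that booksFacetPage.getAllFacets() returns a collection of FacetEntry pages. This is because the @Facet annotation allows you to facet on multiple fields at once. Each of these pages contains at most five FacetEntries (defined by the limit attribute of @Facet).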
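If faceting is new to you, it may help to see what the counts mean. The following plain-Java sketch computes category counts in memory for a list of matching books and keeps only the most frequent ones. Solr does this work on the server; FacetCountSketch is a hypothetical class for illustration, and ordering the facets by descending count is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of facet counts (hypothetical; Solr computes these server-side).
class FacetCountSketch {

    /** Counts how often each category occurs and keeps the `limit` most frequent ones. */
    static Map<String, Long> facetCounts(List<List<String>> categoriesPerBook, int limit) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (List<String> categories : categoriesPerBook) {
            for (String category : categories) {
                counts.merge(category, 1L, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; the sort is stable, so ties keep their first-seen order.
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        Map<String, Long> top = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : entries.subList(0, Math.min(limit, entries.size()))) {
            top.put(entry.getKey(), entry.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        // Categories of the two sample books that match a search for "Island"
        List<List<String>> matches = Arrays.asList(
                Arrays.asList("adventure"),
                Arrays.asList("adventure", "humor"));
        System.out.println(facetCounts(matches, 5)); // prints "{adventure=2, humor=1}"
    }
}
```

For the two sample books added earlier, a search matching both would therefore report two books in the adventure category and one in humor.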

Highlighting

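Often it is useful to highlight the occurrences of the search query in the list of search results (as Google or Bing do, for example). This can be achieved with the highlighting feature of (Spring Data) Solr.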

Let's add another repository method:

@Highlight(prefix = "<highlight>", postfix = "</highlight>")
HighlightPage<Book> findByDescription(String description, Pageable pageable);

The @Highlight annotation tells Solr to highlight occurrences of the searched description.

Usage:

HighlightPage<Book> booksHighlightPage = bookRepository.findByDescription(description, new PageRequest(0, 10));

booksHighlightPage.getContent(); // first 10 books

for (HighlightEntry<Book> he : booksHighlightPage.getHighlighted()) {
    // A HighlightEntry belongs to an Entity (Book) and may have multiple highlighted fields (description)
    for (Highlight highlight : he.getHighlights()) {
        // Each highlight might have multiple occurrences within the description
        for (String snipplet : highlight.getSnipplets()) {
            // snipplet contains the highlighted text
        }
    }
}

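If you use this repository method to query for books whose description contains the string "Treasure Island", a snippet might look like this: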

<highlight>Treasure Island</highlight> is a tale of pirates and villains, maps, treasure and shipwreck, and is perhaps one of the best adventure story ever written.

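In this case Treasure Island is located at the beginning of the description and is highlighted with the prefix and postfix defined in the @Highlight annotation. This additional markup can be used to mark query occurrences when the search results are shown to the user.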
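To illustrate the effect of the prefix and postfix, here is a plain-Java sketch that wraps every case-insensitive occurrence of a search term in the given markers. Solr's highlighter works on analyzed tokens and returns fragments of the field, so this is only a simplified model; HighlightSketch is a hypothetical class written for this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of highlighting (hypothetical; Solr's highlighter is token-based).
class HighlightSketch {

    /** Wraps each case-insensitive occurrence of term in text with prefix and postfix. */
    static String highlight(String text, String term, String prefix, String postfix) {
        // Pattern.quote treats the term as literal text, not as a regular expression
        Matcher matcher = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // Keep the original casing of the matched text inside the markers
            String replacement = prefix + matcher.group() + postfix;
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight(
                "Treasure Island is a tale of pirates and villains",
                "treasure island", "<highlight>", "</highlight>"));
    }
}
```

In a web application the prefix and postfix would typically be HTML tags that a stylesheet renders in bold or with a background color.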