[Lucene 4.8 Tutorial, Part 1] Basic Indexing and Searching with Lucene 4.8

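When Lucene processes text, the work falls roughly into three parts:

1. Indexing files: extract and analyze the document content, then generate the index

2. Searching: search the index and produce results based on the search keyword

3. Analysis: analyze the search terms and generate a Query object.

Note: in practice, everything except the most basic exact-match search requires the query to be analyzed before searching.

Without the analysis step, a search for JAVA returns no results, because all terms were converted to lowercase during indexing, while the search here requires the keyword to match exactly.

Once the QueryParser class is used, the search terms are analyzed according to the concrete Analyzer implementation: for example, case conversion, and the interpretation of search expressions such as java AND ant.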
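The effect described in the note can be reproduced with a small toy model. The sketch below is not Lucene: it is a plain-Java inverted index, and every name in it (ToyAnalysisDemo, analyze, countExact, countAnalyzed) is hypothetical and chosen only for illustration. It assumes an analyzer that does nothing more than split on non-alphanumeric characters and lowercase the tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: this is NOT Lucene. All names here are hypothetical.
class ToyAnalysisDemo {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    // Stand-in for an Analyzer: split on non-alphanumerics, then lowercase.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Indexing: the analyzed (lowercased) terms are what gets stored.
    void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    // Exact lookup with no analysis, in the spirit of a TermQuery.
    int countExact(String term) {
        Set<Integer> ids = postings.get(term);
        return ids == null ? 0 : ids.size();
    }

    // Analyze the input first, then match documents containing any resulting term.
    int countAnalyzed(String input) {
        Set<Integer> matches = new TreeSet<Integer>();
        for (String term : analyze(input)) {
            Set<Integer> ids = postings.get(term);
            if (ids != null) {
                matches.addAll(ids);
            }
        }
        return matches.size();
    }

    public static void main(String[] args) {
        ToyAnalysisDemo index = new ToyAnalysisDemo();
        index.addDocument(0, "Java and Ant build tools");
        index.addDocument(1, "Learning JAVA quickly");
        System.out.println(index.countExact("JAVA"));    // prints 0
        System.out.println(index.countExact("java"));    // prints 2
        System.out.println(index.countAnalyzed("JAVA")); // prints 2
    }
}
```

The exact lookup for JAVA finds nothing because only the lowercased term java was stored; running the same input through the analyzer first makes the query term line up with what is in the index.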

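I. Indexing Files

The basic process is as follows:

1. Create the index writer (IndexWriter)

2. Build a Document from each file

3. Write the document content into the index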

package com.ljh.search.index;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// 1. Create the index writer (IndexWriter)
// 2. Build a Document from each file
// 3. Write the document content into the index
public class IndexFiles {

    public static void main(String[] args) throws IOException {
        String usage = "java IndexFiles"
                + " [-index INDEX_PATH] [-docs DOCS_PATH] \n\n"
                + "This indexes the documents in DOCS_PATH, creating a Lucene index "
                + "in INDEX_PATH that can be searched with SearchFiles";
        String indexPath = null;
        String docsPath = null;
        // The bound i + 1 < args.length guards against a flag given without a value.
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-index".equals(args[i])) {
                indexPath = args[i + 1];
                i++;
            } else if ("-docs".equals(args[i])) {
                docsPath = args[i + 1];
                i++;
            }
        }
        // Both paths are needed below; a null indexPath would fail in new File(...).
        if (docsPath == null || indexPath == null) {
            System.err.println("Usage: " + usage);
            System.exit(1);
        }
        final File docDir = new File(docsPath);
        if (!docDir.exists() || !docDir.canRead()) {
            System.out.println("Document directory '" + docDir.getAbsolutePath()
                    + "' does not exist or is not readable, please check the path");
            System.exit(1);
        }
        IndexWriter writer = null;
        try {
            // 1. Create the index writer (IndexWriter)
            writer = getIndexWriter(indexPath);
            index(writer, docDir);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // writer is still null if getIndexWriter itself failed
            if (writer != null) {
                writer.close();
            }
        }
    }

    private static IndexWriter getIndexWriter(String indexPath) throws IOException {
        Directory indexDir = FSDirectory.open(new File(indexPath));
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_48,
                new StandardAnalyzer(Version.LUCENE_48));
        return new IndexWriter(indexDir, iwc);
    }

    // Recursively indexes a directory, or indexes a single file.
    private static void index(IndexWriter writer, File file) throws IOException {
        if (file.isDirectory()) {
            String[] files = file.list();
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    index(writer, new File(file, files[i]));
                }
            }
        } else {
            // 2. Build a Document from the file
            Document doc = new Document();
            // path: stored and indexed as a single untokenized term
            doc.add(new StringField("path", file.getPath(), Field.Store.YES));
            // modified: indexed as a number, not stored
            doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
            // The reader is consumed inside addDocument, so it stays open until then.
            try (Reader reader = new FileReader(file)) {
                // contents: tokenized by the analyzer, not stored
                doc.add(new TextField("contents", reader));
                System.out.println("Indexing " + file.getName());
                // 3. Write the document content into the index
                writer.addDocument(doc);
            }
        }
    }
}

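(1) Run the program with "java com.ljh.search.index.IndexFiles -index d:/index -docs d:/tmp" to index the files in d:/tmp and place the index files in d:/index. The class name is case-sensitive and must include the package.

(2) The generated index files can be inspected with Luke. Luke is currently hosted on GitHub.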

II. Searching Files

1. Open the index (IndexSearcher)
2. Search by keyword
3. Iterate over the results and process them

package com.ljh.search.search;

// 1. Open the index (IndexSearcher)
// 2. Search by keyword
// 3. Iterate over the results and process them
import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {

    public static void main(String[] args) throws IOException {
        String indexPath = null;
        String term = null;
        // The bound i + 1 < args.length guards against a flag given without a value.
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-index".equals(args[i])) {
                indexPath = args[i + 1];
                i++;
            } else if ("-term".equals(args[i])) {
                term = args[i + 1];
                i++;
            }
        }
        if (indexPath == null || term == null) {
            System.err.println("Usage: java Searcher -index INDEX_PATH -term TERM");
            System.exit(1);
        }
        System.out.println("Searching " + term + " in " + indexPath);

        // 1. Open the index
        Directory indexDir = FSDirectory.open(new File(indexPath));
        IndexReader ir = DirectoryReader.open(indexDir);
        try {
            IndexSearcher searcher = new IndexSearcher(ir);

            // 2. Search by keyword (exact term match, no analysis), top 20 hits
            TopDocs docs = searcher.search(new TermQuery(new Term("contents", term)), 20);

            // 3. Iterate over the results and process them
            ScoreDoc[] hits = docs.scoreDocs;
            System.out.println(hits.length);
            for (ScoreDoc hit : hits) {
                System.out.println("doc: " + hit.doc + " score: " + hit.score);
            }
        } finally {
            // Close the reader even if the search throws.
            ir.close();
        }
    }
}

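III. Analysis

In practice, everything except the most basic exact-match search requires analysis before searching.

Without the analysis step, a search for JAVA returns no results, because all terms were converted to lowercase during indexing, while the search here requires the keyword to match exactly.

Once the QueryParser class is used, the search terms are analyzed according to the concrete Analyzer implementation: for example, case conversion, and the interpretation of search expressions such as java AND ant.

Analysis involves two basic steps:

1. Create a QueryParser object

2. Call QueryParser.parse() to produce a Query object.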

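Concretely, take the following code:

		// 2. Search by keyword
		TopDocs docs = searcher.search(new TermQuery(new Term("contents", term)), 20);

and replace it with:

		// 2. Search by keyword
		// TopDocs docs = searcher.search(new TermQuery(new Term("contents", term)), 20);
		// Additional imports needed:
		//   org.apache.lucene.analysis.core.SimpleAnalyzer
		//   org.apache.lucene.queryparser.classic.ParseException
		//   org.apache.lucene.queryparser.classic.QueryParser
		//   org.apache.lucene.search.Query
		//   org.apache.lucene.util.Version
		// Note: the index above was built with StandardAnalyzer; using the same
		// analyzer at query time is generally the safer choice.
		QueryParser parser = new QueryParser(Version.LUCENE_48, "contents",
				new SimpleAnalyzer(Version.LUCENE_48));
		Query query;
		try {
			query = parser.parse(term);
		} catch (ParseException e) {
			e.printStackTrace();
			return; // no valid query, so there is nothing to search
		}
		TopDocs docs = searcher.search(query, 30);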
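To make "interpreting a search expression" more concrete, here is a rough plain-Java sketch of the idea. It is not Lucene's parser: ToyQueryDemo and its methods are hypothetical names, it only understands single words joined by upper-case AND / OR (evaluated left to right, with OR assumed when no operator is given), and it ignores everything else the real QueryParser supports, such as grouping, fields, and phrases.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: this is NOT Lucene's QueryParser. All names are hypothetical.
class ToyQueryDemo {
    // lowercased term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    void addDocument(int docId, String text) {
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            String term = token.toLowerCase(Locale.ROOT);
            Set<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    // Lowercase the query word (the "analysis" step), then copy its posting list.
    private Set<Integer> lookup(String word) {
        Set<Integer> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? new TreeSet<Integer>() : new TreeSet<Integer>(ids);
    }

    // Evaluates "w1 AND w2", "w1 OR w2" or "w1 w2" left to right and
    // returns the matching document ids, formatted like "[0, 2]".
    String search(String expression) {
        String[] parts = expression.trim().split("\\s+");
        Set<Integer> result = lookup(parts[0]);
        int i = 1;
        while (i < parts.length) {
            boolean and = parts[i].equals("AND");
            if (and || parts[i].equals("OR")) {
                i++; // skip the operator itself
            }
            if (i >= parts.length) {
                break; // dangling operator at the end: ignore it
            }
            Set<Integer> next = lookup(parts[i]);
            if (and) {
                result.retainAll(next); // AND: intersection
            } else {
                result.addAll(next);    // OR (or no operator): union
            }
            i++;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        ToyQueryDemo index = new ToyQueryDemo();
        index.addDocument(0, "Java and Ant");
        index.addDocument(1, "Java only");
        index.addDocument(2, "Ant only");
        System.out.println(index.search("JAVA AND ant")); // prints [0]
        System.out.println(index.search("java OR ant"));  // prints [0, 1, 2]
    }
}
```

In this sketch a lower-case and is treated as an ordinary word rather than an operator, which mirrors the classic Lucene query syntax, where the boolean operators are written in upper case.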

Reposted from: https://www.cnblogs.com/mqxnongmin/p/10472754.html
