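Preface: Many .NET developers have heard of full-text indexing built on Lucene.net. Solr is a high-performance full-text search server that is itself built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, has optimized query performance, and ships with a complete administration UI, which makes it an excellent full-text search engine. In this post I skip Lucene and go straight to using Solr. In short, Solr is more convenient and simpler to use than raw Lucene, quick to pick up, and productive to develop with.
Solr use cases: full-text search over large data sets. E-commerce platforms in particular, along with currently popular areas such as cloud computing and the Internet of Things, are backed by large volumes of data, and Solr is well suited to searching that data. Solr is also free and open source, so the barrier to entry is low and the investment is small. I won't repeat Solr's other advantages here; many experts on cnblogs have already written good technical posts about Solr, and this one is only meant to get the ball rolling.
All right, let's start the Solr journey.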
Solr development steps on the .NET platform
I. Set up the Solr server. The steps are as follows:
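1. Install the JDK. The JDK already bundles a Java runtime (JRE/JVM), so nothing else has to be installed separately, even though we are developing on .NET. In my case the installer also took care of the environment variables, so I did not configure them by hand. I installed JDK 1.7. Official site: http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Install Tomcat 8.0. Official site: http://tomcat.apache.org/download-80.cgi . After installation, start Monitor Tomcat and enter http://localhost:8080/ in the browser's address bar. If the Tomcat page loads, the installation succeeded.
3. Download Solr. I use Solr 4.4 here. After downloading, configure it as follows: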
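(1) Extract Solr 4.4 and create a Solr home directory, for example D:/SorlServer/one. Copy everything inside the example/solr folder of the extracted Solr 4.4 package into the directory you created.
(2) Create the Solr web application. Copy solr-4.4.0.war from the dist directory of the extracted Solr 4.4 package into Tomcat's webapps directory, for example C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps (adjust the path to the Tomcat version you installed), and rename it to one.war. Tomcat unpacks the file automatically when it starts. Then go to D:\SorlServer\one\collection1\conf, open solrconfig.xml, find the <dataDir> node, and change it to <dataDir>${solr.data.dir:D:/SorlServer/one/data}</dataDir>
Note, this step is important: open web.xml under C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF, find the <env-entry> node, which is commented out by default, and uncomment it.
Then set env-entry-value to D:/SorlServer/one, as follows: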
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>D:/SorlServer/one</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
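(3) Copy all the jar files from the dist/solrj-lib directory of the extracted Solr 4.4 package into C:\Program Files\Apache Software Foundation\Tomcat 7.0\lib. Since Solr 4.3 the logging jars (SLF4J, Log4j) are no longer bundled inside the war, so the web application will not start without them.
(4) Stop Tomcat and start it again, then open http://localhost:8080/one . The Solr admin page should load.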
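To confirm from .NET that the web application is reachable, a small check along these lines could be used. This is a minimal sketch, not part of the original setup: it assumes .NET 4.5 or later (for HttpClient), that the core is named collection1, and that the /admin/ping handler from the stock Solr 4.4 example solrconfig.xml has not been removed. The SolrPing class and its method names are made up for illustration.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of Solr or EasyNet.Solr.
public static class SolrPing
{
    // Builds the ping URL for one core,
    // e.g. http://localhost:8080/one/collection1/admin/ping?wt=json
    public static string BuildPingUrl(string solrBaseUrl, string coreName)
    {
        return solrBaseUrl.TrimEnd('/') + "/"
            + Uri.EscapeDataString(coreName) + "/admin/ping?wt=json";
    }

    // Returns true when the server answers the ping request with a 2xx status code.
    public static async Task<bool> IsAliveAsync(HttpClient client, string solrBaseUrl, string coreName)
    {
        try
        {
            using (HttpResponseMessage response =
                await client.GetAsync(BuildPingUrl(solrBaseUrl, coreName)))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            return false; // connection refused, DNS failure, and similar
        }
        catch (TaskCanceledException)
        {
            return false; // the request timed out
        }
    }
}
```

From an async method it could be called as: bool ok = await SolrPing.IsAliveAsync(new HttpClient(), "http://localhost:8080/one", "collection1");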
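Note: for an English-language site you do not need a third-party tokenizer, because Solr has built-in support for English text analysis. For other languages (Japanese, Italian, French, and so on) you can look for a suitable analyzer package online. Here I use Chinese tokenization as the example, since most sites in China are primarily in Chinese.
4. Configure Chinese tokenization. Tokenizers commonly used in China include Paoding (庖丁解牛), mmseg4j, and IKAnalyzer. I use IKAnalyzer here: the project is fairly active, is updated often, and works well. The steps are as follows:
(1) Copy the IKAnalyzer jar and IKAnalyzer.cfg.xml into C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF\lib
(2) Edit schema.xml under D:\SorlServer\one\collection1\conf and add the following configuration: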
<!-- Tokenizer (analyzer) configuration -->
<fieldType name="text_IKFENCHI" class="solr.TextField">
<analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
(3) Stop Tomcat and start it again, then open http://localhost:8080/one/#/collection1/analysis to test the tokenizer.
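The same analysis can also be requested over HTTP, which is handy for checking the tokenizer from a .NET test. The sketch below only builds the request URL and shows one way to fetch it. It assumes the /analysis/field handler from the stock Solr 4.4 example solrconfig.xml is still registered; SolrAnalysis and its method names are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of Solr or EasyNet.Solr.
public static class SolrAnalysis
{
    // Builds a field-analysis request for a field type such as text_IKFENCHI.
    // analysis.fieldtype and analysis.fieldvalue are the handler's own parameters.
    public static string BuildFieldAnalysisUrl(string coreUrl, string fieldType, string text)
    {
        return coreUrl.TrimEnd('/') + "/analysis/field"
            + "?analysis.fieldtype=" + Uri.EscapeDataString(fieldType)
            + "&analysis.fieldvalue=" + Uri.EscapeDataString(text)
            + "&wt=json";
    }

    // Fetches the raw JSON response; the caller inspects the tokens it contains.
    public static Task<string> AnalyzeAsync(HttpClient client, string coreUrl, string fieldType, string text)
    {
        return client.GetStringAsync(BuildFieldAnalysisUrl(coreUrl, fieldType, text));
    }
}
```

For example, SolrAnalysis.AnalyzeAsync(client, "http://localhost:8080/one/collection1", "text_IKFENCHI", "全文搜索") should return the tokens IKAnalyzer produces for that phrase, provided the field type is configured as above.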
That completes the configuration on the Solr server side.
II. Solr development on the .NET platform:
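1. Download a Solr client library. I use EasyNet.Solr, written by Terry from the cnblogs community and hosted on Microsoft's open source site: http://easynet.codeplex.com/
EasyNet.Solr wraps the Solr client thoroughly, with many ready-made methods and parameter settings that can be used directly. We use EasyNet.Solr to create the index and then to query it. Usage is as follows:
(1) Download the EasyNet.Solr source and add it to your project directly, or build it into a DLL and add that as a project reference. Including the source is the better option, because you can then adjust it to your own needs.
(2) Create the index entity class, that is, the data we want to store in the index. For example, a product entity class: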
using System;
using System.Collections.Generic;

namespace Seek.SearchIndex
{
    public partial class IndexProductModel
    {
        public IndexProductModel() { }

        #region Properties
        public int ID { get; set; }
        public int ProductID { get; set; }
        public string ClassPath { get; set; }
        public int ClassID1 { get; set; }
        public int ClassID2 { get; set; }
        public int ClassID3 { get; set; }
        public string Title { get; set; }
        public string Model { get; set; }
        public string PriceRange { get; set; }
        public string AttributeValues { get; set; }
        public string ProductImages { get; set; }
        public int MemberID { get; set; }
        public System.DateTime CreateDate { get; set; }
        public System.DateTime LastEditDate { get; set; }
        public string FileName { get; set; }
        public string ProductType { get; set; }
        public string Summary { get; set; }
        public string Details { get; set; }
        public string RelatedKeywords { get; set; }
        public int MemberGrade { get; set; }
        #endregion
    }
}
(3) Configure the schema on the Solr server, that is, declare the fields of this index entity class in Solr. Go to D:\SorlServer\one\collection1\conf, open schema.xml, and add the following:
<field name="ID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="ProductID" type="int" indexed="true" stored="true"/>
<!-- Fast highlighting settings: termVectors="true" termPositions="true" termOffsets="true" -->
<field name="Title" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="Model" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="ClassPath" type="string" indexed="true" stored="true"/>
<field name="ClassID1" type="int" indexed="true" stored="true"/>
<field name="ClassID2" type="int" indexed="true" stored="true"/>
<field name="ClassID3" type="int" indexed="true" stored="true"/>
<field name="PriceRange" type="string" indexed="true" stored="true"/>
<field name="AttributeValues" type="string" indexed="true" stored="true"/>
<field name="ProductImages" type="string" indexed="true" stored="true"/>
<field name="MemberID" type="int" indexed="true" stored="true"/>
<field name="CreateDate" type="date" indexed="true" stored="true"/>
<field name="LastEditDate" type="date" indexed="true" stored="true"/>
<field name="FileName" type="string" indexed="true" stored="true"/>
<field name="ProductType" type="string" indexed="true" stored="true"/>
<field name="Summary" type="string" indexed="true" stored="false"/>
<field name="Details" type="string" indexed="true" stored="false"/>
<field name="RelatedKeywords" type="string" indexed="true" stored="true"/>
<field name="MemberType" type="string" indexed="true" stored="true"/>
<field name="MemberGrade" type="int" indexed="true" stored="true"/>
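Writing one field element per property by hand is easy to get wrong, so a first draft of these declarations could be generated from the entity class by reflection. The following is only a sketch under assumptions: SchemaFieldGenerator and SampleModel are hypothetical names, the type mapping covers just the CLR types used in this post, and the Solr type names (int, long, string, date) are the ones defined in the stock Solr 4.4 example schema. Text fields such as Title would still need their analyzed field type and term vector attributes set by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical generator, not part of EasyNet.Solr.
public static class SchemaFieldGenerator
{
    // CLR type -> field type name from the stock Solr 4.4 example schema.
    private static readonly Dictionary<Type, string> SolrTypes = new Dictionary<Type, string>
    {
        { typeof(int), "int" },
        { typeof(long), "long" },
        { typeof(string), "string" },
        { typeof(DateTime), "date" },
    };

    // Yields one <field .../> element per public instance property with a mapped type.
    // Properties of unmapped types are skipped.
    public static IEnumerable<XElement> BuildFields(Type modelType)
    {
        foreach (PropertyInfo p in modelType.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            string solrType;
            if (!SolrTypes.TryGetValue(p.PropertyType, out solrType))
            {
                continue;
            }
            yield return new XElement("field",
                new XAttribute("name", p.Name),
                new XAttribute("type", solrType),
                new XAttribute("indexed", "true"),
                new XAttribute("stored", "true"));
        }
    }
}

// Hypothetical model used only to demonstrate the generator.
public class SampleModel
{
    public int ProductID { get; set; }
    public string Title { get; set; }
    public DateTime CreateDate { get; set; }
    public decimal Price { get; set; } // unmapped type, skipped
}
```

Calling SchemaFieldGenerator.BuildFields(typeof(IndexProductModel)) and writing each element with Console.WriteLine gives a starting point that can be pasted into schema.xml and then adjusted.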
(4) Create the index. It is best to write a dedicated client program that builds the index. Here is the key code of my own indexer:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Seek.SearchIndex;
using System.Data;
using System.Threading;
using System.Configuration;
using System.Reflection;
using EasyNet.Solr;
using EasyNet.Solr.Impl;
using EasyNet.Solr.Commons;
using System.Xml.Linq;
using EasyNet.Solr.Commons.Params;
using System.Threading.Tasks;

namespace Seek.SearchIndex
{
    /// <summary>
    /// Indexer
    /// </summary>
    public class Indexer
    {
        private readonly static OptimizeOptions optimizeOptions = new OptimizeOptions();
        private readonly static CommitOptions commitOptions = new CommitOptions() { SoftCommit = true };
        private readonly static ISolrResponseParser<NamedList, EasyNet.Solr.ResponseHeader> binaryResponseHeaderParser = new BinaryResponseHeaderParser();
        private readonly static IUpdateParametersConvert<NamedList> updateParametersConvert = new BinaryUpdateParametersConvert();
        private readonly static ISolrQueryConnection<NamedList> connection = new SolrQueryConnection<NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"] };
        private readonly static ISolrUpdateConnection<NamedList, NamedList> solrUpdateConnection = new SolrUpdateConnection<NamedList, NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"], ContentType = "application/javabin" };
        private readonly static ISolrUpdateOperations<NamedList> solr = new SolrUpdateOperations<NamedList, NamedList>(solrUpdateConnection, updateParametersConvert) { ResponseWriter = "javabin" };
        private readonly static ISolrQueryOperations<NamedList> solrQuery = new SolrQueryOperations<NamedList>(connection) { ResponseWriter = "javabin" };

        public enum State
        {
            /// <summary>
            /// Running
            /// </summary>
            Runing,
            /// <summary>
            /// Stopped
            /// </summary>
            Stop,
            /// <summary>
            /// Paused
            /// </summary>
            Break
        }

        /// <summary>
        /// Main window
        /// </summary>
        private Main form;

        /// <summary>
        /// Worker thread
        /// </summary>
        public Thread t;

        /// <summary>
        /// Current state
        /// </summary>
        public State state = State.Stop;

        /// <summary>
        /// Number of documents currently in the index
        /// </summary>
        private long currentIndex = 0;
        public long CurrentIndex
        {
            get { return currentIndex; }
            set { currentIndex = value; }
        }

        private int _startId = AppCongfig.StartId;
        public int StartId
        {
            get { return _startId; }
            set { _startId = value; }
        }

        /// <summary>
        /// Total number of products
        /// </summary>
        private int productsCount = 0;

        /// <summary>
        /// Start time
        /// </summary>
        private DateTime startTime = DateTime.Now;

        /// <summary>
        /// End time
        /// </summary>
        private DateTime endTime = DateTime.MinValue;

        private static object syncLock = new object();

        #region Singleton
        private static Indexer instance = null;

        private Indexer(Main _form)
        {
            form = _form;
            productsCount = DataAccess.GetCount(0); // count the products
            form.fullerTsslMaxNum.Text = productsCount.ToString();
            form.fullerProgressBar.Minimum = 0;
            form.fullerProgressBar.Maximum = productsCount;
        }

        public static Indexer GetInstance(Main form)
        {
            if (instance == null)
            {
                lock (syncLock)
                {
                    if (instance == null)
                    {
                        instance = new Indexer(form);
                    }
                }
            }
            return instance;
        }
        #endregion

        /// <summary>
        /// Start
        /// </summary>
        public void Start()
        {
            ThreadStart ts = new ThreadStart(FullerRun);
            t = new Thread(ts);
            t.Start();
        }

        /// <summary>
        /// Stop
        /// </summary>
        public void Stop()
        {
            state = State.Stop;
        }

        /// <summary>
        /// Pause
        /// </summary>
        public void Break()
        {
            state = State.Break;
        }

        /// <summary>
        /// Builds the index for one batch of rows (single-threaded)
        /// </summary>
        public void InitIndex(object data)
        {
            var docs = new List<SolrInputDocument>();
            DataTable list = data as DataTable;
            foreach (DataRow pro in list.Rows)
            {
                var model = new SolrInputDocument();
                PropertyInfo[] properites = typeof(IndexProductModel).GetProperties(); // properties of the entity class
                string[] dateFields = { "CreateDate", "LastEditDate" };
                string field = string.Empty; // holds the field name
                foreach (PropertyInfo propertyInfo in properites) // walk the properties
                {
                    object val = pro[propertyInfo.Name];
                    if (val != DBNull.Value)
                    {
                        model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                    }
                }
                docs.Add(model);
                StartId = Convert.ToInt32(pro["ID"]);
            }
            GetStartId();
            lock (syncLock)
            {
                if (currentIndex <= productsCount)
                {
                    form.fullerProgressBar.Value = (int)currentIndex;
                }
                form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            }
            var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
        }

        /// <summary>
        /// Builds the index for one batch of rows (multi-threaded)
        /// </summary>
        public void CreateIndexer(DataTable dt)
        {
            GetStartId();
            Parallel.ForEach<DataRow>(dt.AsEnumerable(), (row) =>
            {
                // read the product's detailed attributes from the database row
                if (row != null)
                {
                    var docs = new List<SolrInputDocument>();
                    var model = new SolrInputDocument();
                    PropertyInfo[] properites = typeof(IndexProductModel).GetProperties(); // properties of the entity class
                    string[] dateFields = { "CreateDate", "LastEditDate" };
                    string field = string.Empty; // holds the field name
                    foreach (PropertyInfo propertyInfo in properites) // walk the properties
                    {
                        object val = row[propertyInfo.Name];
                        if (val != DBNull.Value)
                        {
                            model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                        }
                    }
                    docs.Add(model);
                    StartId = Convert.ToInt32(row["ID"]);
                    var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
                }
            });
            //GetStartId();
            lock (syncLock)
            {
                if (currentIndex <= productsCount)
                {
                    form.fullerProgressBar.Value = (int)currentIndex;
                }
                form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            }
        }

        /// <summary>
        /// Runs a full index build
        /// </summary>
        public void FullerRun()
        {
            //GetStartId();
            //form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            DataTable dt = DataAccess.GetNextProductsInfo(StartId);
            StartId = AppCongfig.StartId;
            if (state == State.Break)
            {
                this.SendMesasge("Full indexing resumed, start ID [" + StartId + "]...");
            }
            else
            {
                startTime = DateTime.Now;
                this.SendMesasge("Full indexing started, start ID [" + StartId + "]...");
            }
            state = State.Runing;
            form.btnInitIndex.Enabled = false;
            form.btnSuspend.Enabled = true;
            form.btnStop.Enabled = true;
            while (dt != null && dt.Rows.Count > 0 && state == State.Runing)
            {
                try
                {
                    InitIndex(dt); // single-threaded
                    // CreateIndexer(dt); // multi-threaded
                }
                catch (Exception ex)
                {
                    state = State.Stop;
                    form.btnInitIndex.Enabled = true;
                    form.btnSuspend.Enabled = false;
                    form.btnStop.Enabled = false;
                    GetStartId();
                    this.SendMesasge(ex.Message.ToString());
                }
                form.fullerTsslTimeSpan.Text = "Elapsed: " + GetTimeSpanShow(DateTime.Now - startTime) + ", estimated remaining: " + GetTimeSpanForecast();
                try
                {
                    dt = DataAccess.GetNextProductsInfo(StartId); // fetch the next batch of products
                }
                catch (Exception err)
                {
                    this.SendMesasge("Failed to fetch the next batch of products, start ID [" + StartId + "]: " + err.Message);
                }
            }
            if (state == State.Runing)
            {
                state = State.Stop;
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                this.SendMesasge("Full indexing finished, total indexed [" + currentIndex + "], last product ID " + StartId);
            }
            else if (state == State.Break)
            {
                GetStartId();
                state = State.Break;
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                this.SendMesasge("Full indexing paused, current index position [" + currentIndex + "], last product ID " + StartId);
            }
            else if (state == State.Stop)
            {
                GetStartId();
                state = State.Stop;
                this.SendMesasge("Full indexing stopped, indexed so far [" + currentIndex + "], last product ID " + StartId);
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                productsCount = DataAccess.GetCount(StartId); // count the remaining products
                form.fullerTsslMaxNum.Text = productsCount.ToString();
                form.fullerProgressBar.Minimum = 0;
                form.fullerProgressBar.Maximum = productsCount;
            }
            endTime = DateTime.Now;
        }

        /// <summary>
        /// Thread entry point for building index data
        /// </summary>
        /// <param name="threadDataParam"></param>
        public void MultiThreadCreateIndex(object threadDataParam)
        {
            InitIndex(threadDataParam);
        }

        /// <summary>
        /// Reads the largest indexed ProductID and the document count from Solr
        /// </summary>
        private void GetStartId()
        {
            IDictionary<string, ICollection<string>> options = new Dictionary<string, ICollection<string>>();
            options[CommonParams.SORT] = new string[] { "ProductID DESC" };
            options[CommonParams.START] = new string[] { "0" };
            options[CommonParams.ROWS] = new string[] { "1" };
            options[HighlightParams.FIELDS] = new string[] { "ProductID" };
            options[CommonParams.Q] = new string[] { "*:*" };
            var result = solrQuery.Query("/select", null, options);
            var solrDocumentList = (SolrDocumentList)result.Get("response");
            // Check for null before touching the list (the original read NumFound first).
            if (solrDocumentList != null)
            {
                currentIndex = solrDocumentList.NumFound;
            }
            if (solrDocumentList != null && solrDocumentList.Count() > 0)
            {
                StartId = (int)solrDocumentList[0]["ProductID"];
                //AppCongfig.SetValue("StartId", solrDocumentList[0]["ProductID"].ToString());
            }
            else
            {
                StartId = 0;
                // AppCongfig.SetValue("StartId", "0");
            }
        }

        /// <summary>
        /// Optimizes the index
        /// </summary>
        public void Optimize()
        {
            this.SendMesasge("Optimizing the index, please wait...");
            var result = solr.Update("/update", new UpdateOptions() { OptimizeOptions = optimizeOptions });
            var header = binaryResponseHeaderParser.Parse(result);
            this.SendMesasge("Index optimization took " + header.QTime + " ms");
        }

        /// <summary>
        /// Sends a message to the UI
        /// </summary>
        /// <param name="message">The message to show</param>
        protected void SendMesasge(string message)
        {
            form.fullerDgvMessage.Rows.Add(form.fullerDgvMessage.Rows.Count + 1, message, DateTime.Now.ToString());
        }

        /// <summary>
        /// Formats a time span for display
        /// </summary>
        /// <param name="ts">The time span</param>
        /// <returns></returns>
        protected string GetTimeSpanShow(TimeSpan ts)
        {
            string text = "";
            if (ts.Days > 0)
            {
                text += ts.Days + "d ";
            }
            if (ts.Hours > 0)
            {
                text += ts.Hours + "h ";
            }
            if (ts.Minutes > 0)
            {
                text += ts.Minutes + "m ";
            }
            if (ts.Seconds > 0)
            {
                text += ts.Seconds + "s";
            }
            return text;
        }

        /// <summary>
        /// Estimates the remaining time
        /// </summary>
        /// <returns></returns>
        protected string GetTimeSpanForecast()
        {
            if (currentIndex != 0)
            {
                TimeSpan tsed = DateTime.Now - startTime;
                double d = ((tsed.TotalMilliseconds / currentIndex) * productsCount) - tsed.TotalMilliseconds;
                return GetTimeSpanShow(TimeSpan.FromMilliseconds(d));
            }
            return "";
        }
    }
}
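Stripped of the UI and the EasyNet.Solr calls, the core of FullerRun is a resumable loop: fetch the rows whose ID is greater than the last indexed ID, send them to Solr, remember the new last ID, and repeat until no rows come back. The sketch below isolates that idea so it can be tested without a database or a Solr server. BatchIndexer and its delegates are hypothetical stand-ins (fetchNext for DataAccess.GetNextProductsInfo, send for solr.Update), and it assumes each batch comes back sorted by ascending ID.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that mirrors the paging logic of the indexer above.
public static class BatchIndexer
{
    // fetchNext(lastId): returns the next batch of items with ID > lastId, empty when done.
    // getId(item):       returns the item's ID.
    // send(batch):       pushes one batch to the search server.
    // Returns the ID of the last item sent, which is the value to persist for resuming.
    public static int Run<T>(int startId,
                             Func<int, IList<T>> fetchNext,
                             Func<T, int> getId,
                             Action<IList<T>> send)
    {
        int lastId = startId;
        IList<T> batch = fetchNext(lastId);
        while (batch != null && batch.Count > 0)
        {
            send(batch);
            lastId = getId(batch[batch.Count - 1]); // assumes ascending ID order
            batch = fetchNext(lastId);
        }
        return lastId;
    }
}

public static class BatchIndexerDemo
{
    public static void Main()
    {
        // In-memory stand-in for the product table: IDs 1..10, fetched 3 at a time.
        List<int> ids = Enumerable.Range(1, 10).ToList();
        int sent = 0;
        int last = BatchIndexer.Run<int>(
            0,
            lastId => ids.Where(i => i > lastId).Take(3).ToList(),
            i => i,
            batch => sent += batch.Count);
        Console.WriteLine("sent " + sent + " documents, last ID " + last);
    }
}
```

Persisting the returned ID after each run, as the indexer does with AppCongfig.SetValue, is what lets a stopped or paused build continue from where it left off.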
(5) Run the indexer to create the index. My indexer has a small UI
that lets you follow the progress of the index build at any time.
(6) Once the index has been created, open the Solr admin page at http://localhost:8080/one/#/collection1/query to test it.
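The query page in the admin UI simply issues requests to the core's /select handler, so the same test can be scripted. The helper below is a minimal sketch that builds such a URL for a phrase query on one field. SolrSelect is a hypothetical name, while q, start, rows, and wt are standard Solr request parameters.

```csharp
using System;

// Hypothetical helper, not part of Solr or EasyNet.Solr.
public static class SolrSelect
{
    // Builds a /select URL that searches one field for an exact phrase.
    public static string BuildSelectUrl(string coreUrl, string field, string phrase, int start, int rows)
    {
        // Escape backslashes and double quotes so the phrase stays a single quoted term.
        string escaped = phrase.Replace("\\", "\\\\").Replace("\"", "\\\"");
        string q = field + ":\"" + escaped + "\"";
        return coreUrl.TrimEnd('/') + "/select"
            + "?q=" + Uri.EscapeDataString(q)
            + "&start=" + start
            + "&rows=" + rows
            + "&wt=json";
    }
}
```

For example, SolrSelect.BuildSelectUrl("http://localhost:8080/one/collection1", "Title", "phone", 0, 10) yields a URL that can be pasted into a browser or fetched with HttpClient to inspect the JSON response.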
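That covers the groundwork for Solr: setting up the Solr server and building the index from a client. A later post will cover client-side querying in detail. Coming up next:
1. Full-text search, tokenizer configuration, and keyword autocomplete similar to Google and Baidu
2. Facet queries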