全文检索Sphinx
2009-07-11参考全文检索Ferret
以下内容不再使用
技术概览
- 全文检索引擎采用:sphinx(0.9.8rc2)
- 中文分词:libmmseg(0.7.3)
- Rails调用引擎的插件:thinking-sphinx(0.9.5)
注意,当前使用的thinking-sphinx与官方的不一样,增加了中文分词的配置,修正delta index无法自动更新的问题
安装
libmmseg
sudo apt-get install g++ cd ~ wget http://cloud.github.com/downloads/saberma/saberma.github.com/mmseg-0.7.3.tar.gz tar zxvf mmseg-0.7.3.tar.gz cd mmseg-0.7.3 ./configure make sudo make install #安装ruby扩展 cd ruby cp /usr/local/include/mmseg/*.h . cp ../src/*.h . cp ../src/css/*.h . ruby extconf.lin.rb make sudo make install 注意:如果在这一步出错,且出错提示为: css/UnigramCorpusReader.cpp:89: error: ’strncmp’ was not declared in this scope 则需手工编辑.src/css目录下UnigramCorpusReader.cpp 文件,在其第一行加上 #include <string.h> 然后重新 make,即可通过
注:已生成的词库uni.lib放在项目的lib目录下(值班室项目已经放置此文件了)
sphinx
#"下载sphinx" cd ~ wget http://cloud.github.com/downloads/saberma/saberma.github.com/sphinx-0.9.8-rc2.tar.gz tar zxvf sphinx-0.9.8-rc2.tar.gz cd sphinx-0.9.8-rc2 sudo apt-get install patch #下载中文补丁 wget http://cloud.github.com/downloads/saberma/saberma.github.com/sphinx-0.98rc2.zhcn-support.patch patch -p1 < sphinx-0.98rc2.zhcn-support.patch #下载防crash补丁 wget http://cloud.github.com/downloads/saberma/saberma.github.com/fix-crash-in-excerpts.patch patch -p1 < fix-crash-in-excerpts.patch ./configure make sudo make install 注意:如果在这一步出现 /usr/local/include/mmseg/freelist.h:22: error: ‘strlen’ was not declared in this scope 的错误,手工修改 /usr/local/mmseg/include/mmseg/freelist.h 在上面添加 #include <string.h>
安装thinking-sphinx
(此步骤已经集成进[Rails说明]中[获取源代码]一节,不需要再独立执行)
git submodule init git submodule update
如果安装时报错,按以下步骤处理
#删除.gitmodules,.git/config中的submodule配置
#删除thinking-sphinx目录
git rm —cached vendor/plugins/thinking-sphinx
sudo rm -r vendor/plugins/thinking-sphinx
git submodule add -b v0.9.5chinese git://github.com/saberma/thinking-sphinx.git vendor/plugins/thinking-sphinx
启动引擎
(此步骤应在[Rails说明]中[获取源代码]之后操作)
#生成sphinx配置文件 rake ts:config #建立索引 rake ts:index #启动引擎 rake ts:start
测试引擎
(此步骤应在[Rails说明]中[获取源代码]之后操作)
script/console c = Call.last c.callnumber = '13911112222' c.save #可以看到后台输出更新delta index Sphinx 0.9.8-rc2 (r1234) Copyright (c) 2001-2008, Andrew Aksyonoff using config file '/home/mahb/Documents/zbs/config/development.sphinx.conf'... indexing index 'call_delta'..."防crash补丁":http://www.coreseek.com/uploads/sources/fix-crash-in-excerpts.patch collected 1 docs, 0.0 MB sorted 0.0 Mhits, 100.0% done total 1 docs, 31 bytes total 0.031 sec, 1014.20 bytes/sec, 32.72 docs/sec rotating indices: succesfully sent SIGHUP to searchd (pid=5812) #开始查询 Call.search '13911112222' #这时可以看到相应的记录
使用说明
#或参考call.rb