GitHub: https://github.com/simon987/sist2
Demo: sist2.simon987.net
Prebuilt releases:
https://github.com/simon987/sist2/releases
Usage:
https://github.com/simon987/sist2/blob/master/docs/USAGE.md
Requirements:
An Elasticsearch instance running in Docker
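One way to meet this requirement is a single-node Elasticsearch container. The sketch below only prints the docker command so it can be reviewed before running. The image tag (elasticsearch:7.17.9) and the container name (sist2-es) are assumptions, not values from the sist2 documentation; pick the Elasticsearch version your sist2 release supports.

```shell
# Assumed values: adjust the image tag and container name to your setup.
ES_IMAGE="elasticsearch:7.17.9"
ES_NAME="sist2-es"

# Prints (does not run) a docker command that starts a single-node
# Elasticsearch listening on port 9200, the port used by --es-url below.
es_run_cmd() {
  echo docker run -d --name "$ES_NAME" -p 9200:9200 \
    -e discovery.type=single-node "$ES_IMAGE"
}

es_run_cmd
```

To actually start the container, pipe the printed command to a shell, for example es_run_cmd | sh.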
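Step 1:
Scan the documents: -t sets the number of threads; --name sets the index name; path/documents is the directory to scan; -o path/documents.idx is where the index is written
- ./sist2 scan -t 4 -q 1.0 --content-size 99900000000 --archive recurse --name "documents" path/documents -o path/documents.idx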
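Incremental scan: --incremental points at the existing index (path/documents.idx), and path/updated_idx is the output path for the incrementally updated index
- ./sist2 scan -t 4 -q 1.0 --content-size 99900000000 --archive recurse --name "documents" path/documents --incremental path/documents.idx/ -o path/updated_idx/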
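The choice between the full scan and the incremental scan can be sketched as a small helper that checks whether an earlier index already exists. This is an illustrative wrapper, not part of sist2: the function name scan_cmd and its three arguments are hypothetical, and it prints the command rather than running it.

```shell
# Scan options copied from the commands above.
SCAN_OPTS="-t 4 -q 1.0 --content-size 99900000000 --archive recurse --name documents"

# Hypothetical helper: scan_cmd DOCS OLD_IDX NEW_IDX
# Prints an incremental scan command if OLD_IDX exists, otherwise a full scan.
scan_cmd() {
  docs="$1"; old_idx="$2"; new_idx="$3"
  if [ -e "$old_idx" ]; then
    # A previous index exists: reuse it and write the result to NEW_IDX.
    echo ./sist2 scan $SCAN_OPTS "$docs" --incremental "$old_idx" -o "$new_idx"
  else
    # No previous index: run a full scan and write it to OLD_IDX.
    echo ./sist2 scan $SCAN_OPTS "$docs" -o "$old_idx"
  fi
}

scan_cmd path/documents path/documents.idx path/updated_idx
```

Printing the command first makes it easy to confirm which branch was taken before a long scan starts.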
Step 2:
Force-delete the existing Elasticsearch index and upload again: --es-index sets the Elasticsearch index name
- ./sist2 index --force-reset --batch-size 1000 --es-url http://localhost:9200 --es-index sist2 path/documents.idx
Upload without deleting the existing index:
- ./sist2 index --batch-size 1000 --es-url http://localhost:9200 --es-index sist2 path/documents.idx
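Step 3:
Run the web service: --auth name:password sets the username and password for the web service (optional; drop the --auth option if you do not need it); path/documents.idx is the index path produced by scan. List every index you want to serve, separated by spaces
- ./sist2 web --es-url http://localhost:9200 --es-index sist2 --auth name:password --bind 127.0.0.1:8888 path/documents.idx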
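The optional --auth option and the space-separated list of index paths can be sketched as follows. The helper name web_cmd, the AUTH variable, and the second index path/photos.idx are hypothetical and only serve the illustration; the helper prints the command instead of starting the server.

```shell
# Set AUTH to "name:password" to enable authentication; leave empty to omit --auth.
# Note: this simple sketch assumes the credentials contain no spaces.
AUTH=""

# Hypothetical helper: web_cmd INDEX...
# Prints a sist2 web command serving every index path given as an argument.
web_cmd() {
  auth_opt=""
  if [ -n "$AUTH" ]; then
    auth_opt="--auth $AUTH"
  fi
  # "$@" expands to all index paths; echo joins them with single spaces.
  echo ./sist2 web --es-url http://localhost:9200 --es-index sist2 \
    $auth_opt --bind 127.0.0.1:8888 "$@"
}

web_cmd path/documents.idx path/photos.idx
```

With AUTH left empty the printed command has no --auth option; setting AUTH="name:password" before the call adds it back.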