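Elasticsearch provides a river module for pulling data from other data sources. The feature is delivered as plugins, and the river plugins currently available include: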
River plugins
1. Supported by Elasticsearch
- CouchDB River Plugin
- RabbitMQ River Plugin
- Twitter River Plugin
- Wikipedia River Plugin
2. Supported by the community
- ActiveMQ River Plugin (by Dominik Dorn)
- Amazon SQS River Plugin (by Alex Bogdanovski)
- CSV River Plugin (by Martin Bednar)
- Dropbox River Plugin (by David Pilato)
- FileSystem River Plugin (by David Pilato)
- Git River Plugin (by Olivier Bazoud)
- GitHub River Plugin (by uberVU)
- Hazelcast River Plugin (by Steve Samuel)
- JDBC River Plugin (by Jörg Prante)
- JMS River Plugin (by Steve Sarandos)
- Kafka River Plugin (by Endgame Inc.)
- LDAP River Plugin (by Tanguy Leroux)
- MongoDB River Plugin (by Richard Louapre)
- Neo4j River Plugin (by Steve Samuel)
- Open Archives Initiative (OAI) River Plugin (by Jörg Prante)
- Redis River Plugin (by Steve Samuel)
- RSS River Plugin (by David Pilato)
- Sofa River Plugin (by adamlofts)
- Solr River Plugin (by Luca Cavanna)
- St9 River Plugin (by Sunny Gleason)
- Subversion River Plugin (by Pascal Lombard)
- DynamoDB River Plugin (by Kevin Wang)
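The source code for elasticsearch-river-jdbc is at https://github.com/jprante/elasticsearch-river-jdbc, and the project provides detailed documentation. The steps below use SQL Server as an example to show briefly how to use it.
First, install elasticsearch-river-jdbc by running the following in the elasticsearch directory: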
./bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.1.0.1/elasticsearch-river-jdbc-1.1.0.1-plugin.zip
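Next, install the JDBC library for SQL Server, the Microsoft JDBC Driver. Copy the 'sqljdbc4.jar' file from it into the lib folder of the elasticsearch installation directory.
Because elasticsearch usually runs as a cluster, both of the steps above must be performed on every node.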
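Since both steps are repeated on each node, it can help to generate the per-node commands in one place. The following is a minimal sketch, assuming SSH access to every node. The host names, the installation path `/opt/elasticsearch`, and the helper `node_commands` are hypothetical and not part of the original setup. The script only builds and prints the command lines; it does not execute anything.

```python
# Plugin archive URL taken from the install command above.
PLUGIN_URL = (
    "http://xbib.org/repository/org/xbib/elasticsearch/plugin/"
    "elasticsearch-river-jdbc/1.1.0.1/elasticsearch-river-jdbc-1.1.0.1-plugin.zip"
)


def node_commands(host, es_home, jar="sqljdbc4.jar"):
    """Return the two commands (as argv lists) needed on one node."""
    # Step 1: install the JDBC river plugin on the remote node.
    install = [
        "ssh", host,
        f"cd {es_home} && ./bin/plugin --install jdbc --url {PLUGIN_URL}",
    ]
    # Step 2: copy the SQL Server JDBC driver into the node's lib folder.
    copy_jar = ["scp", jar, f"{host}:{es_home}/lib/"]
    return [install, copy_jar]


if __name__ == "__main__":
    # Hypothetical node names and install path; replace with your own.
    for host in ["es-node1", "es-node2"]:
        for argv in node_commands(host, "/opt/elasticsearch"):
            print(" ".join(argv))
```

Printing the commands first makes it easy to review them before running them by hand or passing each argv list to `subprocess.run`.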
The last and most important step is to create the river in elasticsearch, so that elasticsearch automatically pulls data from SQL Server.
PUT /_river/mytest_river/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url" : "jdbc:sqlserver://MYSQLSERVERNAME;databaseName=MYProductDatabase",
    "user" : "admin",
    "password" : "Password",
    "sql" : "select ProductID as _id, CategoryID,ManufacturerID,MfName,ProductTitle,MfgPartNumber from MyProductsTable(nolock)",
    "poll" : "10m",
    "strategy" : "simple",
    "index" : "myinventory",
    "type" : "product",
    "bulk_size" : 100,
    "max_retries" : 5,
    "max_retries_wait" : "30s",
    "max_bulk_requests" : 5,
    "bulk_flush_interval" : "5s"
  }
}
For the meaning of each parameter, see the documentation: https://github.com/jprante/elasticsearch-river-jdbc/wiki/JDBC-River-parameters
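The river definition above is written as a console-style request. As a minimal sketch, the same definition could be built and sent from Python using only the standard library. The node address `http://localhost:9200` and the helper names `river_meta`, `river_request`, and `send` are assumptions for illustration, and the connection values are the placeholders from the example.

```python
import json
import urllib.request


def river_meta():
    """Build the river definition from the example as a Python dict."""
    return {
        "type": "jdbc",
        "jdbc": {
            "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
            "url": "jdbc:sqlserver://MYSQLSERVERNAME;databaseName=MYProductDatabase",
            "user": "admin",
            "password": "Password",
            "sql": (
                "select ProductID as _id, CategoryID,ManufacturerID,MfName,"
                "ProductTitle,MfgPartNumber from MyProductsTable(nolock)"
            ),
            "poll": "10m",
            "strategy": "simple",
            "index": "myinventory",
            "type": "product",
            "bulk_size": 100,
            "max_retries": 5,
            "max_retries_wait": "30s",
            "max_bulk_requests": 5,
            "bulk_flush_interval": "5s",
        },
    }


def river_request(base_url, river_name, meta):
    """Prepare (but do not send) the PUT /_river/<name>/_meta request."""
    body = json.dumps(meta).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_river/{river_name}/_meta",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the prepared request and return the response body as text."""
    with urllib.request.urlopen(request) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Assumed address of one elasticsearch node; adjust as needed.
    req = river_request("http://localhost:9200", "mytest_river", river_meta())
    # Only show what would be sent; call send(req) against a live node.
    print(req.get_method(), req.full_url)
```

Keeping the request preparation separate from `send` lets the definition be inspected or logged before anything is written to the cluster.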
References:
- http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
- http://blog.youkuaiyun.com/an74520/article/details/8740065
- http://www.techovity.com/create-river-elasticsearch-ms-sql-server-automatic-data-transfer/