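As we all know, Solr and Elasticsearch are the two most popular open-source search engines in the industry. Both are built on the open-source Apache Lucene library, and their core features are very similar, but they differ considerably in ease of deployment, scalability, and other capabilities.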
1. Apache Solr
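Over the past decade Solr has grown steadily and built up a broad user base. Solr provides distributed indexing, sharding, replication, load balancing, and automatic failover and recovery. Quite a few internet giants, such as Netflix, eBay, Instagram, and Amazon (CloudSearch), use Solr.
Key features of Solr:
- Full-text indexing
- Highlighting
- Faceted search
- Real-time indexing
- Dynamic clustering
- Database integration
- NoSQL features and rich document handling (for example, Word and PDF files)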
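
To make two of these features concrete, here is a minimal sketch of how a client might ask Solr for highlighting and facet counts in one request. The host, the `products` collection, and the field names are assumptions made up for illustration; the query parameters (`hl`, `hl.fl`, `facet`, `facet.field`, `wt`) are standard Solr request parameters. The code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, collection, text, highlight_field, facet_field):
    """Build a Solr /select URL that requests highlighting and field facets."""
    params = [
        ("q", text),                   # the full-text query
        ("hl", "true"),                # turn on result highlighting
        ("hl.fl", highlight_field),    # field whose matches get highlighted
        ("facet", "true"),             # turn on faceting
        ("facet.field", facet_field),  # field whose values are counted
        ("wt", "json"),                # ask for a JSON response
    ]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and fields, used only for illustration.
url = build_solr_select_url(
    "http://localhost:8983", "products", "title:laptop", "title", "category"
)
print(url)
```

Sending this URL with any HTTP client should return the matching documents together with highlighted snippets and per-category counts, provided the collection and fields exist.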
2. Elasticsearch
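Elasticsearch appeared a few years after Solr. It provides a distributed, multi-tenant full-text search engine through a REST interface and schema-free JSON documents (no schema has to be defined up front, whereas Solr traditionally expects one). Elasticsearch can scale out as a near-real-time search engine. One of its key features is multi-tenancy: data can be split into separate indexes by purpose, and several indexes can be operated on at the same time.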
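
As a rough sketch of what "REST plus schema-free JSON" and "operating on several indexes at once" look like in practice, the snippet below builds the two requests involved. The index names, the document, and the `_doc` type segment in the path are assumptions for illustration (6.x clusters may use a different type name). Nothing is sent over the network; the functions only describe the method, path, and JSON body.

```python
import json


def index_request(index, doc_id, document):
    """Describe the REST call that stores one JSON document (no schema defined first)."""
    # "_doc" as the type segment is an assumption; adjust it to the cluster's version.
    return ("PUT", f"/{index}/_doc/{doc_id}", json.dumps(document))


def search_request(indices, field, text):
    """Describe a single search that spans several indexes at once."""
    body = {"query": {"match": {field: text}}}
    # Comma-separated index names in the path target all of them in one call.
    return ("GET", f"/{','.join(indices)}/_search", json.dumps(body))


# Hypothetical indexes and document, used only for illustration.
print(index_request("blog-posts", 1, {"title": "Solr vs Elasticsearch", "tags": ["search"]}))
print(search_request(["blog-posts", "forum-threads"], "title", "solr"))
```

Each tuple can be handed to any HTTP client, with the path appended to the cluster address and the body sent as `application/json`.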
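The Google search-interest chart shows that since 2013 Elasticsearch has attracted far more attention than Solr, but that does not mean Apache Solr is dead. Although quite a few people may disagree, Solr remains one of the most popular search engines and has strong open-source community support.
Performance comparison
One large internet company reported, based on tests in its production environment, that average query speed improved roughly 50-fold after it switched its search engine from Solr to Elasticsearch.
3. Feature differences
To be honest, I have not studied some of these features myself; the conclusions below are taken from Kelvin Tan's comparison.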
API
Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
---|---|---|
Format | XML, CSV, JSON | JSON |
HTTP REST API | ![]() | ![]() |
Binary API | ![]() | ![]() |
JMX support | ![]() | ![]() |
Official client libraries | Java | Java, Groovy, PHP, Ruby, Perl, Python, .NET, JavaScript (see the official list of clients) |
Community client libraries | PHP, Ruby, Perl, Scala, Python, .NET, JavaScript, Go, Erlang, Clojure | Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x (see the complete list) |
3rd-party product integration (open-source) | Drupal, Magento, Django, ColdFusion, WordPress, OpenCMS, Plone, Typo3, eZ Publish, Symfony2, Riak (via Yokozuna) | Drupal, Django, Symfony2, WordPress, CouchBase |
3rd-party product integration (commercial) | DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR | SearchBlox, Hortonworks Data Platform, MapR, etc. (see the complete list) |
Output | JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java | JSON, XML/HTML (via plugin) |
Infrastructure
Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
---|---|---|
Master-slave replication | ![]() | ![]() |
Integrated snapshot and restore | Filesystem | Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories |
Indexing
Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
---|---|---|
Data Import | DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File | [DEPRECATED in 2.x] Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia |
ID field for updates and deduplication | ![]() | ![]() |
DocValues | ![]() | ![]() |
Partial Doc Updates | ![]() | ![]() |
Custom Analyzers and Tokenizers | ![]() | ![]() |
Per-field analyzer chain | ![]() | ![]() |
Per-doc/query analyzer chain | ![]() | ![]() |
Index-time synonyms | ![]() | ![]() |
Query-time synonyms | ![]() | ![]() |
Multiple indexes | ![]() | ![]() |
Near-Realtime Search/Indexing | ![]() | ![]() |
Complex documents | ![]() | ![]() |
Schemaless | ![]() | ![]() |
Multiple document types per schema | ![]() | ![]() |
Online schema changes | ![]() | ![]() |
Apache Tika integration | ![]() | ![]() |
Dynamic fields | ![]() | ![]() |
Field copying | ![]() | ![]() |
Hash-based deduplication | ![]() | ![]() |
Index-time sorting | ![]() | ![]() |
Searching
Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
---|---|---|
Lucene Query parsing | ![]() | ![]() |
Structured Query DSL | ![]() | ![]() |
Span queries | ![]() | ![]() |
Spatial/geo search | ![]() | ![]() |
Multi-point spatial search | ![]() | ![]() |
Faceting | ![]() | ![]() |
Advanced Faceting | ![]() | ![]() |
Geo-distance Faceting | ![]() | ![]() |
Pivot Facets | ![]() | ![]() |
More Like This | ![]() | ![]() |
Boosting by functions | ![]() | ![]() |
Boosting using scripting languages | ![]() | ![]() |
Push Queries | ![]() | ![]() |
Field collapsing/Results grouping | ![]() | ![]() |
Query Re-Ranking | ![]() | ![]() |
Index-based Spellcheck | ![]() | ![]() |
Wordlist-based Spellcheck | ![]() | ![]() |
Autocomplete | ![]() | ![]() |
Document-oriented Autocomplete | ![]() | ![]() |
Learning to Rank | ![]() | ![]() |
Query elevation | ![]() | ![]() |
Intra-index joins | ![]() | ![]() |
Inter-index joins | ![]() | ![]() |
Resultset Scrolling | ![]() | ![]() |
Filter queries | ![]() | ![]() |
Filter execution order | ![]() | ![]() |
Alternative QueryParsers | ![]() | ![]() |
Negative boosting | ![]() | ![]() |
Search across multiple indexes | ![]() | ![]() |
Result highlighting | ![]() | ![]() |
Custom Similarity | ![]() | ![]() |
Searcher warming on index reload | ![]() | ![]() |
Term Vectors API | ![]() | ![]() |
SQL queries | ![]() | ![]() |
Distributed Map/Reduce processing | ![]() | ![]() |
Distributed
Feature | Solr 7.2.1 | Elasticsearch 6.2.4 |
---|---|---|
Self-contained cluster | ![]() | ![]() |
Automatic node discovery | ![]() | ![]() |
Partition tolerance | ![]() | ![]() |
Automatic failover | ![]() | ![]() |
Automatic leader election | ![]() | ![]() |
Shard replication | ![]() | ![]() |
Sharding | ![]() | ![]() |
Automatic shard rebalancing | ![]() | ![]() |
Change # of shards | ![]() | ![]() |
Shard splitting | ![]() | ![]() |
Relocate shards and replicas | ![]() | ![]() |
Control shard routing | ![]() | ![]() |
Pluggable shard/replica assignment | ![]() | ![]() |
Avoid duplicate indexing on replicas | ![]() | ![]() |
Consistency | Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they are finished replicating the index. | Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per document indexing basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available. |
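
The "one / quorum / all" levels in the Consistency row can be illustrated with a small calculation. This is only a sketch under the assumption that a quorum means a simple majority of shard copies (the primary plus its replicas); the function names are made up for illustration, and the exact rules and parameter names differ between Elasticsearch versions.

```python
def quorum(number_of_replicas):
    """Simple majority of shard copies, counting the primary plus its replicas."""
    copies = 1 + number_of_replicas
    return copies // 2 + 1


def write_allowed(active_copies, number_of_replicas, level="quorum"):
    """Return True if enough shard copies are active for the chosen level."""
    required = {
        "one": 1,                          # the primary alone is enough
        "quorum": quorum(number_of_replicas),
        "all": 1 + number_of_replicas,     # every copy must be active
    }[level]
    return active_copies >= required


# One primary with two replicas gives three copies, so a majority is two.
print(quorum(2))
print(write_allowed(active_copies=1, number_of_replicas=2))
print(write_allowed(active_copies=2, number_of_replicas=2))
```

With stricter levels a write fails earlier when nodes are down, which trades availability for stronger consistency.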
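4. Summary
- Solr relies on ZooKeeper for distributed coordination, while Elasticsearch ships with its own built-in distributed coordination;
- Solr supports more data formats, while Elasticsearch supports only JSON;
- Solr offers more features out of the box, while Elasticsearch focuses on core functionality and leaves many advanced features to third-party plugins;
- Solr performs better than Elasticsearch in traditional search applications, but is noticeably less efficient in real-time search applications.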