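1. ES Basics
Elasticsearch uses Lucene as its core for all indexing and search functionality, and hides Lucene's complexity behind a simple RESTful API, which makes full-text search easy.
Document-oriented database
Inverted index: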
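As a rough illustration of the idea, an inverted index maps each term to the set of documents that contain it, so a query becomes a lookup plus a set intersection. The sketch below is a toy built on plain Python dictionaries with whitespace tokenization; the sample texts and the function names `build_inverted_index` and `search` are assumptions made for illustration, and Lucene's real analysis chain and on-disk structures are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Eco-warrior and defender of the weak",
    2: "defender of dolphins and whales",
    3: "whales and dolphins",
}

index = build_inverted_index(docs)
print(sorted(search(index, "dolphins whales")))  # [2, 3]
print(sorted(search(index, "defender")))         # [1, 2]
```

The lookup cost depends on the number of query terms and the size of their posting sets, not on scanning every document, which is the property that makes this structure attractive for full-text search.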
Document-oriented:
{
"email": "john@smith.com",
"first_name": "John",
"last_name": "Smith",
"info": {
"bio": "Eco-warrior and defender of the weak",
"age": 25,
"interests": [ "dolphins", "whales" ]
},
"join_date": "2014/05/01"
}
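A minimal sketch of how such a document might be handed to the RESTful API: the object is serialized to JSON and placed in the body of an HTTP PUT. The node address `http://localhost:9200`, the index name `users`, the type name `user`, and the helper `build_index_request` are all assumptions for illustration. The path follows the older index/type/id layout that matches the data model in these notes; Elasticsearch 7 and later deprecate types and use `_doc` in that position. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed address of a local node

document = {
    "email": "john@smith.com",
    "first_name": "John",
    "last_name": "Smith",
    "info": {
        "bio": "Eco-warrior and defender of the weak",
        "age": 25,
        "interests": ["dolphins", "whales"],
    },
    "join_date": "2014/05/01",
}


def build_index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) a PUT request that would store `doc` under index/type/id."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        url=f"{ES_URL}/{index}/{doc_type}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("users", "user", 1, document)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it, which requires a running cluster at the assumed address.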
Dynamically updatable index
1.4 Main concepts of the ES data architecture (compared with the relational database MySQL)
Relational database ⇒ Database ⇒ Table ⇒ Row ⇒ Column
Elasticsearch ⇒ Index ⇒ Type ⇒ Document ⇒ Field
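The correspondence above can be sketched in a few lines: a table row, taken together with its column names, becomes one document whose fields carry the column values. The mapping dictionary and the helper `row_to_document` are hypothetical names used only for illustration, and the analogy is loose, since documents may also hold nested objects and arrays that a flat row cannot.

```python
# Loose term-by-term analogy between the two data models (illustrative only).
RELATIONAL_TO_ES = {
    "database": "index",
    "table": "type",
    "row": "document",
    "column": "field",
}


def row_to_document(columns, row):
    """Pair column names with one row's values to form a flat document."""
    if len(columns) != len(row):
        raise ValueError("row and column list must have the same length")
    return dict(zip(columns, row))


columns = ["email", "first_name", "last_name"]
row = ("john@smith.com", "John", "Smith")

document = row_to_document(columns, row)
print(document)
```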
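ES use cases:
1) In early 2013, GitHub dropped Solr and adopted Elasticsearch for PB-scale search. "GitHub uses Elasticsearch to search 20 TB of data, including 1.3 billion files and 130 billion lines of code."
2) Wikipedia: launched a core search architecture based on Elasticsearch.
3) SoundCloud: "SoundCloud uses Elasticsearch to provide instant and accurate music search for 180 million users."
4) Baidu: Baidu currently uses Elasticsearch widely for text data analysis. It collects all kinds of metrics from every Baidu server, along with user-defined data, and runs multidimensional analysis and visualization on them to help locate and diagnose instance-level or business-level anomalies. It currently covers more than 20 internal business lines (including casio, cloud analytics, the ad network, forecasting, Wenku, Zhidahao, Wallet, and risk control). The largest single cluster has 100 machines and 200 ES nodes, and imports 30 TB+ of data per day.
Wikipedia, Stack Overflow, and GitHub all use it.
ES compared with other search engines
Performance comparison: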
OPEN SOURCE SEARCH COMPARISON
Here is a brief comparison of Elasticsearch vs. Solr vs. Sphinx:
Feature | Elasticsearch | Solr | Sphinx |
Types of Search Features | 1. full-text | 1. full-text 2. autocomplete suggestions 3. faceted 4. multifield 5. synonyms 6. fuzzy 7. highlighting 8. geospatial 9. spell checker | 1. full-text |
Real Time Indexing | Yes | Yes | Yes |
Performance | High | High | High |
Scalability | High | High | High |
Data Schema | Schema-free∗ | Yes, but dynamic∗ | Yes∗ |
Can serve as data storage | Yes | Yes | No |
Visualization of Data | Allowed by the Elastic Stack (ES, Kibana, and Logstash) | Allowed by Banana plugin | No |
Machine Learning | Yes | Yes | No |
Solr:
The major feature list includes:
- Full-text search
- Highlighting
- Faceted search
- Real-time indexing
- Dynamic clustering
- Database integration
- NoSQL features and rich document handling (Word and PDF files, for example)
ES:
It offers a distributed, multitenant-capable, full-text search engine with an HTTP web interface (REST) and schema-free JSON documents.
The official client libraries for Elasticsearch are available in Java, Groovy, PHP, Ruby, Perl, Python, .NET, and JavaScript.
An index can be divided into shards, and each shard can have multiple replicas.
Elasticsearch is scalable, with near real-time search, and is JSON-based.
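To make the shard idea concrete, document placement is commonly described as hashing a routing value (by default the document id) and taking it modulo the number of primary shards. The sketch below assumes that simplified rule and uses `zlib.crc32` purely as a stand-in hash; it is not the hash function Elasticsearch uses, and `route_to_shard` is a hypothetical helper.

```python
import zlib


def route_to_shard(routing_key, number_of_primary_shards):
    """Pick a primary shard as hash(routing_key) % number_of_primary_shards.

    crc32 is only a stand-in for the real hash function.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


# Assumed setup: an index with 5 primary shards and three document ids.
for doc_id in ["1", "2", "3"]:
    print(doc_id, "->", route_to_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, changing that number would send existing ids to different shards, which is one way to see why the primary shard count is normally fixed when an index is created while the replica count can be adjusted later.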
The major feature list includes:
- Distributed search
- Multi-tenancy
- An analyzer chain
- Analytical search
- Grouping & aggregation
The Trend