Doc values are the on-disk data structure, built at document index time, that makes this data access pattern possible. They store the same values as the _source, but in a column-oriented fashion that is far more efficient for sorting and aggregations. (This is the essential point.) Doc values are supported on almost all field types, with the notable exception of analyzed string fields (that is, text fields).
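To make the column-oriented idea concrete, here is a minimal toy sketch in Python. It is not how Lucene actually lays doc values out on disk; the `build_doc_values` helper and the sample documents are hypothetical and only illustrate why reading one column is cheaper than reading every full record.

```python
from collections import Counter

# Toy "_source": one record per document (row-oriented). Sample data is made up.
docs = [
    {"status_code": "200", "session_id": "a1"},
    {"status_code": "404", "session_id": "b2"},
    {"status_code": "200", "session_id": "c3"},
]


def build_doc_values(documents, fields):
    """Hypothetical helper: return {field: [value of doc 0, value of doc 1, ...]}."""
    return {field: [doc.get(field) for doc in documents] for field in fields}


# Build a column only for the field we want to sort or aggregate on.
doc_values = build_doc_values(docs, ["status_code"])

# Aggregation reads a single column and never touches the full records.
status_counts = Counter(doc_values["status_code"])

# Sorting document ids by a field is a lookup into the same column.
sorted_doc_ids = sorted(range(len(docs)), key=lambda i: doc_values["status_code"][i])
```

In this sketch, leaving `session_id` out of the column map plays the role of disabling doc values for that field: the value is still in the record, but there is no column to sort or aggregate on.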
All fields which support doc values have them enabled by default. If you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { "type": "keyword" },
        "session_id": { "type": "keyword", "doc_values": false }
      }
    }
  }
}
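The status_code field has doc_values enabled by default.
The session_id field has doc_values disabled, but can still be queried.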
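In short, Elasticsearch uses the inverted index for search and column-oriented doc values for analytics, which unifies the search and analytics use cases in a single distributed system. That is a promising combination. Analytics is more than aggregation, though, and this is where Elasticsearch still has work to do. For now, the Elasticsearch-Hadoop project lets you expose Elasticsearch search results as a Spark RDD and use Spark for deeper analysis. If the distributed computing layer is integrated more deeply with compute frameworks such as Spark in the future, Elasticsearch could well become another heavyweight tool in the big data field.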
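The division of labor described above can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions (whitespace tokenization, a single shard, made-up sample documents, and a hypothetical `terms_aggregation` function), not Elasticsearch's actual implementation: the inverted index answers "which documents match?", and the doc values column answers "what is the field value of each matching document?".

```python
from collections import Counter, defaultdict

# Made-up sample documents for illustration only.
docs = [
    {"message": "login ok", "status_code": "200"},
    {"message": "page missing", "status_code": "404"},
    {"message": "login failed", "status_code": "401"},
    {"message": "login ok", "status_code": "200"},
]

# Inverted index: term -> set of ids of the documents containing that term.
inverted_index = defaultdict(set)
for doc_id, doc in enumerate(docs):
    for term in doc["message"].split():
        inverted_index[term].add(doc_id)

# Doc values: one column, indexed by document id.
status_column = [doc["status_code"] for doc in docs]


def terms_aggregation(term):
    """Hypothetical helper: count status codes over documents matching `term`."""
    matching_ids = sorted(inverted_index.get(term, set()))  # search phase
    return Counter(status_column[i] for i in matching_ids)  # analytics phase


login_by_status = terms_aggregation("login")
```

The search phase goes from term to document ids, while the aggregation phase goes from document id to value, which is why the two phases need two differently organized structures.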
Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html