INTRODUCTION
Apache CarbonData has rich multi-level index support.
DESCRIPTION
Apache CarbonData uses multiple indexes at various levels to enable faster search and query processing.
Using indexes, we can efficiently find the position of the required data while skipping the parts that are not required (and so need not be processed), which results in faster query processing.
Storing data along with its index significantly accelerates query performance and reduces I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indices, and a processing framework can leverage this index to reduce the number of tasks it needs to schedule and process. During task-side scanning, it can also skip-scan in finer-grained units (called blocklets) instead of scanning the whole file.
To get data using the indexes, the following steps are performed:
- File pruning.
- Blocklet pruning.
- Binary search using the inverted index.
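The three steps above can be sketched in a few lines of Python. This is only an illustrative model, not CarbonData's implementation: the `Blocklet` and `DataFile` classes, the `lookup` function and the sample data are all hypothetical, and min/max ranges are assumed as the pruning criterion at both levels. A real store would read these ranges from stored metadata rather than compute them on the fly.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Blocklet:
    sorted_values: List[int]  # column values kept in sorted order
    row_ids: List[int]        # inverted index: original row of each sorted value

    @property
    def min_value(self) -> int:
        return self.sorted_values[0]

    @property
    def max_value(self) -> int:
        return self.sorted_values[-1]


@dataclass
class DataFile:
    name: str
    blocklets: List[Blocklet]

    @property
    def min_value(self) -> int:
        # Assumption: computed here for brevity; normally stored as metadata.
        return min(b.min_value for b in self.blocklets)

    @property
    def max_value(self) -> int:
        return max(b.max_value for b in self.blocklets)


def lookup(files: List[DataFile], target: int) -> List[Tuple[str, int, int]]:
    """Return (file name, blocklet index, original row id) for every match."""
    hits = []
    for f in files:
        # Step 1: file pruning. Skip files whose range cannot contain target.
        if not (f.min_value <= target <= f.max_value):
            continue
        for b_idx, b in enumerate(f.blocklets):
            # Step 2: blocklet pruning inside a surviving file.
            if not (b.min_value <= target <= b.max_value):
                continue
            # Step 3: binary search on the sorted values, then map each hit
            # back to its original row through the inverted index.
            i = bisect_left(b.sorted_values, target)
            while i < len(b.sorted_values) and b.sorted_values[i] == target:
                hits.append((f.name, b_idx, b.row_ids[i]))
                i += 1
    return hits


files = [
    DataFile("part-0", [Blocklet([1, 3, 5], [1, 2, 0]),
                        Blocklet([7, 7, 9], [1, 2, 0])]),
    DataFile("part-1", [Blocklet([15, 20, 30], [1, 0, 2])]),
]
print(lookup(files, 7))
```

For the value 7, the file `part-1` and the first blocklet of `part-0` are skipped without being scanned, so only one blocklet is actually searched.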
TYPES OF INDEXES
Index stored in the file footer (enables two levels of B+ tree indexing):
- Table level index: global B+ tree, efficient file level filtering. The table level index is used to find the relevant files. These files are then searched further, using the file level index, to get the row groups (Data Blocks).
Figure 1
- File level index: local B+ tree, efficient blocklet level filtering.
Figure 2
A global Multi Dimensional Keys (MDK) based B+ tree index for all non-measure columns aids in quickly locating the row groups (Data Blocks) that contain the data matching the search/filter criteria.
Figure 3
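To make the idea of an MDK-based index more concrete, here is a minimal sketch in which rows are sorted by a composite key of their dimension columns and each block is summarized by its start and end key. A flat sorted list stands in for the leaf level of a B+ tree, and the column choice (country, city, year), the block size and the `candidate_blocks` function are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right

# Rows sorted by the composite (multi-dimensional) key: country, city, year.
rows = sorted([
    ("CN", "Beijing", 2019), ("CN", "Shanghai", 2020),
    ("IN", "Delhi", 2019), ("IN", "Mumbai", 2021),
    ("US", "Austin", 2020), ("US", "Boston", 2018),
])

BLOCK_SIZE = 2  # hypothetical number of rows per block
blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]
start_keys = [block[0] for block in blocks]   # smallest key in each block
end_keys = [block[-1] for block in blocks]    # largest key in each block


def candidate_blocks(key):
    """Return indexes of blocks whose [start_key, end_key] range may hold key."""
    lo = bisect_left(end_keys, key)      # first block whose end key >= key
    hi = bisect_right(start_keys, key)   # one past last block whose start key <= key
    return list(range(lo, hi))


print(candidate_blocks(("IN", "Mumbai", 2021)))
print(candidate_blocks(("FR", "Paris", 2020)))
```

Because the rows are ordered by the whole key, two binary searches over the block boundaries are enough to decide which blocks can contain a given key. The second lookup returns no candidate blocks at all.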
A Min-Max index for all columns aids in quickly locating the row groups (Data Blocks) that contain the data matching the search/filter criteria.
Figure 4
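A Min-Max index is also useful for range filters: a block can be skipped whenever its [min, max] interval does not overlap the requested range. The sketch below assumes a hypothetical `block_stats` layout with made-up column names and values. It illustrates the overlap test only and is not CarbonData's storage format.

```python
def overlaps(block_min, block_max, lo, hi):
    """True if the interval [block_min, block_max] intersects [lo, hi]."""
    return block_min <= hi and block_max >= lo


def prune(block_stats, column, lo, hi):
    """Return ids of blocks that may contain values of `column` in [lo, hi]."""
    return [block_id for block_id, stats in block_stats.items()
            if overlaps(*stats[column], lo, hi)]


# Hypothetical per-block (min, max) statistics for two columns.
block_stats = {
    "block-0": {"price": (5, 40), "qty": (1, 9)},
    "block-1": {"price": (45, 90), "qty": (3, 12)},
    "block-2": {"price": (95, 200), "qty": (10, 50)},
}

print(prune(block_stats, "price", 30, 50))
print(prune(block_stats, "qty", 20, 30))
```

Note that the index can only rule blocks out. A block that survives pruning may still contain no matching row, so it has to be scanned.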
Column level index: inverted index used for efficient column chunk scan.
Figure 5
A Data Block level inverted index for all columns aids in quickly locating the rows that contain the data matching the search/filter criteria within a row group (Data Block).
Figure 6
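One way to picture a block-level inverted index is as a column stored in sorted order together with a list that maps each sorted position back to its original row. The sketch below assumes that model. The function names and the sample `city` and `sales` columns are hypothetical, and the actual on-disk encoding may differ.

```python
from bisect import bisect_left, bisect_right


def build_inverted_index(column):
    """Return (sorted_values, row_ids); row_ids[i] is the original row of sorted_values[i]."""
    row_ids = sorted(range(len(column)), key=column.__getitem__)
    return [column[r] for r in row_ids], row_ids


def matching_rows(sorted_values, row_ids, value):
    """Binary-search the sorted column and return the original row ids that match."""
    lo = bisect_left(sorted_values, value)
    hi = bisect_right(sorted_values, value)
    return sorted(row_ids[lo:hi])


# Two columns of the same (hypothetical) data block, in original row order.
city = ["Pune", "Delhi", "Agra", "Delhi", "Pune", "Delhi"]
sales = [10, 20, 30, 40, 50, 60]

sorted_city, city_row_ids = build_inverted_index(city)
rows = matching_rows(sorted_city, city_row_ids, "Delhi")
delhi_sales = [sales[r] for r in rows]  # fetch another column by row id
print(rows, delhi_sales)
```

The filter column is searched in logarithmic time, and the recovered row ids are then used to read only the matching entries of the other columns in the block.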