Hive: Architecture

freshghost1234

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/qq_34969081/article/details/79046726

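This article walks through the Hive architecture, covering its five components (UI, Driver, Compiler, Metastore, and Execution Engine). It explains why the metadata is kept in a relational store and what the drawbacks are of using a separate data store instead of HDFS. It also covers the three units of Hive's data model: tables, partitions, and buckets.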

Source: the official Hive architecture documentation

Let's go straight to the architecture diagram:

[Figure: Hive architecture diagram]

The diagram shows that Hive has five components: UI, Driver, Compiler, Metastore, and Execution Engine. The workflow between them is also fairly clear and has nine steps in total.
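
To make the interaction between the five components more concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the class names mirror the component names, but every method name, the toy metadata, and the trace messages are hypothetical, and the trace does not reproduce the exact nine-step numbering of the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    # Toy metadata: table name -> column names (hypothetical contents).
    tables: dict = field(default_factory=lambda: {"T": ["ds", "value"]})

    def get_metadata(self, table):
        return self.tables[table]


@dataclass
class Compiler:
    metastore: Metastore

    def get_plan(self, query, trace):
        # Hypothetical "parsing": treat the last token of the query as the table name.
        table = query.split()[-1]
        trace.append("Compiler -> Metastore: get metadata")
        columns = self.metastore.get_metadata(table)
        trace.append("Metastore -> Compiler: send metadata")
        return {"table": table, "columns": columns}


class ExecutionEngine:
    def execute(self, plan, trace):
        trace.append("Execution Engine: run plan")
        # A real engine would run jobs on the cluster; this returns one canned row.
        return [tuple(plan["columns"])]


@dataclass
class Driver:
    compiler: Compiler
    engine: ExecutionEngine

    def execute_query(self, query):
        trace = ["UI -> Driver: execute query", "Driver -> Compiler: get plan"]
        plan = self.compiler.get_plan(query, trace)
        trace.append("Compiler -> Driver: send plan")
        trace.append("Driver -> Execution Engine: execute plan")
        rows = self.engine.execute(plan, trace)
        trace.append("Driver -> UI: send results")
        return rows, trace


driver = Driver(Compiler(Metastore()), ExecutionEngine())
rows, trace = driver.execute_query("SELECT * FROM T")
for step in trace:
    print(step)
```

The point of the sketch is the direction of the calls: the UI only talks to the Driver, the Compiler is the component that consults the Metastore, and the Execution Engine only sees the finished plan.
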
This post focuses on the following two points:
Metastore (metadata)

Metastore is an object store with a database or file backed store.
The database backed store is implemented using an object-relational mapping (ORM) solution called the DataNucleus. 
The prime motivation for storing this in a relational database is queriability of metadata.
Some disadvantages of using a separate data store for metadata instead of using HDFS are synchronization and scalability issues. 
Additionally there is no clear way to implement an object store on top of HDFS due to lack of random updates to files. 
This, coupled with the advantages of queriability of a relational store, made our approach a sensible one.

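In other words, the Metastore is an object store backed by either a database or files. The database-backed store is implemented with DataNucleus, an object-relational mapping (ORM) solution.
The main motivation for keeping the metadata in a relational database is that it can be queried. The drawbacks of using a separate data store for metadata instead of HDFS are synchronization and scalability issues. In addition, because HDFS does not support random updates to files, there is no clear way to implement an object store on top of it. Combined with the queryability of a relational store, this is why storing the metadata in a relational database was considered the sensible choice.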
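
The queryability argument can be illustrated with a small Python sketch. It uses an in-memory SQLite database as a stand-in for the relational store; the table and column names (`tbls`, `partitions`, `part_name`, and so on) are a hypothetical toy schema chosen for this example, not the actual Metastore schema.

```python
import sqlite3

# In-memory SQLite stands in for the relational database behind the Metastore.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical toy schema, not the real Metastore tables.
    CREATE TABLE tbls (tbl_id INTEGER PRIMARY KEY, tbl_name TEXT, location TEXT);
    CREATE TABLE partitions (part_id INTEGER PRIMARY KEY, tbl_id INTEGER, part_name TEXT);
    INSERT INTO tbls VALUES (1, 'T', '/warehouse/t');
    INSERT INTO partitions VALUES (1, 1, 'ds=2008-09-01'), (2, 1, 'ds=2008-09-02');
""")

# Because the metadata is relational, "which partitions of which table match?"
# is an ordinary join with a filter.
rows = conn.execute("""
    SELECT t.tbl_name, p.part_name
    FROM tbls t JOIN partitions p ON p.tbl_id = t.tbl_id
    WHERE p.part_name >= 'ds=2008-09-02'
    ORDER BY p.part_name
""").fetchall()
print(rows)
```

Answering the same question from metadata kept as plain files on HDFS would mean reading and parsing those files, and updating a single entry in place would not be possible without random file updates.
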

Data model

Tables – These are analogous to Tables in Relational Databases. Tables can be filtered, projected, joined and unioned. Additionally all the data of a table is stored in a directory in HDFS. Hive also supports the notion of external tables wherein a table can be created on preexisting files or directories in HDFS by providing the appropriate location to the table creation DDL. The rows in a table are organized into typed columns similar to Relational Databases.
Partitions – Each Table can have one or more partition keys which determine how the data is stored, for example a table T with a date partition column ds had files with data for a particular date stored in the <table location>/ds=<date> directory in HDFS. Partitions allow the system to prune data to be inspected based on query predicates, for example a query that is interested in rows from T that satisfy the predicate T.ds = '2008-09-01' would only have to look at files in <table location>/ds=2008-09-01/ directory in HDFS.
Buckets – Data in each partition may in turn be divided into Buckets based on the hash of a column in the table. Each bucket is stored as a file in the partition directory. Bucketing allows the system to efficiently evaluate queries that depend on a sample of data (these are queries that use the SAMPLE clause on the table).
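
The directory layout and the two optimizations described above (partition pruning and bucket-based sampling) can be sketched in a few lines of Python. This is a simplified model under assumptions: the function names and the `/warehouse/t` location are hypothetical, and `zlib.crc32` is a stand-in hash chosen for determinism, not the hash function Hive itself uses.

```python
import zlib


def partition_dir(table_location, ds):
    # Layout from the text: <table location>/ds=<date>
    return f"{table_location}/ds={ds}"


def prune_partitions(table_location, all_dates, wanted_ds):
    # Partition pruning: only directories matching the predicate are kept,
    # so files for other dates never have to be read.
    return [partition_dir(table_location, ds) for ds in all_dates if ds == wanted_ds]


def bucket_for(value, num_buckets):
    # Stand-in hash (assumption): crc32 of the value's string form.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


dates = ["2008-08-31", "2008-09-01", "2008-09-02"]
print(prune_partitions("/warehouse/t", dates, "2008-09-01"))

# Distribute ten hypothetical user ids over 4 buckets; each bucket would be
# one file inside the partition directory.
buckets = {}
for user_id in range(10):
    buckets.setdefault(bucket_for(user_id, 4), []).append(user_id)

# Sampling: reading a single bucket touches only that bucket's file.
sample = buckets.get(0, [])
print(sample)
```

Because a row's bucket depends only on the hash of the chosen column, reading one bucket file gives a sample of the partition without scanning the rest of its data.
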