Hive: a summary of its features


  These days I have been learning Hive, a data warehouse framework, mainly from the ebook 'Programming Hive' [1], which covers it in detail across about 23 chapters ;)

  Below is an outline of this topic:

 

1. Overview

2. Architecture

3. Features

4. Hive vs Pig, Hive vs HBase

5. Use cases

 

1. Overview

  In a data warehouse, as you might expect, one of the most obvious characteristics is data scale: gigabytes and terabytes are common, and even petabytes are not unusual. In general, we load the data into a database, query it with SQL, and finally generate a report that supports a decision.

  Similar to a common RDBMS, Hive supplies a SQL-style query language (HiveQL, which compiles SQL statements into MapReduce jobs). In addition, Hive can use various storage systems underneath, such as the local file system, a distributed file system (DFS), HBase, and Cassandra.

  So you might guess that Hive is essentially a front end sitting on top of a distributed file system, which is why it is easy to load or process huge amounts of data with it.
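
  To make "HiveQL is compiled into MapReduce jobs" more concrete, here is a minimal Python sketch of how a query such as `SELECT url, COUNT(*) FROM page_views GROUP BY url` splits into map, shuffle, and reduce phases. It only simulates the phases in memory: the `page_views` table, its rows, and the function names are hypothetical, and Hive's real query planner produces far more involved plans.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical table page_views(user_id, url); the rows are made-up examples.
Row = Tuple[str, str]
PAGE_VIEWS: List[Row] = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart"), ("u3", "/home")]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Map: emit (GROUP BY key, 1) for every input row."""
    for _user_id, url in rows:
        yield url, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: bring together all values that share a key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: apply the aggregate, here COUNT(*), to each key's values."""
    return {key: sum(values) for key, values in groups.items()}


def count_views_by_url(rows: Iterable[Row]) -> Dict[str, int]:
    """Roughly what SELECT url, COUNT(*) FROM page_views GROUP BY url turns into."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_views_by_url(PAGE_VIEWS))
```

  The one-line SQL query hides all three phases, which is the main convenience Hive offers over writing MapReduce code by hand.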

 

2. Architecture

  

   [Figure 1 from 'Programming Hive' [1]]

    [Figure from 'Hive: A Warehousing Solution Over a Map-Reduce Framework' [2]]

 

  Combining both figures above, we can see that Hive supplies several user interfaces: the CLI, JDBC/ODBC, and a web GUI. The Thrift server acts as a bridge between JDBC/ODBC clients and the Hive core modules.

  Second, there is a component called the metastore, which stores metadata such as table schemas, by default in an embedded Derby database; this metadata is generally small enough to keep there. To avoid a SPOF (single point of failure), however, you should use a separate database solution such as a MySQL cluster.
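
  The point of the metastore is that only metadata (schema, data location) lives in a relational database, while the data itself stays in the file system. A minimal sketch of that split, assuming Python's built-in `sqlite3` as a stand-in for Derby or MySQL; the single `table_meta` table and the helper names are hypothetical and do not reflect Hive's real metastore schema.

```python
import json
import sqlite3
from typing import List, Tuple

# Assumption: sqlite3 stands in for the metastore's backing database (embedded Derby by
# default, MySQL or similar in production). The layout below is simplified and hypothetical.
Column = Tuple[str, str]


def open_metastore(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS table_meta ("
        " name TEXT PRIMARY KEY,"
        " column_json TEXT NOT NULL,"  # JSON list of [column_name, type] pairs
        " location TEXT NOT NULL)"     # where the data files live; the data itself is NOT stored here
    )
    return conn


def create_table(conn: sqlite3.Connection, name: str, columns: List[Column], location: str) -> None:
    conn.execute(
        "INSERT INTO table_meta VALUES (?, ?, ?)", (name, json.dumps(columns), location)
    )
    conn.commit()


def describe(conn: sqlite3.Connection, name: str) -> Tuple[List[Column], str]:
    """Return schema and data location: the kind of lookup done before planning a query."""
    row = conn.execute(
        "SELECT column_json, location FROM table_meta WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"table not found: {name}")
    columns = [(col_name, col_type) for col_name, col_type in json.loads(row[0])]
    return columns, row[1]


if __name__ == "__main__":
    meta = open_metastore()
    create_table(meta, "page_views", [("user_id", "STRING"), ("url", "STRING")],
                 "hdfs:///user/hive/warehouse/page_views")  # hypothetical example path
    print(describe(meta, "page_views"))
```

  Swapping the connection for one to a replicated database server is, in spirit, what moving from the default Derby to a MySQL cluster achieves: the query layer keeps the same lookups, but the metadata no longer lives in a single embedded file.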

 

3. Features

  Here are some important features, as described in 'Programming Hive':

  SQL-style query language

  The main SQL grammar has been brought into Hive, so anyone with basic SQL knowledge faces almost no learning curve when querying or loading data in Hive; and a concise SQL command is much easier to write than MapReduce code for issuing jobs.

  Flexible and controllable

  Hive offers several execution modes. Local mode suits analyses of small data sets, since it runs the job in a local process instead of on the real cluster. Parallel mode lets independent stages of the same query (business job) run in parallel. Strict mode disallows certain expensive operations, so one query cannot grab large amounts of resources that other jobs need.
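
  The sketch below models local mode and strict mode as simple rules over a toy query description. The `QueryPlan` class and function names are hypothetical. The thresholds are assumed to mirror the defaults of Hive's `hive.exec.mode.local.auto.inputbytes.max` (128 MB) and `hive.exec.mode.local.auto.input.files.max` (4), plus the "at most one reducer" condition, and the strict-mode checks follow the restrictions described for `hive.mapred.mode=strict`. Real Hive applies further conditions, so treat this as an approximation; parallel mode (`hive.exec.parallel`) is not modelled.

```python
from dataclasses import dataclass
from typing import List

# Assumed defaults, modelled on hive.exec.mode.local.auto.inputbytes.max and
# hive.exec.mode.local.auto.input.files.max; check the values for your Hive version.
MAX_LOCAL_INPUT_BYTES = 128 * 1024 * 1024
MAX_LOCAL_INPUT_FILES = 4


@dataclass
class QueryPlan:
    """A toy description of a query; not Hive's real plan object."""
    input_bytes: int
    input_files: int
    reduce_tasks: int = 1
    has_order_by: bool = False
    has_limit: bool = False
    scans_partitioned_table: bool = False
    filters_on_partition: bool = False
    has_cartesian_join: bool = False


def can_run_locally(plan: QueryPlan) -> bool:
    """Local mode: small jobs run in a local process instead of on the cluster."""
    return (
        plan.input_bytes < MAX_LOCAL_INPUT_BYTES
        and plan.input_files < MAX_LOCAL_INPUT_FILES
        and plan.reduce_tasks <= 1
    )


def strict_mode_violations(plan: QueryPlan) -> List[str]:
    """Strict mode: list the reasons a potentially very expensive query would be rejected."""
    problems: List[str] = []
    if plan.scans_partitioned_table and not plan.filters_on_partition:
        problems.append("partitioned table scanned without a partition filter")
    if plan.has_order_by and not plan.has_limit:
        problems.append("ORDER BY without LIMIT")
    if plan.has_cartesian_join:
        problems.append("Cartesian product")
    return problems


if __name__ == "__main__":
    print(can_run_locally(QueryPlan(input_bytes=10_000, input_files=1)))
    print(strict_mode_violations(QueryPlan(input_bytes=0, input_files=0, has_order_by=True)))
```

  Expressing the modes as rules makes the trade-off visible: local mode trades cluster parallelism for lower startup overhead, while strict mode rejects a query up front instead of letting it consume the cluster.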

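  Static data

  Unlike a common RDBMS, Hive (in the versions covered by 'Programming Hive') does not support row-level DML such as UPDATE or DELETE: once data is loaded, it cannot be updated in place. Instead, you rewrite whole tables or partitions (for example with INSERT OVERWRITE) and use DDL such as ALTER TABLE for schema-related changes.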
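
  A minimal sketch of what "no row-level updates" means in practice, assuming a toy table whose partitions are written once, like files in HDFS. The `WriteOnceTable` class is hypothetical, not a Hive API; it only shows that a correction to one row becomes a rewrite of the whole partition, in the spirit of INSERT OVERWRITE.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]


class WriteOnceTable:
    """Toy table whose partitions are written once and never edited in place (hypothetical)."""

    def __init__(self) -> None:
        self.partitions: Dict[str, Tuple[Row, ...]] = {}

    def load(self, partition: str, rows: List[Row]) -> None:
        """Initial load; refuses to touch a partition that already exists."""
        if partition in self.partitions:
            raise ValueError(f"{partition} already exists; overwrite it instead")
        self.partitions[partition] = tuple(rows)

    def overwrite(self, partition: str, rows: List[Row]) -> None:
        """INSERT OVERWRITE-style: the whole partition is replaced in one go."""
        self.partitions[partition] = tuple(rows)

    def correct_value(self, partition: str, key: str, new_value: int) -> None:
        """No in-place row update: read the partition, fix it, write it back whole."""
        fixed = [(k, new_value if k == key else v) for k, v in self.partitions[partition]]
        self.overwrite(partition, fixed)


if __name__ == "__main__":
    table = WriteOnceTable()
    table.load("dt=2012-01-01", [("a", 1), ("b", 2)])
    table.correct_value("dt=2012-01-01", "b", 5)
    print(table.partitions)
```

  The cost of a one-row fix is proportional to the partition size, which is acceptable for batch analytics but is exactly why Hive is a poor fit for frequently updated data.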

  Its own table storage, but no fast query response and no transaction support

  Hive can be integrated with NoSQL solutions such as HBase and Cassandra, but if you don't want to use them, you can store data in Hive's own tables on the underlying file system and query it from there.

  But unlike NoSQL stores, Hive does not offer fast query responses: it is designed to analyze large-scale data in batches rather than to serve as a general-purpose database, which is exactly why Hive was created.
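
  The ability to query either Hive's own tables or an external store such as HBase comes from keeping the query layer independent of storage. A minimal sketch of that idea, assuming a hypothetical `StorageBackend` interface; it is only a loose analogy to Hive's storage handlers, not their real API, and both backends are in-memory stand-ins.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, str]


class StorageBackend(ABC):
    """Hypothetical interface: a loose analogy to Hive storage handlers, not their real API."""

    @abstractmethod
    def scan(self) -> Iterator[Row]:
        ...


class DelimitedFileBackend(StorageBackend):
    """Hive's own table: rows stored as tab-delimited text lines (held in memory here)."""

    def __init__(self, lines: List[str], sep: str = "\t") -> None:
        self.lines = lines
        self.sep = sep

    def scan(self) -> Iterator[Row]:
        for line in self.lines:
            key, value = line.split(self.sep, 1)
            yield key, value


class KeyValueBackend(StorageBackend):
    """An external store such as HBase, modelled as a dict from row key to value."""

    def __init__(self, store: Dict[str, str]) -> None:
        self.store = store

    def scan(self) -> Iterator[Row]:
        # Iterate in row-key order, as a sorted key-value store would.
        yield from sorted(self.store.items())


def select_keys_where(backend: StorageBackend, wanted: str) -> List[str]:
    """The query layer only calls scan(), so the same filter works on either backend."""
    return [key for key, value in backend.scan() if value == wanted]
```

  Note that both backends are still read with a full scan here; fast point lookups would need the key-value store's own API, which is one reason queries through Hive stay batch-oriented even on top of HBase.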

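  Partitioned tables

  A table can be partitioned by one or more columns (a date, for example). Each partition is stored in its own directory, so the data is spread horizontally across the cluster, and a query that filters on a partition column only needs to read the matching partitions.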
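
  Partition pruning can be sketched in a few lines, assuming Hive's convention of one subdirectory per partition value (for example `.../page_views/dt=2012-01-01/`). The table path, the rows, and the helper names are hypothetical; the sketch only shows why a `WHERE dt = '...'` filter lets a query skip whole directories.

```python
from typing import Dict, List

# Hypothetical layout: one directory per value of the partition column `dt`.
TABLE_DIR = "/user/hive/warehouse/page_views"
PARTITIONS: Dict[str, List[str]] = {
    f"{TABLE_DIR}/dt=2012-01-01": ["u1,/home", "u2,/cart"],
    f"{TABLE_DIR}/dt=2012-01-02": ["u3,/home"],
    f"{TABLE_DIR}/dt=2012-01-03": ["u1,/cart", "u4,/home"],
}


def partition_value(path: str, column: str) -> str:
    """Extract the value of `column` from a path segment such as 'dt=2012-01-02'."""
    for segment in path.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == column:
            return value
    raise ValueError(f"{column} is not a partition column of {path}")


def prune(column: str, wanted: str) -> List[str]:
    """Partition pruning: keep only directories whose partition value matches the filter."""
    return [path for path in PARTITIONS if partition_value(path, column) == wanted]


def scan(column: str, wanted: str) -> List[str]:
    """Read rows only from surviving partitions, like WHERE dt = '...' on a partitioned table."""
    return [row for path in prune(column, wanted) for row in PARTITIONS[path]]


if __name__ == "__main__":
    print(scan("dt", "2012-01-03"))
```

  The filter is resolved against directory names alone, before any data file is opened, which is where the savings come from.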

  Views and indexes

  Hive stores an index in a separate, extended table; this is a common approach, similar to how secondary indexes are often built on top of HBase.
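
  A minimal sketch of the "index as an extra table" idea, assuming a toy layout in which the index maps each value of the indexed column to the (file, row offset) pairs where it occurs, roughly in the spirit of Hive's compact index. All names and data are hypothetical. Note that built-in indexes were removed in Hive 3.0, so this only applies to older releases.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str]
DataFiles = Dict[str, List[Row]]
Index = Dict[str, Set[Tuple[str, int]]]

# Toy data files: file name -> rows (user_id, url). Made-up example data.
DATA_FILES: DataFiles = {
    "part-00000": [("u1", "/home"), ("u2", "/cart")],
    "part-00001": [("u3", "/home"), ("u4", "/help")],
    "part-00002": [("u1", "/cart")],
}


def build_index(files: DataFiles, column: int) -> Index:
    """The extra 'index table': indexed value -> set of (file, row offset) locations."""
    index: Index = defaultdict(set)
    for name, rows in files.items():
        for offset, row in enumerate(rows):
            index[row[column]].add((name, offset))
    return dict(index)


def lookup(files: DataFiles, index: Index, value: str) -> List[Row]:
    """Use the index to read only the matching rows instead of scanning every file."""
    return [files[name][offset] for name, offset in sorted(index.get(value, set()))]


if __name__ == "__main__":
    user_index = build_index(DATA_FILES, column=0)
    print(lookup(DATA_FILES, user_index, "u1"))
```

  The index must be rebuilt whenever the underlying files change, which fits Hive's write-once data model reasonably well.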

  File compression

  Similar to Hadoop's compression support, Hive output can be compressed with standard algorithms to reduce I/O volume.
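
  Why compression pays off for warehouse data: rows tend to be highly repetitive, so a general-purpose codec shrinks them a lot and less data has to cross the disk and network. In Hive this is switched on through settings such as `hive.exec.compress.output` and `hive.exec.compress.intermediate`; the sketch below does not touch Hive at all and simply uses Python's `gzip` module on made-up log-style rows to illustrate the effect.

```python
import gzip

# Made-up, highly repetitive tab-separated rows, similar to typical log-style warehouse data.
rows = "".join(f"2012-01-01\tu{i % 10}\t/home\n" for i in range(10_000)).encode("utf-8")

compressed = gzip.compress(rows)
ratio = len(compressed) / len(rows)
print(f"raw={len(rows)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.3f}")

# Compression is lossless: decompressing must give back exactly the original bytes.
assert gzip.decompress(compressed) == rows
```

  Real ratios depend on the data and the codec; splittable formats or codecs matter too, since a single large gzip file cannot be split across several map tasks.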

 

  

4. Hive vs Pig, Hive vs HBase

  Hive vs Pig

| | Hive | Pig |
|---|---|---|
| Query language | SQL (HiveQL) | Pig Latin |
| Own database | Yes (so it supports JDBC/ODBC connectors) | No |
| Execution model | Compile (to MapReduce), optimize, execute | Same as Hive |

 

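  Hive vs HBase

| | Hive | HBase |
|---|---|---|
| SQL support | Yes | No |
| Transactions | No | Row-level (table-level not confirmed for versions above 1.x) |
| Real-time response | No | Yes |
| Target | Data warehouse | NoSQL store |
| Index | Yes | No |
| File system | HDFS, MapR, etc. | HDFS |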

 

 

5. Use cases

  Note that Hive releases from 2.x onward are compiled with JDK 7, so if you deploy them on JDK 6 or below, the JVM will fail with an 'Unsupported major.minor version 51.0' error. In that case, you can switch to a 1.x release instead.
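
  The "51.0" in that error is the class-file major version, and 51 corresponds to Java 7 (50 is Java 6), which is why a JDK 6 runtime rejects the classes. A minimal sketch for checking which Java release a jar targets, using only the standard class-file header layout (magic `0xCAFEBABE`, then minor and major version as big-endian 16-bit values). The function names are hypothetical, and looking at only the first class in a jar is a heuristic, since a jar can mix versions.

```python
import struct
import zipfile

# Class-file major version -> Java release, for the versions relevant here.
JAVA_RELEASE_BY_MAJOR = {49: "5", 50: "6", 51: "7", 52: "8"}


def java_release_from_header(header: bytes) -> str:
    """A .class file starts with magic 0xCAFEBABE, then minor (u2) and major (u2) versions."""
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return JAVA_RELEASE_BY_MAJOR.get(major, f"unknown (major {major})")


def java_release_of_jar(jar_path: str) -> str:
    """Report the Java release targeted by the first .class entry found in a jar."""
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                with jar.open(name) as entry:
                    return java_release_from_header(entry.read(8))
    raise ValueError(f"no .class files in {jar_path}")
```

  For example, calling `java_release_of_jar` on a Hive jar from your `lib` directory (the exact file name depends on your installation) and getting "7" tells you a JDK 6 runtime will not load it.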

 

 

 

References:

[1] Programming Hive

[2] Hive: A Warehousing Solution Over a Map-Reduce Framework (Facebook)

 

 
