Hive: 'select * from table' returns no data, but 'select column' does

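This post looks at the difference between 'select * from table' and 'select column from table' in Hive and at the processing behind each. The former reads the files on HDFS directly and displays all the data, which suits small data sets; the latter goes through MapReduce and extracts the requested column from every row.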


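Cause: my table is stored in LZO format, but I wrote to it without specifying the output file format. As a result, 'select * from table' returned no data while 'select column' did. The table was then rewritten with LZO output compression enabled: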

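-- Compress the job output and the final Hive query output.
set mapred.output.compress=true;
set hive.exec.compress.output=true;
-- Write the output with the LZOP codec so it matches the table's LZO format.
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
-- Number of reducers used for this job.
set mapred.reduce.tasks=137;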


insert overwrite table e_ods.ods_dhprds_priceinfo 
select priceid,
hotelid,
roomtypeid,
rateplanid,
begindate,
enddate,
rateplancode,
gensalecost,
gensaleprice,
weekendsalecost,
weekendsaleprice,
allowaddbed,
addbedprice,
currencycode,
ispriceset,
iseffective,
auditstatus,
operatetime,
operator,
operateip,
ratecalculationmodeltype,
commissioncalculationtype,
weekdaycommissioncalculationvalue,
weekendcommissioncalculationvalue,
weekdaynetrate,
weekendnetrate,
minprofit,
maxprofit 
from e_ods.ods_dhprds_priceinfo_fl;
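
One way to check that the rewrite took effect is to look at the table's declared formats and at the files it now contains. The following is only a sketch: the warehouse path is a placeholder (use the Location that DESCRIBE FORMATTED reports), and the expectation that LzopCodec output files carry a .lzo suffix is an assumption about the hadoop-lzo setup, not something stated in the original post.

```sql
-- Show the table's InputFormat, OutputFormat and HDFS location.
DESCRIBE FORMATTED e_ods.ods_dhprds_priceinfo;

-- List the data files from the Hive CLI.
-- Placeholder path: substitute the Location printed above.
dfs -ls /path/to/warehouse/e_ods.db/ods_dhprds_priceinfo;

-- After the rewrite, both query shapes should return rows.
SELECT * FROM e_ods.ods_dhprds_priceinfo LIMIT 5;
SELECT priceid FROM e_ods.ods_dhprds_priceinfo LIMIT 5;
```

If the listed files do not match what the table's InputFormat expects, the two query shapes can disagree again, which is the symptom described at the top of this post.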


What is the difference between 'select * from table' and 'select column from table' in Hive?


A Hive table is stored as a directory in HDFS. When you run 'select * from table', the Hive query processor simply goes to that directory, which holds one or more files adhering to the table schema, and dumps all the data to the display as is, without launching a MapReduce job. This is reasonable for very small data, say less than a gigabyte. On a real cluster the table may hold terabytes, and displaying all of it will run for a very long time.

'select column from table' is a projection query: the Hive query processor has to read every row in the table, extract the column value from each row, and display it. Hive compiles the SQL query into a sequence of MapReduce programs to do this. In general, data processing in Hive is carried out by a sequence of MapReduce programs that read the table data stored on HDFS; Hive is a MapReduce-based, batch-oriented query processing engine. (Newer Hive versions can also serve simple projections without MapReduce, depending on the hive.fetch.task.conversion setting.)
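
To see which path a given query takes, the plans of the two query shapes can be compared with EXPLAIN. This is a minimal sketch that reuses the table from the first part of this post; the exact plan text differs between Hive versions, and whether a plain projection needs MapReduce depends on hive.fetch.task.conversion (accepted values: none, minimal, more; the default is more since Hive 0.14).

```sql
-- Print the plan instead of running the query. A plan that consists only of
-- a Fetch Operator stage means no MapReduce job is launched.
EXPLAIN SELECT * FROM e_ods.ods_dhprds_priceinfo;

-- Compare with the plan for a single-column projection.
EXPLAIN SELECT priceid FROM e_ods.ods_dhprds_priceinfo;

-- Show the current value of the setting that controls this behaviour.
SET hive.fetch.task.conversion;

-- With 'minimal', only SELECT *, filters on partition columns and LIMIT
-- are served by a fetch task; other queries go through MapReduce.
SET hive.fetch.task.conversion=minimal;
```

Running the second EXPLAIN before and after changing the setting shows whether the projection is planned as a fetch task or as a MapReduce stage.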

Similarly, adding a where condition to the query triggers MapReduce-based processing, except when the table is partitioned and the where clause is on the partition value. For example, if a table has a day partition, the data added for each new day goes into a new subfolder under the table's folder. A query such as select * from table where day='400' then simply dumps the contents of all files under the subdirectory for day 400 (named day=400 by default) in the main table directory.
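
The partition case can be sketched as follows. The table events and its columns are hypothetical and exist only for illustration; the point is the difference between filtering on the partition column and filtering on an ordinary column.

```sql
-- Hypothetical table: each distinct value of `day` gets its own
-- subdirectory under the table directory, e.g. .../events/day=400/
CREATE TABLE events (
  user_id BIGINT,
  action  STRING
)
PARTITIONED BY (day STRING);

-- Filter on the partition column only: Hive reads just the day=400 directory.
SELECT * FROM events WHERE day = '400';

-- Filter on an ordinary column: every row has to be read and tested.
SELECT * FROM events WHERE action = 'login';
```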

Furthermore, Hive tables can be wide, with for example 50 columns representing different values. For 'select column from table', the MapReduce program scans all the rows and extracts one column out of the 50 values in each row. If you frequently extract one or a few columns for processing, a better approach is to store the table in a columnar file format such as Parquet or RCFile.
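
As an illustration of the columnar option, a Parquet copy of the table could be created roughly like this. The target table name ods_dhprds_priceinfo_parquet is made up for the example, and STORED AS PARQUET assumes Hive 0.13 or later.

```sql
-- Hypothetical columnar copy of the table from the first part of this post.
CREATE TABLE e_ods.ods_dhprds_priceinfo_parquet
STORED AS PARQUET
AS
SELECT * FROM e_ods.ods_dhprds_priceinfo;

-- A projection on the Parquet table only needs to read the data
-- for the two requested columns, not all columns of every row.
SELECT priceid, gensaleprice
FROM e_ods.ods_dhprds_priceinfo_parquet;
```

The trade-off is an extra copy of the data and a rewrite step, so this is mainly worth it for tables that are queried by column often.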


