Hive: select * from a table returns no data, but selecting columns returns data

Latest recommended article published 2023-07-18 15:02:07

吃鱼的羊

License: CC 4.0 BY-SA

Categories: HIVE, Hadoop

Article link: https://blog.youkuaiyun.com/hellojoy/article/details/78627570

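This article discusses the difference between 'select * from table' and 'select column from table' in Hive and looks at the processing behind each kind of query. The former reads the files on HDFS directly and displays all the data, which suits queries over small data; the latter is processed through MapReduce, extracting the specified column from every row.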

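Cause: my table is stored in LZO format, but I did not specify the file format when writing to it. As a result, select * from the table returned no data, while selecting individual columns returned data. The fix is to enable LZO-compressed output before rewriting the table: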

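-- Compress the job output and write it with the LZO (lzop) codec
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
-- Number of reduce tasks requested for the job
set mapred.reduce.tasks=137;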

insert overwrite table e_ods.ods_dhprds_priceinfo
select priceid,
hotelid,
roomtypeid,
rateplanid,
begindate,
enddate,
rateplancode,
gensalecost,
gensaleprice,
weekendsalecost,
weekendsaleprice,
allowaddbed,
addbedprice,
currencycode,
ispriceset,
iseffective,
auditstatus,
operatetime,
operator,
operateip,
ratecalculationmodeltype,
commissioncalculationtype,
weekdaycommissioncalculationvalue,
weekendcommissioncalculationvalue,
weekdaynetrate,
weekendnetrate,
minprofit,
maxprofit
from e_ods.ods_dhprds_priceinfo_fl;
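
One way to check for this kind of mismatch is to look at the storage format the table was declared with. The following is a minimal sketch, assuming the hadoop-lzo library is installed on the cluster; the table `priceinfo_lzo_demo` and its column types are hypothetical and only illustrate what an LZO-backed text table declaration can look like.

```sql
-- Show the InputFormat / OutputFormat and location the table was declared with.
DESCRIBE FORMATTED e_ods.ods_dhprds_priceinfo;

-- Hypothetical LZO-backed text table (name and column types are assumptions).
CREATE TABLE priceinfo_lzo_demo (
  priceid BIGINT,
  hotelid BIGINT
)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```

If the declared input format expects LZO files but the data was written uncompressed, some read paths may skip or misread those files, which would be consistent with the symptom described above.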

What is the difference between 'select * from table' and 'select column from table' in Hive?

Your table in Hive is stored as a directory in HDFS. When you run 'select * from table', the Hive query processor simply goes to that directory, which holds one or more files adhering to the table schema, and dumps all the data as is, without launching a MapReduce job. This is reasonable for very small data, say less than a gigabyte. On real clusters the table may hold terabytes of data, and displaying all of it will run for a very long time.

'select column from table' is a projection query on the table: the Hive query processor has to read all the rows in the table, extract the column value from each row, and display it. The Hive query processor compiles the SQL query into a sequence of MapReduce programs to do the data processing. Data processing in Hive is carried out by such a sequence of MapReduce programs, which read data from the table stored on HDFS. With its classic execution engine, Hive is a MapReduce-based, batch-oriented query processing engine.
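
To see which path a given query takes, one option is to compare the plans Hive produces. This is a sketch only: the plan output differs between Hive versions, and on newer versions the `hive.fetch.task.conversion` setting may let a simple projection skip the job as well.

```sql
-- Plan for the full-row select; with fetch conversion enabled this is
-- typically a single Fetch Operator stage with no MapReduce job.
EXPLAIN SELECT * FROM e_ods.ods_dhprds_priceinfo;

-- Plan for a single-column projection, for comparison.
EXPLAIN SELECT hotelid FROM e_ods.ods_dhprds_priceinfo;

-- Controls which simple queries may bypass a job.
-- Accepted values: none, minimal, more.
SET hive.fetch.task.conversion=minimal;
```

Setting the value to `none` forces even 'select *' through a job, which can be useful when comparing the two read paths on the same table.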

Similarly, when you add where conditions to the SQL query, Hive does MapReduce-based data processing, except when the table is partitioned and the where clause is on the partition value. For example, if a table has a day partition, then a new subfolder is created under the table's folder for each new day of data you add. So a query like select * from table where day='400' simply dumps the contents of all files under the subdirectory for day 400 (by default named day=400) in the main table directory.
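
The partition case can be sketched as follows. The table `events_demo` and its columns are hypothetical, chosen only to illustrate the directory layout described above.

```sql
-- Hypothetical table partitioned by day; each partition is stored in its
-- own subdirectory under the table directory, e.g. .../events_demo/day=400
CREATE TABLE events_demo (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (day STRING);

-- The filter is only on the partition column, so Hive can read just the
-- files under the matching partition directory.
SELECT * FROM events_demo WHERE day = '400';

-- List the partitions (and therefore the subdirectories) that exist.
SHOW PARTITIONS events_demo;
```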

Furthermore, tables in Hive may have a large number of columns, for example 50 columns representing different values. If you run select column from table, the MapReduce program scans all the rows and extracts one column out of the 50 column values in each row. If you frequently need to extract one or a few columns from the table for processing, a better way is to use a columnar storage format such as Parquet or RCFile for the Hive table files.
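
As an illustration of the columnar option, the sketch below copies the table into a Parquet-backed one. The target table name `ods_dhprds_priceinfo_parquet` is hypothetical, and `STORED AS PARQUET` assumes Hive 0.13 or later; `STORED AS RCFILE` could be substituted on the same line.

```sql
-- Hypothetical columnar copy of the table (target name is an assumption).
CREATE TABLE e_ods.ods_dhprds_priceinfo_parquet
STORED AS PARQUET
AS
SELECT * FROM e_ods.ods_dhprds_priceinfo;

-- A projection over the columnar table only needs to read the data
-- for the selected columns rather than every full row.
SELECT hotelid, gensaleprice
FROM e_ods.ods_dhprds_priceinfo_parquet;
```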