DEFINING TABLE RECORD FORMATS IN HIVE

This article introduces Hive's SerDe mechanism for processing records and mapping them to table column data types, covering the default LazySimpleSerDe, the ColumnarSerDe for columnar storage, the regular-expression-based RegexSerDe, and others, and discusses how to define a custom SerDe as a Java class.

The Java technology that Hive uses to process records and map them to column data types in Hive tables is called SerDe, which is short for Serializer/Deserializer. The figure illustrates how SerDes are leveraged, and it will help you understand how Hive keeps file formats separate from record formats.

[Figure: How Hive uses the INPUTFORMAT, OUTPUTFORMAT, and SerDe objects to separate table file formats from table record formats]

So the first thing to notice is the INPUTFORMAT object. INPUTFORMAT allows you to specify your own Java class should you want Hive to read from a different file format. STORED AS TEXTFILE is simply shorthand for INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' — it saves you from writing out the whole Java package tree and class name for the default text file input format object.

The same is true of the OUTPUTFORMAT object. Instead of writing out the whole Java package tree and class name, the STORED AS TEXTFILE statement takes care of all of that for you.

Hive allows you to separate your record format from your file format, so how exactly do you accomplish this? Simple: you either replace STORED AS TEXTFILE with something like STORED AS RCFILE, or you create your own Java classes and specify them using INPUTFORMAT 'packagepath.classname' and OUTPUTFORMAT 'packagepath.classname'.
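To make the shorthand concrete, the following sketch shows both forms side by side. (The table names are hypothetical; the two class names are the standard Hive classes behind the RCFILE keyword.)

```sql
-- Shorthand: let the STORED AS keyword pick the input/output classes
CREATE TABLE sales_rc (id INT, amount DOUBLE)
STORED AS RCFILE;

-- Equivalent long form, spelling out the Java package tree and class names
CREATE TABLE sales_rc_explicit (id INT, amount DOUBLE)
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat';
```

Both statements produce the same on-disk format; the long form only becomes necessary when you plug in file format classes of your own.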

Finally notice that when Hive is reading data from the HDFS (or local file system), a Java Deserializer formats the data into a record that maps to table column data types. This would characterize the data flow for a HiveQL SELECT statement. When Hive is writing data, a Java Serializer accepts the record Hive uses and translates it such that the OUTPUTFORMAT class can write it to the HDFS (or local file system).

This would characterize the data flow for a HiveQL CREATE-TABLE-AS-SELECT statement. So the INPUTFORMAT, OUTPUTFORMAT, and SerDe objects allow Hive to separate the table record format from the table file format.

Hive bundles a number of SerDes for you to choose from, and you’ll find a larger number available from third parties if you search online. You can also develop your own SerDes if you have a more unusual data type that you want to manage with a Hive table. (Possible examples here are video data and e-mail data.)

In the following list, some of the SerDes provided with Hive are described as well as one third-party option that you may find useful.

  • LazySimpleSerDe: The default SerDe that’s used with the TEXTFILE format; it would be used with our_first_table from the following listing.

    $ $HIVE_HOME/bin/hive --service cli
    hive> set hive.cli.print.current.db=true;
    hive (default)> CREATE DATABASE ourfirstdatabase;
    OK
    Time taken: 3.756 seconds
    hive (default)> USE ourfirstdatabase;
    OK
    Time taken: 0.039 seconds
    hive (ourfirstdatabase)> CREATE TABLE our_first_table (
                           > FirstName       STRING,
                           > LastName        STRING,
                           > EmployeeId      INT);
    OK
    Time taken: 0.043 seconds
    hive (ourfirstdatabase)> quit;
    $ ls /home/biadmin/Hive/warehouse/ourfirstdatabase.db
    our_first_table

    It also would be used with data_types_table from the following listing.

    $ ./hive --service cli
    hive> CREATE DATABASE data_types_db;
    OK
    Time taken: 0.119 seconds
    hive> USE data_types_db;
    OK
    Time taken: 0.018 seconds
    hive> CREATE TABLE data_types_table (
        > our_tinyint     TINYINT   COMMENT '1 byte signed integer',
        > our_smallint    SMALLINT  COMMENT '2 byte signed integer',
        > our_int         INT       COMMENT '4 byte signed integer',
        > our_bigint      BIGINT    COMMENT '8 byte signed integer',
        > our_float       FLOAT     COMMENT 'Single precision floating point',
        > our_double      DOUBLE    COMMENT 'Double precision floating point',
        > our_decimal     DECIMAL   COMMENT 'Precise decimal type based
        >                                      on Java BigDecimal Object',
        > our_timestamp   TIMESTAMP COMMENT 'YYYY-MM-DD HH:MM:SS.fffffffff
        >                                     (9 decimal place precision)',
        > our_boolean     BOOLEAN   COMMENT 'TRUE or FALSE boolean data type',
        > our_string      STRING    COMMENT 'Character String data type',
        > our_binary      BINARY    COMMENT 'Data Type for Storing arbitrary
        >                                      number of bytes',
        > our_array       ARRAY<TINYINT>  COMMENT 'A collection of fields all of
        >                                 the same data type indexed BY
        >                                 an integer',
        > our_map         MAP<STRING,INT> COMMENT 'A collection of key,value pairs
        >                                   where the key is a primitive
        >                                   type and the value can be
        >                                   anything.  The chosen data
        >                                   types for the keys and values
        >                                   must remain the same per map',
        > our_struct      STRUCT<first : SMALLINT, second : FLOAT, third : STRING>
        >                               COMMENT 'A nested complex data
        >                                    structure',
        > our_union       UNIONTYPE<INT,FLOAT,STRING>
        >                             COMMENT 'A complex data type that can
        >                                    hold one of its possible data
        >                                    types at once')
        > COMMENT 'Table illustrating all Apache Hive data types'
        > ROW FORMAT DELIMITED
        > FIELDS TERMINATED BY ','
        > COLLECTION ITEMS TERMINATED BY '|'
        > MAP KEYS TERMINATED BY '^'
        > LINES TERMINATED BY '\n'
        > STORED AS TEXTFILE
        > TBLPROPERTIES ('creator'='Bruce Brown', 'created_at'='Sat Sep 21 20:46:32 EDT 2013');
    OK
    Time taken: 0.886 seconds
  • ColumnarSerDe: Used with the RCFILE format.

  • RegexSerDe: The regular expression SerDe, which ships with Hive to enable the parsing of text files. RegexSerDe can form a powerful approach for building structured data in Hive tables from unstructured blogs, semi-structured log files, e-mails, tweets, and other data from social media. Regular expressions allow you to extract meaningful information (an e-mail address, for example) with HiveQL from an unstructured or semi-structured text document incompatible with traditional SQL and RDBMSs.

  • HBaseSerDe: Included with Hive to enable it to integrate with HBase. You can store Hive tables in HBase by leveraging this SerDe.

  • JSONSerDe: A third-party SerDe for reading and writing JSON data records with Hive. You can quickly find (via Google and GitHub) two JSON SerDes by searching online for the phrase json serde for hive.

  • AvroSerDe: Included with Hive so that you can read and write Avro data in Hive tables.
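As an example of putting one of these bundled SerDes to work, the sketch below uses RegexSerDe to turn semi-structured log lines into table columns: each parenthesized capture group in input.regex maps, in order, to one column. (The table name and the regular expression are hypothetical, chosen to match a simple access-log layout.)

```sql
-- A minimal sketch: parse log lines of the form
--   host [timestamp] "request"
-- into three columns via capture groups.
CREATE TABLE access_log (
  host     STRING,
  ts       STRING,
  request  STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\S+) \\[([^\\]]+)\\] \"([^\"]*)\""
)
STORED AS TEXTFILE;
```

Rows that fail to match the pattern come back as NULLs rather than errors, which makes this a forgiving way to impose structure on messy text data.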

Reviewing the Language Manual DDL can be very helpful before you start creating your tables.

