Hive使用正则匹配字段

本文详细介绍了如何在Hive中使用正则表达式进行字段匹配,包括建表、数据准备、数据导入、查询等步骤,是大数据处理中的实用技巧。

1、建表

CREATE TABLE tmp.regex_log (
  log_date STRING,
  log_time STRING,
  log_type STRING,
  log_info STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(^[0-9|-]{0,}) ([^ ]*) (INFO|WARNING|ERROR) ([^*]*)$",
  "output.format.string" = "%1$s %2$s %3$s %4$s"
)
STORED AS TEXTFILE;

2、准备数据

cat /opt/my.log

2022-07-28 16:13:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/689ec2d7-b603-4ad1-863d-cb7d4b0bba04
2022-07-28 16:13:03 INFO session.SessionState: Created local directory: /tmp/root/689ec2d7-b603-4ad1-863d-cb7d4b0bba04
2022-07-28 16:13:03 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/689ec2d7-b603-4ad1-863d-cb7d4b0bba04/_tmp_space.db
2022-07-28 16:13:03 INFO conf.HiveConf: Using the default value passed in for log id: 689ec2d7-b603-4ad1-863d-cb7d4b0bba04
2022-07-28 16:13:03 INFO session.SessionState: Updating thread name to 689ec2d7-b603-4ad1-863d-cb7d4b0bba04 main
2022-07-28 16:13:03 WARNING Hive CLI is deprecated and migration to Beeline is recommended.
2022-07-28 16:14:38 INFO conf.HiveConf: Using the default value passed in for log id: 689ec2d7-b603-4ad1-863d-cb7d4b0bba04
2022-07-28 16:14:39 INFO ql.Driver: Compiling command(queryId=root_20220728161438_9466e8ce-c2bc-4608-ade9-99915b4b1e4a): select '123' as id, explode(array(1,3,5))
2022-07-28 16:14:40 INFO hive.metastore: HMS client filtering is enabled.
2022-07-28 16:14:40 INFO hive.metastore: Trying to connect to metastore with URI thrift://KF-CFT-Hadoop03:9083
2022-07-28 16:14:40 INFO hive.metastore: Opened a connection to metastore, current connections: 1
2022-07-28 16:14:40 INFO hive.metastore: Connected to metastore.
2022-07-28 16:14:41 ERROR ql.Driver: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

3、导入

load data local inpath '/opt/my.log' overwrite into table tmp.regex_log;

4、查询

select * from tmp.regex_log;

+---------------------+---------------------+---------------------+----------------------------------------------------+
| regex_log.log_date  | regex_log.log_time  | regex_log.log_type  |                 regex_log.log_info                 |
+---------------------+---------------------+---------------------+----------------------------------------------------+
| 2022-07-28          | 16:13:03            | INFO                | session.SessionState: Created HDFS directory: /tmp/hive/root/689ec2d7-b603-4ad1-863d-cb7d4b0bba04 |
| 2022-07-28          | 16:13:03            | INFO                | session.SessionState: Created local directory: /tmp/root/689ec2d7-b603-4ad1-863d-cb7d4b0bba04 |
| 2022-07-28          | 16:13:03            | INFO                | session.SessionState: Created HDFS directory: /tmp/hive/root/689ec2d7-b603-4ad1-863d-cb7d4b0bba04/_tmp_space.db |
| 2022-07-28          | 16:13:03            | INFO                | conf.HiveConf: Using the default value passed in for log id: 689ec2d7-b603-4ad1-863d-cb7d4b0bba04 |
| 2022-07-28          | 16:13:03            | INFO                | session.SessionState: Updating thread name to 689ec2d7-b603-4ad1-863d-cb7d4b0bba04 main |
| 2022-07-28          | 16:13:03            | WARNING             | Hive CLI is deprecated and migration to Beeline is recommended. |
| 2022-07-28          | 16:14:38            | INFO                | conf.HiveConf: Using the default value passed in for log id: 689ec2d7-b603-4ad1-863d-cb7d4b0bba04 |
| 2022-07-28          | 16:14:39            | INFO                | ql.Driver: Compiling command(queryId=root_20220728161438_9466e8ce-c2bc-4608-ade9-99915b4b1e4a): select '123' as id, explode(array(1,3,5)) |
| 2022-07-28          | 16:14:40            | INFO                | hive.metastore: HMS client filtering is enabled.   |
| 2022-07-28          | 16:14:40            | INFO                | hive.metastore: Trying to connect to metastore with URI thrift://KF-CFT-Hadoop03:9083 |
| 2022-07-28          | 16:14:40            | INFO                | hive.metastore: Opened a connection to metastore, current connections: 1 |
| 2022-07-28          | 16:14:40            | INFO                | hive.metastore: Connected to metastore.            |
| 2022-07-28          | 16:14:41            | ERROR               | ql.Driver: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions |
+---------------------+---------------------+---------------------+----------------------------------------------------+

5、修改

alter table tmp.regex_log set SERDEPROPERTIES ('input.regex'='(^[0-9|-]{0,}) ([^ ]*) (INFO|WARNING|ERROR) ([^*]*)$');

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值