Doris Practice: Importing a Local File with Stream Load

A hands-on exercise importing local data into a Doris table.

Test scenarios:

1. The source data is missing one column that the table expects.

2. The source data contains one duplicated row.

Table creation statement:

CREATE TABLE IF NOT EXISTS table2
(
    `event_day` date,
    `siteid` int DEFAULT '10' COMMENT "site ID",
    `citycode` smallint COMMENT "city code",
    `username` varchar(32) DEFAULT '' COMMENT "user name",
    `pv` bigint SUM DEFAULT '0',
    `times` int SUM DEFAULT '0' COMMENT "count"
)
ENGINE = olap
AGGREGATE KEY(`event_day`, `siteid`, `citycode`, `username`)
PARTITION BY RANGE(`event_day`)
(
    PARTITION `p202006` VALUES LESS THAN ("2020-07-01"),
    PARTITION `p202007` VALUES LESS THAN ("2020-08-01"),
    PARTITION `p202008` VALUES LESS THAN ("2020-09-01")
)
DISTRIBUTED BY HASH(`siteid`) BUCKETS 10
PROPERTIES ("replication_num" = "1");

Local data file table2_data.txt:

2020-07-03|11|1|jim|2
2020-07-05|12|1|grace|2
2020-07-05|12|1|grace|2
2020-07-12|13|2|tom|2
2020-07-15|14|3|bush|3
2020-07-12|15|3|helen|3

First bulk import attempt:

[root@node1 data]# curl --location-trusted -u root:123456 -H "label:table2_20221118" -H "column_separator:|" -T table2_data.txt http://node1:8030/api/test_db/table2/_stream_load
{
    "TxnId": 9008,
    "Label": "table2_20221118",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "too many filtered rows",
    "NumberTotalRows": 6,
    "NumberLoadedRows": 0,
    "NumberFilteredRows": 6,
    "NumberUnselectedRows": 0,
    "LoadBytes": 139,
    "LoadTimeMs": 6,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 2,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 2,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://192.168.88.161:8040/api/_load_error_log?file=__shard_2/error_log_insert_stmt_de416744d8ce5505-5faf6147979143b1_de416744d8ce5505_5faf6147979143b1"
}
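The load fails with "too many filtered rows": every one of the 6 rows was filtered. You can confirm the cause locally before touching the cluster, since each line of the file has only 5 fields while the table defines 6 columns. A quick check, recreating the file contents shown above:

```shell
# Recreate the original data file (contents copied from above)
cat > table2_data.txt <<'EOF'
2020-07-03|11|1|jim|2
2020-07-05|12|1|grace|2
2020-07-05|12|1|grace|2
2020-07-12|13|2|tom|2
2020-07-15|14|3|bush|3
2020-07-12|15|3|helen|3
EOF

# Count '|'-separated fields per line; the table expects 6 columns
awk -F'|' '{print NR ": " NF " fields"}' table2_data.txt
```

Every line reports 5 fields, one short of the 6 table columns. The `ErrorURL` in the response above can also be fetched with curl to see the per-row error details.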

Then supplement the local file table2_data.txt, adding the missing sixth column (the `times` value):

2020-07-03|11|1|jim|2|1
2020-07-05|12|1|grace|2|2
2020-07-05|12|1|grace|2|3
2020-07-12|13|2|tom|2|3
2020-07-15|14|3|bush|3|3
2020-07-12|15|3|helen|3|1

Then run the import again. (The same label table2_20221118 can be reused here because the previous load with that label failed; only a successful load permanently consumes its label.)

[root@node1 data]# vim table2_data.txt 
[root@node1 data]# curl --location-trusted -u root:123456 -H "label:table2_20221118" -H "column_separator:|" -T table2_data.txt http://node1:8030/api/test_db/table2/_stream_load
{
    "TxnId": 9010,
    "Label": "table2_20221118",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 6,
    "NumberLoadedRows": 6,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 151,
    "LoadTimeMs": 329,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 1,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 311,
    "CommitAndPublishTimeMs": 14
}

The import status is Success; now query the data:

mysql> select * from table2;
+------------+--------+----------+----------+------+-------+
| event_day  | siteid | citycode | username | pv   | times |
+------------+--------+----------+----------+------+-------+
| 2020-07-15 |     14 |        3 | bush     |    3 |     3 |
| 2020-07-12 |     15 |        3 | helen    |    3 |     1 |
| 2020-07-05 |     12 |        1 | grace    |    4 |     5 |
| 2020-07-03 |     11 |        1 | jim      |    2 |     1 |
| 2020-07-12 |     13 |        2 | tom      |    2 |     3 |
+------------+--------+----------+----------+------+-------+
5 rows in set (0.03 sec)

Because this test uses the AGGREGATE data model and the value columns use the SUM aggregation function, rows with identical key columns are merged and their values summed. That is why the two duplicate `grace` rows collapse into a single row with pv = 2 + 2 = 4 and times = 2 + 3 = 5.
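The merge Doris performs can be reproduced locally with a small awk sketch over the corrected file: group by the four key columns and sum `pv` and `times`:

```shell
# Recreate the corrected 6-field data file (contents copied from above)
cat > table2_data.txt <<'EOF'
2020-07-03|11|1|jim|2|1
2020-07-05|12|1|grace|2|2
2020-07-05|12|1|grace|2|3
2020-07-12|13|2|tom|2|3
2020-07-15|14|3|bush|3|3
2020-07-12|15|3|helen|3|1
EOF

# Group by (event_day, siteid, citycode, username) and SUM pv and times,
# mimicking what the AGGREGATE KEY model does on load
awk -F'|' '{k = $1 FS $2 FS $3 FS $4; pv[k] += $5; tm[k] += $6}
           END {for (k in pv) print k FS pv[k] FS tm[k]}' table2_data.txt | sort
```

This prints the same 5 aggregated rows the SELECT above returned, including `2020-07-05|12|1|grace|4|5` for the merged duplicate rows.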

Note: if you run into the "Message": "too many filtered rows" error in your own tests, check whether the data you are importing is missing columns that the table expects!
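When a small number of bad rows is expected and acceptable, Stream Load also accepts a `max_filter_ratio` header: the load succeeds as long as the fraction of filtered rows stays at or below the threshold. A sketch reusing the same host and credentials as above; the 0.5 threshold and the new label `table2_20221118_2` are illustrative values (a fresh label is needed because the earlier one already succeeded):

```shell
# Allow up to 50% of rows to be filtered before the load is failed
curl --location-trusted -u root:123456 \
     -H "label:table2_20221118_2" \
     -H "column_separator:|" \
     -H "max_filter_ratio:0.5" \
     -T table2_data.txt \
     http://node1:8030/api/test_db/table2/_stream_load
```

With the original 5-field file this would still fail, since 100% of rows are filtered; the header only tolerates partial filtering, it does not fix the underlying column mismatch.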
