Because the data volume was too large, sqoop either could not finish or kept running out of memory, so I wrote a script to export the data into HDFS in batches and then load it into a Hive table.
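The second half of that plan, loading the exported files from HDFS into Hive, comes down to a single LOAD DATA statement run once the files are in place. A minimal sketch, assuming a hypothetical target table person_capture and staging directory /tmp/person_capture that are not named in the original:

# Sketch only: the table name and HDFS path below are assumptions, not taken from the original script.
hive -e "LOAD DATA INPATH '/tmp/person_capture' INTO TABLE person_capture;"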
The shell script for the batched export is as follows:
#!/bin/bash
source /etc/profile

# Source database host.
host=127.0.0.1

# Export in 100 batches of 100,000 rows each (10,000,000 rows in total).
for ((i = 1; i <= 100; i++))
do
    # Row range covered by this batch: (i-1)*100000+1 .. i*100000.
    start=$(( (i - 1) * 100000 + 1 ))
    end=$(( i * 100000 ))
    sql="select person_id,capture_time,write_time,capture_resource_id,major_capture_image_url,minor_capture_image_url,sex,age,orientation,glasses,knapsack, bag,messenger_bag,shoulder_bag,umbrella,hair,hat,mask,upper_color,uppe