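I sometimes need to upload a local file to ODPS. Alibaba Cloud DataWorks does have a UI for this, but it runs into problems as soon as the file content contains something like a comma, so I wrote a dedicated script for this step.
Processing logic:
pandas read_csv ----> PyODPS DataFrame ------> ODPS
The code is below; it is simple and convenient: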
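As a small illustration of the comma problem mentioned above, here is a sketch using only the standard library and a made-up sample row. It does not reproduce what DataWorks does internally; it only shows why a comma inside a field is fragile when commas are also the delimiter, and why a tab-separated file avoids the collision.

```python
import csv
import io

# Hypothetical sample row: the second field (a product title) contains a comma.
tsv_line = "1001\tAir Max, size 42\t899.00\n"

# Tab-delimited parsing keeps the title as a single field.
rows = list(csv.reader(io.StringIO(tsv_line), delimiter="\t"))
tsv_fields = rows[0]

# The same row written out with commas and split naively on commas
# gains an extra field, because the title is cut in two.
comma_line = ",".join(tsv_fields)
naive_fields = comma_line.split(",")

print(len(tsv_fields))    # 3
print(len(naive_fields))  # 4
```

This is why the script reads the file with a tab separator rather than relying on comma-delimited input.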
# -*- coding: utf-8 -*-
import pandas as pd
from odps import ODPS
from odps.df import DataFrame
o = ODPS(
    access_id='********',
    secret_access_key='***********',
    project='**************',
    endpoint='http://service.odps.aliyun.com/api'
)
# Writes are matched by column name, so rename the columns of this DataFrame.
# header=0 treats the file's first row as a header, and names=[...] replaces it with the names below.
dewu_offline = pd.read_csv(
    "/Users/wangyuhang/Downloads/shihuo_20200802.tsv",
    sep='\t',
    header=0,
    names=[
        'order_no',
        'sub_order_no',
        'biz_type',
        'biz_channel',
        'biz_code',
        'biz_id',
        'sub_order_status',
        'pay_amount',
        'discount_amount',
        'inventory_id',
        'spu_id',
        'sku_id',
        'sku_title',
        'sku_price',
        'sku_count',
        'buyer_note',
        'deposit_amount',
        'poundage_amount',
        'poundage_info',
        'item_info',
        'close_type',
        'close_time',
        'feature',
        'is_del',
        'create_time',
        'modify_time',
        'product_amount',
        'freight_amount',
        'sku_logo',
        'delivery_deadline',
        'tab_tag',
        'size',
        'identify_status',
        'pay_status',
        'source_name',
        'customized_parameter',
        'no_use',
    ],
)
dewu_offline_all = DataFrame(dewu_offline)
print(dewu_offline_all.head(5))
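# If the target were a non-partitioned table:
# dewu_offline_all.persist('tmp_shihuo_du_order_from_dewu_offline', odps=o)
# The target here is partitioned, so write into partition dt=20200802 and create it if it does not exist.
dewu_offline_all.persist('tmp_shihuo_du_order_from_dewu_offline_all', partition='dt=20200802', odps=o, create_partition=True)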
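Since the script relies on writes being matched by column name, it can help to compare the DataFrame's column names with the target table's before persisting. The following is a minimal sketch with a hypothetical helper, `diff_columns`, and made-up column lists. In the script above the first list would come from `list(dewu_offline.columns)`, and the second from wherever you keep the table definition.

```python
def diff_columns(frame_columns, table_columns):
    """Return (missing, extra).

    missing: table columns that the frame does not provide.
    extra:   frame columns that the table does not have.
    """
    frame_set = set(frame_columns)
    table_set = set(table_columns)
    missing = [c for c in table_columns if c not in frame_set]
    extra = [c for c in frame_columns if c not in table_set]
    return missing, extra


# Hypothetical lists for illustration; note the typo 'pay_amout' on the frame side.
frame_cols = ["order_no", "sub_order_no", "pay_amout"]
table_cols = ["order_no", "sub_order_no", "pay_amount"]

missing, extra = diff_columns(frame_cols, table_cols)
print(missing)  # ['pay_amount']
print(extra)    # ['pay_amout']
```

If both lists come back empty, the names line up and the rename step did its job.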
So Python really can be quite handy at times.