Basic PyODPS data operations on MaxCompute

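This article describes how to use PyODPS, the Python SDK for Alibaba Cloud MaxCompute, on the DataWorks platform. It covers core operations such as creating tables, executing SQL, and reading data, as a practical guide for data engineers.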

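1. DataWorks on Alibaba Cloud MaxCompute can run Python code. A newly created PyODPS node comes with a global variable, odps or o, which is the ODPS entry object. Use it to access the table data available in DataWorks.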

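2. PyODPS can create tables, but this is not recommended. Creating the table directly in an SQL node is the more straightforward approach. See https://help.aliyun.com/document_detail/90412.html?spm=a2c4g.11186623.2.8.12d744cfBBmd6V#concept-lhx-tmf-cfb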
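For cases where a table still has to be created from a PyODPS node, a minimal sketch might look like the following. It assumes o is the entry object that the node provides and relies on the entry object's create_table() method with if_not_exists=True. The helper name ensure_table, the table name, and the column definitions are hypothetical.

```python
def ensure_table(o, name, columns, partitions=None):
    """Create the table only if it does not exist yet, and return it.

    `o` is assumed to be the ODPS entry object of the PyODPS node.
    `columns` and `partitions` are strings such as 'id bigint, name string'.
    """
    # For a partitioned table, PyODPS accepts a (columns, partitions) tuple.
    schema = (columns, partitions) if partitions else columns
    return o.create_table(name, schema, if_not_exists=True)


# Hypothetical usage inside a PyODPS node, where `o` already exists:
# t = ensure_table(o, 'my_new_table', 'id bigint, name string', 'pt string')
```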

3. Executing SQL statements in PyODPS: the entry object's execute_sql() and run_sql() methods both execute an SQL statement and return a task instance.

o.execute_sql('select * from dual')  # Synchronous: blocks until the SQL statement finishes.
instance = o.run_sql('select * from dual')  # Asynchronous: runs without blocking.
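With the asynchronous form, the caller decides when to wait for the job. The sketch below is one way to do that, assuming o is the node's entry object. It uses the Instance methods get_logview_address() and wait_for_success(). The helper name run_and_wait is hypothetical.

```python
def run_and_wait(o, sql):
    """Submit `sql` asynchronously, then block until it has succeeded."""
    instance = o.run_sql(sql)  # returns right away with an Instance
    # Other work could be done here while the job runs on the server side.
    print(instance.get_logview_address())  # URL for inspecting the job
    instance.wait_for_success()  # raises an error if the job failed
    return instance


# Hypothetical usage inside a PyODPS node, where `o` already exists:
# instance = run_and_wait(o, 'select * from dual')
```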

4. The Instance that runs an SQL statement can call open_reader() directly to read the result of that statement.

with o.execute_sql('select * from dual').open_reader() as reader:
    for record in reader:
        print(record)  # Process each record here.
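Records can be indexed by column name, so a query result can be copied into plain Python structures for later processing. The following is a small sketch under that assumption. The helper name query_to_dicts and the column names in the usage line are hypothetical.

```python
def query_to_dicts(o, sql, columns):
    """Run `sql` synchronously and return the chosen columns as dicts."""
    rows = []
    with o.execute_sql(sql).open_reader() as reader:
        for record in reader:
            # record[name] looks up one field by its column name.
            rows.append({name: record[name] for name in columns})
    return rows


# Hypothetical usage inside a PyODPS node, where `o` already exists:
# rows = query_to_dicts(o, 'select id, name from test_table', ['id', 'name'])
```

This loads the whole result into memory, so it only suits small result sets.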

5. Reading table data

Use the entry object's read_table() method, for example:

for record in o.read_table('test_table', partition='pt=test'):
    print(record)  # Process one record here.

If you only need to look at the first rows of a table (fewer than 10,000 records), call the head() method on the table object.
```
t = o.get_table('dual')
for record in t.head(3):
    print(record)  # Process each Record object here.
```

You can also call open_reader() on the table object to read data.

With a with statement:

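with t.open_reader(partition='pt=test') as reader:
    count = reader.count
    # This loop can be repeated with different slices until all `count`
    # records have been read; it could also be turned into a parallel operation.
    for record in reader[5:10]:
        print(record)  # Process one record here.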

Without a with statement:

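reader = t.open_reader(partition='pt=test')
count = reader.count
# This loop can be repeated with different slices until all `count`
# records have been read; it could also be turned into a parallel operation.
for record in reader[5:10]:
    print(record)  # Process one record here.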
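The idea of repeating the slice until all count records are read can be sketched as follows. It assumes t is a table object obtained with o.get_table() and uses only reader.count and slicing on the reader. The helper names iter_chunks and read_in_chunks, and the chunk size, are hypothetical.

```python
def iter_chunks(total, chunk_size):
    """Yield (start, stop) pairs that cover range(total) in chunk_size steps."""
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def read_in_chunks(table, partition, chunk_size=10000):
    """Yield every record of one partition, reading one slice at a time."""
    with table.open_reader(partition=partition) as reader:
        for start, stop in iter_chunks(reader.count, chunk_size):
            for record in reader[start:stop]:
                yield record


# Hypothetical usage inside a PyODPS node, where `o` already exists:
# t = o.get_table('test_table')
# for record in read_in_chunks(t, 'pt=test', chunk_size=5000):
#     print(record)
```

Because each (start, stop) pair is independent of the others, the pairs could in principle be handed to separate workers that each open their own reader. The sketch above reads them sequentially.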