Notes on the basic operations for working with HDFS through the WebHDFS REST API. For more detail, see the official documentation:
http://hadoop.apache.org/docs/r3.2.1/hadoop-project-dist/hadoop-hdfs/WebHDFS.html
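For orientation, the client calls in these notes end up as HTTP requests of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`, as described in the documentation linked above. The following is a minimal sketch of how such a URL is put together. `build_webhdfs_url` is a hypothetical helper written only for illustration (it is not part of any library), and the host, port, and user name are placeholder values.

```python
from urllib.parse import quote, urlencode


def build_webhdfs_url(base_url, hdfs_path, op, params=None):
    """Assemble a WebHDFS REST URL (hypothetical helper, illustration only).

    base_url  -- NameNode HTTP address, e.g. 'http://192.168.0.1:50070'
    hdfs_path -- absolute HDFS path, e.g. '/tmp/demo'
    op        -- WebHDFS operation name, e.g. 'MKDIRS' or 'LISTSTATUS'
    params    -- optional dict of extra query parameters, such as 'user.name'
    """
    if not hdfs_path.startswith('/'):
        raise ValueError('hdfs_path must be absolute')
    # Put 'op' first, then any extra parameters in a stable (sorted) order.
    query = urlencode([('op', op)] + sorted((params or {}).items()))
    # Percent-encode the path but keep '/' separators intact.
    return '{}/webhdfs/v1{}?{}'.format(base_url.rstrip('/'), quote(hdfs_path), query)


url = build_webhdfs_url('http://192.168.0.1:50070', '/tmp/demo', 'MKDIRS',
                        {'user.name': 'hadoop'})
print(url)
```

The Python client used in the rest of these notes hides this URL handling, so the helper is only meant to show what happens underneath.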
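# Import the client classes from the Python `hdfs` package
from hdfs import Client, InsecureClient

# Create a client connection
# (50070 is the default NameNode HTTP port in Hadoop 2.x; Hadoop 3.x defaults to 9870, so adjust for your cluster)
client = Client(url='http://192.168.0.1:50070', root=None, proxy=None, timeout=None, session=None)
# Alternatively, use InsecureClient, which lets you specify the user to log in as.
# Passing `proxy` to Client() raised an exception for me, and I have not worked out why yet.
client = InsecureClient("http://192.168.0.1:50070", user='hadoop')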
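# Create a directory
client.makedirs(hdfs_path)
# Delete a file on HDFS (pass recursive=True to delete a non-empty directory)
client.delete(hdfs_path)
# Upload a local file to HDFS
client.upload(hdfs_path, local_path, cleanup=True)
# Download a file from HDFS to the local filesystem
client.download(hdfs_path, local_path, overwrite=False)
# Append data to an existing HDFS file
client.write(hdfs_path, data, overwrite=False, append=True, encoding='utf-8')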
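The append call above assumes the target file is already there. As far as I can tell, an append to a missing file is expected to fail, and `overwrite=True` cannot be combined with `append=True`. Below is a minimal sketch of an "append, or create if missing" wrapper. `append_or_create` and `InMemoryClient` are hypothetical names introduced for illustration: the stand-in class only mimics the `status(path, strict=False)` and `write(...)` calls so the sketch can run without a cluster, and its exceptions are not the ones the real client raises.

```python
def append_or_create(client, hdfs_path, data, encoding='utf-8'):
    """Append data to hdfs_path, creating the file if it does not exist yet.

    Assumes `client` behaves like the client used in these notes:
    status(path, strict=False) returns None for a missing path.
    Returns True if the data was appended, False if the file was created.
    """
    exists = client.status(hdfs_path, strict=False) is not None
    client.write(hdfs_path, data, overwrite=False, append=exists, encoding=encoding)
    return exists


class InMemoryClient:
    """Tiny stand-in for an HDFS client (illustration only, not a real API)."""

    def __init__(self):
        self.files = {}

    def status(self, hdfs_path, strict=True):
        if hdfs_path in self.files:
            return {'type': 'FILE', 'length': len(self.files[hdfs_path])}
        if strict:
            raise FileNotFoundError(hdfs_path)
        return None

    def write(self, hdfs_path, data, overwrite=False, append=False, encoding=None):
        if append and overwrite:
            raise ValueError('Cannot both overwrite and append.')
        if append:
            # Appending to a missing file fails (KeyError in this stand-in).
            self.files[hdfs_path] += data
        elif hdfs_path in self.files and not overwrite:
            raise FileExistsError(hdfs_path)
        else:
            self.files[hdfs_path] = data


demo = InMemoryClient()
first = append_or_create(demo, '/tmp/demo.txt', 'line 1\n')   # file is created
second = append_or_create(demo, '/tmp/demo.txt', 'line 2\n')  # data is appended
```

With a real cluster, the `InsecureClient` instance would be passed in place of `demo`. The check and the write are two separate requests, so this sketch is not safe against concurrent writers.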
# Overwrite an HDFS file with new data
client.write(