Inserting data into HBase with the MapReduce framework (Python script)

smithliang1996

License: CC 4.0 BY-SA

Categories: Big Data, HBase, Databases. Tags: hbase, hadoop, database

Article link: https://blog.youkuaiyun.com/smithliang1996/article/details/78510261

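This article describes how to use the MapReduce framework together with Python to operate on an HBase database through the Thrift component. It first outlines the role of MapReduce in Hadoop and the advantages of HBase as a NoSQL database, then walks through installing ThriftServer, configuring the HBase source, and generating the Python API. For the MapReduce job, it shows the contents of map.py and run.sh and stresses that in a cluster every node needs HBase and ThriftServer configured correctly. It closes by previewing the next step: inserting data into HBase with Hive statements.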

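MapReduce is the distributed computing framework of the Hadoop project. Its high throughput and batch-oriented output make it a preferred framework for offline data mining.
HBase is a NoSQL database modeled on Google's internal Bigtable design. It is meant to reduce data redundancy and speed up queries, which makes it a preferred NoSQL database for data-mining workloads, and it stores its data in Hadoop's HDFS.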

Versions used: Hadoop 1.0 and HBase 0.98

HBase is written in Java, so Java programs can call the HBase client API directly. To work with HBase from any other language, you need an additional component.

Operating HBase from Python
Thrift is the component that lets Python talk to an HBase database.
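
To make the role of Thrift concrete, here is a minimal sketch of what a Python client might look like once the installation steps in this article are done. It is an illustration under assumptions, not the script used later in this article: the helper names parse_record, open_hbase_client, and list_tables are hypothetical, the tab-separated input layout is assumed, and host localhost with port 9090 assumes a ThriftServer running locally on its usual default port. The hbase module is the one generated from HBase's Hbase.thrift file; it and the thrift package are imported lazily so the parsing helper can be tried without them.

```python
def parse_record(line, sep="\t"):
    """Split one 'row_key<TAB>value' input line into (row_key, value).

    Returns None for blank or malformed lines so a caller can skip them.
    The two-field, tab-separated layout is an assumption for illustration.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 2 or not fields[0]:
        return None
    return fields[0], fields[1]


def open_hbase_client(host="localhost", port=9090):
    """Connect to an HBase ThriftServer and return (transport, client).

    Assumes the `thrift` Python package and the `hbase` module generated
    from Hbase.thrift are importable on this machine.
    """
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase  # generated with: thrift --gen py Hbase.thrift

    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    return transport, client


def list_tables(host="localhost", port=9090):
    """Return the table names reported by the ThriftServer."""
    transport, client = open_hbase_client(host, port)
    try:
        return client.getTableNames()
    finally:
        transport.close()
```

Calling parse_record("row1\thello\n") returns ("row1", "hello"). Calling list_tables() only works once a ThriftServer is actually running, so it is a quick way to check the whole chain from Python to HBase.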

Installing ThriftServer:

Before Thrift can be built, these build dependencies must be installed:

yum -y install automake libtool bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel boost-devel.x86_64 libevent-devel.x86_64

Next, install Thrift:

# Download the Thrift source tarball
wget http://archive.apache.org/dist/thrift/0.8.0/thrift-0.8.0.tar.gz
# Unpack it
tar zxvf thrift-0.8.0.tar.gz

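Then configure, build, and install it from the unpacked source directory (run these as root):

cd thrift-0.8.0
./configure --with-cpp=no --with-ruby=no
make
make install
# Check that the build succeeded: running thrift with no arguments prints its usage text
thrift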

Usage: thrift [options] file
Options:
  -version    Print the compiler version
  -o dir      Set the output directory for gen-* packages
               (default: current directory)
  -out dir    Set the ouput location for generated files.
               (no gen-* folder will be created)
  -I dir      Add a directory to the list of directories
                searched for include directives
  -nowarn     Suppress all compiler warnings (BAD!)
  -strict     Strict compiler warnings on