MySQL Applier with Hadoop


MySQL Applier for Hadoop

Replication via the Hadoop Applier is implemented by connecting to the MySQL master and reading binary log events as soon as they are committed, and writing them into a file in HDFS. "Events" describe database changes such as table creation operations or changes to table data.

MySQL to HDFS Integration

 

The Hadoop Applier uses an API provided by libhdfs, a C library to manipulate files in HDFS. The library comes precompiled with Hadoop distributions.
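
Because libhdfs is a plain C library, it can be driven from other languages as well. The snippet below is a rough sketch, not part of the Applier itself: it loads libhdfs through Python's ctypes and appends one delimited line to a file in HDFS. It assumes a Hadoop installation where libhdfs.so is on the loader path, the JVM CLASSPATH is configured, and the target file already exists with append support enabled; the path shown is only an example.

    import ctypes
    import os

    # Rough sketch (not the Applier's code): append one line to an HDFS file
    # via the libhdfs C API shipped precompiled with Hadoop distributions.
    lib = ctypes.CDLL("libhdfs.so")
    lib.hdfsConnect.restype = ctypes.c_void_p   # hdfsFS is an opaque pointer
    lib.hdfsOpenFile.restype = ctypes.c_void_p  # hdfsFile is an opaque pointer

    fs = lib.hdfsConnect(b"default", 0)         # use fs.defaultFS from the config
    path = b"/usr/hive/warehouse/example.db/example/datafile1.txt"  # example path
    f = lib.hdfsOpenFile(ctypes.c_void_p(fs), path,
                         os.O_WRONLY | os.O_APPEND,  # open for appending
                         0, 0, 0)                    # default buffer/replication/block size
    row = b"1,alice,2013-07-01\n"
    lib.hdfsWrite(ctypes.c_void_p(fs), ctypes.c_void_p(f), row, len(row))
    lib.hdfsCloseFile(ctypes.c_void_p(fs), ctypes.c_void_p(f))
    lib.hdfsDisconnect(ctypes.c_void_p(fs))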

The Applier connects to the MySQL master to read the binary log and then:

  • Fetches the row insert events occurring on the master
  • Decodes these events, extracts data inserted into each field of the row, and uses content handlers to get it in the format required
  • Appends it to a text file in HDFS (a Python sketch of this pipeline follows below).
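
This pipeline can be approximated in Python with the python-mysql-replication library (linked at the end of this post as a similar project). The sketch below is illustrative rather than the Applier's actual implementation: it assumes a master running with binlog_format=ROW, made-up connection credentials, and it appends delimiter-separated lines to a local text file as a stand-in for the libhdfs append.

    # Illustrative pipeline: read row-insert events from the master's binlog and
    # append them as delimited text, mirroring what the Hadoop Applier does.
    # Assumptions: binlog_format=ROW on the master and a "repl"/"replpass" user.
    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.row_event import WriteRowsEvent

    MYSQL = {"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "replpass"}
    DELIMITER = ","  # the real Applier lets you configure this on the command line

    stream = BinLogStreamReader(
        connection_settings=MYSQL,
        server_id=100,                 # must be unique among the master's replicas
        only_events=[WriteRowsEvent],  # fetch only row-insert events
        blocking=True,                 # keep reading as new events are committed
    )

    for event in stream:               # one event per committed batch of inserts
        for row in event.rows:         # decode each inserted row
            values = row["values"]     # dict: column name -> inserted value
            line = DELIMITER.join(str(v) for v in values.values())
            # Local file as a stand-in for the libhdfs append into HDFS.
            with open(f"{event.schema}.{event.table}.txt", "a") as out:
                out.write(line + "\n")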

Databases are mapped as separate directories, with their tables mapped as sub-directories within a Hive data warehouse directory. Data inserted into each table is written into text files (named datafile1.txt) in Hive/HDFS. The data can be written in comma-separated format, or in any other format configurable via command-line arguments.
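
To make that mapping concrete, the small helper below builds the HDFS path a table's rows would land in. The warehouse root /usr/hive/warehouse, the <database>.db directory suffix, and the datafile1.txt file name follow common Hive layout and the description above; treat them as assumptions rather than the Applier's exact defaults.

    # Illustrative mapping: database -> directory, table -> sub-directory,
    # rows -> a delimited text file under the Hive warehouse directory.
    # The warehouse root and ".db" suffix are assumptions based on Hive convention.
    def hdfs_target_path(database: str, table: str,
                         warehouse_dir: str = "/usr/hive/warehouse") -> str:
        return f"{warehouse_dir}/{database}.db/{table}/datafile1.txt"

    def format_row(values, delimiter: str = ",") -> str:
        # Serialize one decoded row as a single delimited line of text.
        return delimiter.join(str(v) for v in values)

    # Example: an insert into employees.salaries would be appended to
    # /usr/hive/warehouse/employees.db/salaries/datafile1.txt
    print(hdfs_target_path("employees", "salaries"))
    print(format_row([10001, 60117, "1986-06-26"]))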

Mapping between MySQL and HDFS Schema

 

 

The Hadoop Applier can be downloaded from http://labs.mysql.com/.

http://dev.mysql.com/tech-resources/articles/mysql-hadoop-applier.html

http://www.tuicool.com/articles/NfArA3i

 

A similar project is https://github.com/noplay/python-mysql-replication.
