Using the ETL tool Sqoop, import (synchronize) the table structures and data of 10 tables in the MySQL database db03 into Hive on the big-data platform

A user named sqoop03 with a password was created in MySQL, along with a database db03. Ten tables with different structures were then created in db03 under the sqoop03 user and populated with data. Finally, the Sqoop tool was used to import the structures and data of these tables into Hive on the Hadoop cluster.


1. In MySQL, create a user named sqoop03 with the password 123456

Start MySQL: support-files/mysql.server start

Log in to MySQL: mysql -u root -p

Create the user sqoop03: grant all on *.* to sqoop03@'%' identified by '123456';
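Note that the grant ... identified by form above only works on MySQL 5.x. If the server happens to be MySQL 8.0 or later, the account must be created before privileges are granted; a minimal sketch, assuming the same user name and password:

-- MySQL 8.0+: create the account first, then grant privileges
CREATE USER 'sqoop03'@'%' IDENTIFIED BY '123456';
GRANT ALL PRIVILEGES ON *.* TO 'sqoop03'@'%';
FLUSH PRIVILEGES;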

(1) Log in to MySQL as the sqoop03 user and create a database named db03. Log in as sqoop03:

mysql -u sqoop03 -p

(2) Create the database db03

create database db03;
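To confirm the database exists, list the databases visible to this account:

show databases;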

2. Use Sqoop to list the databases visible to the MySQL user sqoop03

Command: sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306 --username sqoop03 --password 123456
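If you would rather not put the password on the command line, Sqoop's -P option reads it from the console instead; a minor variation on the command above:

sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306 --username sqoop03 -P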

3. Create 10 tables in the database db03; table names and columns may be chosen freely, and each table must contain at least 5 rows of data

(1) Create the first table

mysql> use db03

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Database changed

mysql> create table tb01(name char(100),id char(100));

Query OK, 0 rows affected (0.75 sec)

mysql> insert into tb01 values('zhh','100');

Query OK, 1 row affected (0.16 sec)

mysql> insert into tb01 values('za','101');

Query OK, 1 row affected (0.09 sec)

mysql> insert into tb01 values('zrr','102');

Query OK, 1 row affected (0.02 sec)

mysql> insert into tb01 values('gg','103');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb01 values('qq','104');

Query OK, 1 row affected (0.04 sec)

(2) Create the second table

mysql> create table tb02(name char(100),sex char(30),age int);

Query OK, 0 rows affected (0.26 sec)

mysql> insert into tb02 values('qq','man','104');

Query OK, 1 row affected (0.12 sec)

mysql> insert into tb02 values('zz','man','13');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb02 values('cf','woman','23');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb02 values('gg','woman','20');

Query OK, 1 row affected (0.02 sec)

mysql> insert into tb02 values('gf','woman','26');

Query OK, 1 row affected (0.00 sec)

(3) Create the third table

mysql> create table tb03(name char(100),height float);

Query OK, 0 rows affected (0.26 sec)

mysql> insert into tb03 values('gf',115);

Query OK, 1 row affected (0.26 sec)

mysql> insert into tb03 values('zz',165);

Query OK, 1 row affected (0.06 sec)

mysql> insert into tb03 values('jz',155);

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb03 values('kk',150);

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb03 values('ll',167);

Query OK, 1 row affected (0.01 sec)

(4) Create the fourth table

mysql> create table tb04 (name char(100), tel char(30));

Query OK, 0 rows affected (0.11 sec)

mysql> insert into tb04 values('hh','110');

Query OK, 1 row affected (0.04 sec)

mysql> insert into tb04 values('zz','111');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb04 values('gg','112');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb04 values('jj','113');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb04 values('nm','114');

Query OK, 1 row affected (0.01 sec)

(5) Create the fifth table

mysql> create table tb05 (name char(100),address char(100));

Query OK, 0 rows affected (0.08 sec)

mysql> insert into tb05 values('hh','ChongQing');

Query OK, 1 row affected (0.04 sec)

mysql> insert into tb05 values('kk','guangdong');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb05 values('Jack','YunNan');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb05 values('Jf','YunNan');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb05 values('fg','HeNan');

Query OK, 1 row affected (0.00 sec)

(6) Create the sixth table

mysql> create table tb06 (name char(100),love  char(50));

Query OK, 0 rows affected (0.14 sec)

mysql> insert into tb06 values('Marry','Sing');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb06 values('Mm','hhh');

Query OK, 1 row affected (0.06 sec)

mysql> insert into tb06 values('kk','hj');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb06 values('ss','hj');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb06 values('rts','dg');

Query OK, 1 row affected (0.01 sec)

(7) Create the seventh table

mysql> create table tb07 (name char(100),academic  char(50));

Query OK, 0 rows affected (0.14 sec)

mysql> insert into tb07 values('Marry','Senior');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb07 values('Mm','Senior');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb07 values('oo','Undergraduate');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb07 values('hh','Undergraduate');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb07 values('iu','Undergraduate');

Query OK, 1 row affected (0.00 sec)

(8) Create the eighth table

mysql> create table table08 (name char(100),work  char(50));

ERROR 1050 (42S01): Table 'table08' already exists

mysql> create table tb08 (name char(100),work  char(50));

Query OK, 0 rows affected (0.14 sec)

mysql> insert into tb08 values('hh','Driver');

Query OK, 1 row affected (0.03 sec)

mysql> insert into tb08 values('ll','singer');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb08 values('tt','teacher');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb08 values('qq','teacher');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb08 values('yq','teacher');

Query OK, 1 row affected (0.00 sec)

(9) Create the ninth table

mysql> create table tb09 (name char(100),salary int);

Query OK, 0 rows affected (0.23 sec)

mysql> insert into tb09 values('Ma',6000);

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb09 values('hh',4000);

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb09 values('gh',3000);

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb09 values('yy',9000);

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb09 values('oy',2000);

Query OK, 1 row affected (0.00 sec)

(10) Create the tenth table

mysql> create table tb10 (name char(100),status char(30));

Query OK, 0 rows affected (0.11 sec)

mysql> insert into tb10 values('Ma','Single');

Query OK, 1 row affected (0.09 sec)

mysql> insert into tb10 values('hh','Single');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb10 values('ih','Married');

Query OK, 1 row affected (0.01 sec)

mysql> insert into tb10 values('pp','Married');

Query OK, 1 row affected (0.00 sec)

mysql> insert into tb10 values('lh','Married');

Query OK, 1 row affected (0.01 sec)

The 10 tables created are as follows:
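They can be listed from the MySQL client; the listing should contain tb01 through tb10:

mysql> show tables;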

4. Use the ETL tool Sqoop to import (synchronize) the table structures and data of the 10 tables in the MySQL database db03 into Hive on the big-data platform

(1) Run Sqoop and verify that it can connect to MySQL:

[root@HadoopMaster bin]# ./sqoop list-databases --connect jdbc:mysql://HadoopMaster:3306/ --username sqoop03 --password 123456
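The import below writes into the Hive database bigdata03, which must already exist. If it has not been created yet, a minimal sketch (assuming the hive CLI is on the PATH):

hive -e "create database if not exists bigdata03;"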

(2) Import command (example for tb01):

sqoop import --connect jdbc:mysql://HadoopMaster:3306/db03 --username sqoop03 --password 123456 --table tb01 --hive-import --hive-table bigdata03.tb01 -m 1

The console output shows that the import completed successfully.

(Run the same command for each of the 10 tables in turn; a loop that does this is sketched below.)
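To avoid typing the command ten times, the imports can be scripted with a simple shell loop; a sketch, assuming the same connection settings and target Hive database as above:

# Import all 10 tables from db03 into the Hive database bigdata03
for t in tb01 tb02 tb03 tb04 tb05 tb06 tb07 tb08 tb09 tb10
do
  sqoop import --connect jdbc:mysql://HadoopMaster:3306/db03 \
    --username sqoop03 --password 123456 \
    --table "$t" --hive-import --hive-table "bigdata03.$t" -m 1
done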

5. The results are as follows:

hive> show tables;
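The show tables listing (run after switching to the target database with use bigdata03;) should contain tb01 through tb10, and a quick query confirms that the rows arrived as well; for example:

hive> use bigdata03;
hive> select * from tb01;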

The imported files can also be seen in the NameNode web UI on port 9870:
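Equivalently, from the command line; a sketch, assuming the default Hive warehouse location:

hdfs dfs -ls /user/hive/warehouse/bigdata03.db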
