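This was my first big data project, and it used Kudu and Hadoop. I am writing these notes as a reference so I don't forget the details.
1) pom dependencies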
<!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-client -->
<dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-client</artifactId>
    <version>1.5.0-cdh5.13.1</version>
    <scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-client-tools -->
<dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-client-tools</artifactId>
    <version>1.5.0-cdh5.13.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-spark2 -->
<dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-spark2_2.11</artifactId>
    <version>1.6.0</version>
</dependency>
This post uses the Cloudera builds, so the Cloudera repository has to be added as well:
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
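2) Feature list:
- Create a table with KuduClient;
- Insert data with KuduClient;
- Update data with KuduClient;
- Query data with KuduClient;
- Delete a table with KuduClient;
- Query data with Spark SQL;
- Check whether a table exists with the Spark KuduContext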
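As a rough illustration of the KuduClient items in the list above, here is a minimal sketch that creates a table, inserts a row, updates it, scans it back and drops the table. The master address `kudu-master:7051`, the table name `demo_user` and the two columns `id` and `name` are placeholders made up for this example, not values from the project. The sketch also assumes kudu-client is on the compile classpath and that a Kudu cluster is reachable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.Insert;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduException;
import org.apache.kudu.client.KuduScanner;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.RowResult;
import org.apache.kudu.client.RowResultIterator;
import org.apache.kudu.client.Update;

public class KuduCrudSketch {
    // Placeholder values (assumptions): use your own master address and table name.
    private static final String MASTER = "kudu-master:7051";
    private static final String TABLE = "demo_user";

    // Two-column schema: an INT32 primary key and a STRING value column.
    public static Schema buildSchema() {
        List<ColumnSchema> columns = new ArrayList<>();
        columns.add(new ColumnSchema.ColumnSchemaBuilder("id", Type.INT32).key(true).build());
        columns.add(new ColumnSchema.ColumnSchemaBuilder("name", Type.STRING).build());
        return new Schema(columns);
    }

    public static void main(String[] args) throws KuduException {
        KuduClient client = new KuduClient.KuduClientBuilder(MASTER).build();
        try {
            // Create the table, hash-partitioned on the key column into 3 buckets.
            CreateTableOptions options = new CreateTableOptions()
                    .addHashPartitions(Arrays.asList("id"), 3)
                    .setNumReplicas(1);
            client.createTable(TABLE, buildSchema(), options);

            KuduTable table = client.openTable(TABLE);
            KuduSession session = client.newSession();

            // Insert one row.
            Insert insert = table.newInsert();
            insert.getRow().addInt("id", 1);
            insert.getRow().addString("name", "alice");
            session.apply(insert);

            // Update the same row, identified by its primary key.
            Update update = table.newUpdate();
            update.getRow().addInt("id", 1);
            update.getRow().addString("name", "bob");
            session.apply(update);
            session.close();

            // Scan the table and print every row.
            KuduScanner scanner = client.newScannerBuilder(table)
                    .setProjectedColumnNames(Arrays.asList("id", "name"))
                    .build();
            while (scanner.hasMoreRows()) {
                RowResultIterator rows = scanner.nextRows();
                while (rows.hasNext()) {
                    RowResult row = rows.next();
                    System.out.println(row.getInt("id") + " -> " + row.getString("name"));
                }
            }
            scanner.close();

            // Drop the table again.
            client.deleteTable(TABLE);
        } finally {
            client.close();
        }
    }
}
```

With the default flush mode, row-level errors come back in the `OperationResponse` returned by `session.apply(...)` instead of being thrown. The sketch ignores that return value for brevity, so real code should check it.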
P.S. The Cloudera website only shows the Spark SQL query in Scala, and a concrete Java version is hard to find even with Google. View the source code
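For reference, one possible Java version of the Spark SQL query and the KuduContext existence check is sketched below. It is not the code from the original project. The master address, the table name and the view name `demo_user_view` are made up for the example, as are the `id` and `name` columns. It also assumes spark-sql for Scala 2.11 is on the classpath (it is not in the pom above) and that `KuduContext` has the two-argument constructor found in the kudu-spark2 1.6 line.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kudu.spark.kudu.KuduContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KuduSparkSketch {
    // Builds the data source options that the Kudu Spark integration reads.
    public static Map<String, String> kuduOptions(String master, String table) {
        Map<String, String> options = new HashMap<>();
        options.put("kudu.master", master);
        options.put("kudu.table", table);
        return options;
    }

    public static void main(String[] args) {
        // Placeholder values (assumptions): use your own master address and table name.
        String master = "kudu-master:7051";
        String table = "demo_user";

        SparkSession spark = SparkSession.builder()
                .appName("kudu-spark-sketch")
                .master("local[*]")
                .getOrCreate();

        // KuduContext wraps a Kudu client for use from Spark jobs.
        KuduContext kuduContext = new KuduContext(master, spark.sparkContext());

        if (kuduContext.tableExists(table)) {
            // Load the Kudu table as a DataFrame through the Kudu data source.
            Dataset<Row> df = spark.read()
                    .format("org.apache.kudu.spark.kudu")
                    .options(kuduOptions(master, table))
                    .load();

            // Register a temporary view so the table can be queried with SQL.
            df.createOrReplaceTempView("demo_user_view");
            spark.sql("SELECT id, name FROM demo_user_view WHERE id > 0").show();
        } else {
            System.out.println("Table not found: " + table);
        }

        spark.stop();
    }
}
```

The Scala examples rely on an implicit `.kudu` method on the reader. Java has no implicits, so the sketch names the data source package in `format(...)` instead.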