[Notes] Compiling the Data Governance Tool Apache Atlas, Integrating It with CDH, and Deploying It

槐序i

Category: Data Governance. Tags: Big Data

Article link: https://blog.youkuaiyun.com/spark9527/article/details/111604708

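This document records in detail how to compile and deploy Apache Atlas in a CDH environment for big data governance. It covers the whole process, from choosing software versions and downloading the source to resolving compilation errors and installing and deploying, including the network problems encountered and how they were solved. The goal is to be able to trace accurately which jobs are affected when a data table changes.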

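Foreword:

As our big data business grows more complex, the data warehouse accumulates large numbers of fine-grained history tables and user-status tables of all kinds, and the data marts add a large number of statistics tables on top of them. When the business changes, when the event-tracking rules change, or when an existing bug is fixed, one small modification can alter many metric calculations. With many jobs under development, it is easy to forget to update a field somewhere, or to lose track of how many scripts actually need to change. I therefore looked for solutions on the major forums and finally settled on Apache Atlas for data governance. The goal is to know which jobs are affected when a given field of a given table is modified.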

1. Software versions

Apache Atlas 1.1.0
CDH 5.12.1
Hive 1.1.0

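2. Download

After going through a lot of related material, I have not found a good precompiled package. Most people download the source from the official site and compile and deploy it themselves. Run the following command to download the source package: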

wget https://archive.apache.org/dist/atlas/1.1.0/apache-atlas-1.1.0-sources.tar.gz

Then extract it and change into the source directory:

tar -zxvf apache-atlas-1.1.0-sources.tar.gz
cd apache-atlas-sources-1.1.0
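
Note that the extracted directory (apache-atlas-sources-1.1.0) is not named after the archive (apache-atlas-1.1.0-sources.tar.gz). As a minimal sketch, the top-level directory can be read from the archive before extracting. The helper name top_dir is hypothetical and not part of any Atlas tooling, and the sketch assumes the archive contains a single top-level directory.

```shell
# Hypothetical helper (not part of Atlas): print the top-level directory
# stored in a gzip-compressed tar archive without extracting it.
top_dir() {
  # tar -tzf lists the entry paths; awk keeps the first path component
  # of the first entry and reads the rest of the listing silently.
  tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

archive="apache-atlas-1.1.0-sources.tar.gz"
if [ -f "$archive" ]; then
  echo "Archive will extract into: $(top_dir "$archive")"
else
  echo "Archive $archive not found; download it first"
fi
```

The printed name is the directory to pass to cd after running tar -zxvf.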

3. Compilation

export MAVEN_OPTS="-Xms2g -Xmx2g"

If your machine already has Solr and HBase, use the following command:

mvn clean -DskipTests package -Pdist

If not, build an Apache Atlas package that bundles Apache HBase and Apache Solr:

mvn clean -DskipTests package -Pdist,embedded-hbase-solr

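The official documentation states that Atlas 1.1.0 works with Solr 5.5.1, so the Solr version in my current environment does not meet the requirement. I therefore used the second build method here. The relevant passage from the official documentation reads: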

Configuring Apache Solr as the indexing backend for the Graph Repository

By default, Apache Atlas uses JanusGraph as the graph repository and is the only graph repository implementation available currently. For configuring JanusGraph to work with Apache Solr, please follow the instructions below

Install Apache Solr if not already running. The version of Apache Solr supported is 5.5.1. Could be installed from http://archive.apache.org/dist/lucene/solr/5.5.1/solr-5.5.1.tgz
Start Apache Solr in cloud mode.
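
The choice between the two build commands can be sketched as a small shell function. This is only an illustration: the function name atlas_profiles and the sample installed version are assumptions, while the required Solr version 5.5.1 and the two profile strings come from the text above. The sketch checks only Solr and assumes a usable HBase whenever Solr matches. It prints the Maven command rather than running it.

```shell
# Hypothetical helper: pick the Maven profiles from the Solr version that is
# already installed (pass an empty string if no Solr is installed).
atlas_profiles() {
  required_solr="5.5.1"   # version quoted from the Atlas documentation above
  if [ "$1" = "$required_solr" ]; then
    echo "dist"                        # reuse the external HBase and Solr
  else
    echo "dist,embedded-hbase-solr"    # bundle HBase and Solr with Atlas
  fi
}

installed_solr="4.10.3"   # example value only; replace with your own version
echo "mvn clean -DskipTests package -P$(atlas_profiles "$installed_solr")"
```

With a Solr version other than 5.5.1, the sketch selects the embedded-hbase-solr profile, which matches the build method chosen in this article.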

Error 1 during compilation

Failed to execute goal on project hive-bridge-shim: Could
