DataHub Installation and Startup

Original article, last modified 2025-01-14 19:10:10


Article tags: #python

First published 2024-12-17 17:35:34

I. Docker image installation (independent of the local project)

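The DataHub environment is complex, so there is no need to install all of it locally.

* Note: specify the Python version explicitly; the Docker quickstart may be incompatible with python3.10.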

/opt/homebrew/anaconda3/bin/python3.12 -m pip install --upgrade pip wheel setuptools
/opt/homebrew/anaconda3/bin/python3.12 -m pip install --upgrade acryl-datahub
/opt/homebrew/anaconda3/bin/python3.12 -m datahub version
/opt/homebrew/anaconda3/bin/python3.12 -m datahub docker quickstart
/opt/homebrew/anaconda3/bin/python3.12 -m datahub docker ingest-sample-data
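
All five commands above go through one pinned interpreter. As a minimal sketch, the same sequence can be generated for any interpreter path, which makes it harder to mix interpreters by accident. The helper names quickstart_commands and run_all are hypothetical; the subcommands are the ones listed above.

```python
import subprocess
import sys


def quickstart_commands(python_path):
    """Return the command sequence from this section for one interpreter."""
    base = [python_path, "-m"]
    return [
        base + ["pip", "install", "--upgrade", "pip", "wheel", "setuptools"],
        base + ["pip", "install", "--upgrade", "acryl-datahub"],
        base + ["datahub", "version"],
        base + ["datahub", "docker", "quickstart"],
        base + ["datahub", "docker", "ingest-sample-data"],
    ]


def run_all(python_path, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in quickstart_commands(python_path):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumption: the interpreter running this script is the one to pin.
    run_all(sys.executable, dry_run=True)
```

With dry_run=True nothing is executed, so the printed commands can be reviewed first.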

I. Local installation:

Main reference:

Local Development | DataHub

1.1 Prerequisites

Java 17 JDK
Python 3.10
Docker
Docker Compose >=2.20
A Docker engine with at least 8GB of memory to run tests.
On macOS, these can be installed with Homebrew.

# Install Java
brew install openjdk@17

# Install Python (very important: with the wrong version the later build and compile steps hit many problems and fail; make sure PATH points at this Python)
brew install python@3.10  # you may need to add this to your PATH
# alternatively, you can use pyenv to manage your python versions

# Install docker and docker compose
brew install --cask docker
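
The prerequisite list can be checked before starting a build. The following is a minimal sketch, not part of DataHub: all function names are hypothetical, and the sample version string only illustrates what docker compose version typically prints.

```python
import re
import shutil
import sys


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def compose_ok(version_text, minimum=(2, 20)):
    """True when the reported Docker Compose version meets the minimum."""
    return parse_version(version_text)[:2] >= minimum


def python_ok(version_info=None):
    """True when the interpreter is Python 3.10, as the list above requires."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == (3, 10)


def missing_tools(tools=("java", "docker")):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("python 3.10:", python_ok())
    print("missing tools:", missing_tools())
    # Sample string only; the real text comes from `docker compose version`.
    print("compose ok:", compose_ok("Docker Compose version v2.24.5"))
```

The check only confirms that java and docker are on PATH; it does not verify the Java version or the memory given to the Docker engine.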

1.2 Clone the code from Git

Main branch:

git clone https://github.com/datahub-project/datahub.git;

Internationalization (i18n) branch, hosted in a fork:

git clone https://github.com/luizhsalazar/datahub.git

Output:

(base) phoenix@phoenixdeMacBook-Pro ~ % git clone https://github.com/datahub-project/datahub.git;
Cloning into 'datahub'...
remote: Enumerating objects: 4477163, done.
remote: Counting objects: 100% (15782/15782), done.
remote: Compressing objects: 100% (1209/1209), done.
remote: Total 4477163 (delta 14145), reused 14814 (delta 13277), pack-reused 4461381 (from 2)
Receiving objects: 100% (4477163/4477163), 6.39 GiB | 17.23 MiB/s, done.
Resolving deltas: 100% (2102360/2102360), done.
Updating files: 100% (8581/8581), done.
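
The log above shows the full clone transferring 6.39 GiB. A shallow clone is one possible way to reduce that; the sketch below only builds such a command using git's --depth and --branch options. Assumption, untested here: the DataHub build works without full history. The fullVersion values printed later in this article look like they are derived from git tags, so a shallow clone may change or break version detection. The name clone_command is hypothetical.

```python
def clone_command(url, depth=None, branch=None):
    """Build a git clone argument list; depth and branch are optional."""
    cmd = ["git", "clone"]
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        # --depth N limits the history to the most recent N commits.
        cmd += ["--depth", str(depth)]
    if branch is not None:
        cmd += ["--branch", branch]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    url = "https://github.com/datahub-project/datahub.git"
    # Printed only; pass the list to subprocess.run(cmd, check=True) to run it.
    print(" ".join(clone_command(url, depth=1)))
```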

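1.3 Build and compile the project:

Change to the root directory of the repository:

cd datahub

Build the whole project with the Gradle wrapper:

./gradlew build

Note that the command above also runs tests and a number of validations, which makes the process considerably slower.

It is recommended to compile only the parts of DataHub you need: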

Build DataHub's backend GMS (Generalized Metadata Service):
```
./gradlew :metadata-service:war:build
```

Build DataHub's frontend:

./gradlew :datahub-frontend:dist -x yarnTest -x yarnLint

Build DataHub's command-line tool:

./gradlew :metadata-ingestion:installDev

Build DataHub's documentation:

./gradlew :docs-website:yarnLintFix :docs-website:build -x :metadata-ingestion:runPreFlightScript
# To preview the documentation
./gradlew :docs-website:serve
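
The partial builds above can be collected in one lookup table so that a component is chosen by name. This is a minimal sketch: the short target names (gms, frontend, cli, docs) and the helper gradle_command are hypothetical, while the Gradle task lists are copied from the commands in this section.

```python
# Gradle task lists copied from the commands in this section.
BUILD_TARGETS = {
    "all": ["build"],
    "gms": [":metadata-service:war:build"],
    "frontend": [":datahub-frontend:dist", "-x", "yarnTest", "-x", "yarnLint"],
    "cli": [":metadata-ingestion:installDev"],
    "docs": [
        ":docs-website:yarnLintFix",
        ":docs-website:build",
        "-x",
        ":metadata-ingestion:runPreFlightScript",
    ],
}


def gradle_command(target, wrapper="./gradlew"):
    """Return the argument list that builds one named component."""
    try:
        args = BUILD_TARGETS[target]
    except KeyError:
        known = ", ".join(sorted(BUILD_TARGETS))
        raise ValueError(f"unknown target {target!r}; expected one of: {known}") from None
    return [wrapper] + args


if __name__ == "__main__":
    for name in BUILD_TARGETS:
        print(name, "->", " ".join(gradle_command(name)))
```

The resulting list can be passed to subprocess.run with cwd set to the repository root.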

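This post explains how to install plugins:

DataHub安装配置详细过程_datahub部署-优快云博客 (Detailed DataHub installation and configuration walkthrough, 优快云 blog)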

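2. Install python3.10 as DataHub's Python environment:

2.1 Install:

brew install python@3.10

This step matters: if a newer Python is installed on the system, ./gradlew build picks up Python 3.12/3.13 on its own. See the $PATH environment variable settings below.

2.2 Set the default Python environment on macOS

open -e ~/.bash_profile; opens the startup file

open -e ~/.zshrc; opens the startup file

brew list python@3.10; shows the 3.10 install path

Make the two files above identical: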


# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/homebrew/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/homebrew/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/opt/homebrew/anaconda3/etc/profile.d/conda.sh"
    else
#        export PATH="/opt/homebrew/anaconda3/bin:$PATH"
	 export PATH="/opt/homebrew/bin:$PATH"

    fi
fi
unset __conda_setup
# <<< conda initialize <<<


#python3.10
alias python3='/opt/homebrew/Cellar/python@3.10/3.10.16/bin/python3.10'
alias python=python3
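
A shell alias only applies to the interactive shell; a child process such as one started by ./gradlew looks up executables through PATH instead. It can therefore help to check what PATH actually resolves to. The following is a minimal sketch with hypothetical helper names (resolve_on_path, interpreter_version), using only the standard library.

```python
import shutil
import subprocess


def resolve_on_path(name, path=None):
    """Return the first executable called name on path (default: $PATH)."""
    return shutil.which(name, path=path)


def interpreter_version(executable):
    """Ask an interpreter for its (major, minor) version."""
    out = subprocess.run(
        [executable, "-c", "import sys; print(sys.version_info[0], sys.version_info[1])"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    major, minor = out.split()
    return int(major), int(minor)


if __name__ == "__main__":
    found = resolve_on_path("python3")
    if found is None:
        print("python3 is not on PATH")
    else:
        # For this article's setup the expected version is (3, 10).
        print(found, interpreter_version(found))
```

If the reported version is not 3.10, reordering PATH is likely to have more effect on the build than adding an alias.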

1.x Cleaning: the Gradle task is clean (./gradlew clean). Running ./gradlew clear by mistake fails as shown below:

Starting a Gradle Daemon, 1 busy Daemon could not be reused, use --status for details
Configuration on demand is an incubating feature.
<-------------> 1% CONFIGURING [2m 51s]
> root project > Resolve dependencies of classpath > gradle-nexus-staging-plugin-0.30.0.pom

> Configure project :datahub-frontend
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :datahub-upgrade
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :docker
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :smoke-test
Root directory:  /Users/phoenix/datahub

> Configure project :docker:datahub-ingestion
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :docker:datahub-ingestion-base
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :docker:elasticsearch-setup
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :docker:kafka-setup
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :docker:mysql-setup
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :docker:postgres-setup
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :metadata-jobs:mae-consumer-job
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :metadata-jobs:mce-consumer-job
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :metadata-service:configuration
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT

> Configure project :metadata-service:war
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
[Incubating] Problems report is available at: file:///Users/phoenix/datahub/build/reports/problems/problems-report.html

FAILURE: Build failed with an exception.

* What went wrong:
Task 'clear' not found in root project 'datahub' and its subprojects. Some candidates are: 'clean'.

* Try:
> Run gradlew tasks to get a list of available tasks.
> For more on name expansion, please refer to https://docs.gradle.org/8.11.1/userguide/command_line_interface.html#sec:name_abbreviation in the Gradle documentation.
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.
> Get more help at https://help.gradle.org.

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.11.1/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD FAILED in 8m 4s
2 actionable tasks: 2 up-to-date
(base) phoenix@phoenixdeMacBook-Pro datahub %
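
The failure above already contains the fix: Gradle reports that the task clear does not exist and suggests clean. As a minimal sketch, that hint can be pulled out of a long log automatically. The function name suggest_tasks is hypothetical, and the patterns assume the message wording shown in the log above.

```python
import re

# Patterns follow the wording of the error line in the log above.
_NOT_FOUND = re.compile(
    r"Task '(?P<task>[^']+)' not found in root project '(?P<project>[^']+)'"
)
_CANDIDATE = re.compile(r"'([^']+)'")


def suggest_tasks(log_text):
    """Return (missing_task, candidates) from Gradle output, or None."""
    for line in log_text.splitlines():
        match = _NOT_FOUND.search(line)
        if match is None:
            continue
        # Everything after the marker is a quoted list of suggested tasks.
        _, _, tail = line.partition("Some candidates are:")
        return match.group("task"), _CANDIDATE.findall(tail)
    return None


if __name__ == "__main__":
    sample = (
        "Task 'clear' not found in root project 'datahub' and its "
        "subprojects. Some candidates are: 'clean'."
    )
    print(suggest_tasks(sample))
```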

Starting the frontend locally (the other components can run in Docker):

If you run into problems, enter the corresponding virtual environment:

python3 -m venv venv
source venv/bin/activate

python3 -m pip install --upgrade pip wheel setuptools

Install the affected package separately: python3 -m pip install pyarrow==11.0.0

/opt/homebrew/opt/python@3.13/bin/python3.13 -m venv /Users/phoenix/datahub/metadata-ingestion/venv
/Users/phoenix/datahub/metadata-ingestion/venv/bin/python -m pip install pyarrow==11.0.0
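
The venv steps above can also be scripted with Python's standard venv module. This is a minimal sketch under two assumptions: a POSIX layout (the interpreter lives at bin/python inside the venv), and that the script is run with the Python version the venv should use, since a venv inherits the version of the interpreter that creates it. The helper names are hypothetical.

```python
import os
import subprocess
import venv


def venv_python(venv_dir):
    """Path of the interpreter inside a venv (POSIX layout: bin/python)."""
    return os.path.join(venv_dir, "bin", "python")


def pip_install_command(venv_dir, requirement):
    """Build the command that installs one requirement into the venv."""
    return [venv_python(venv_dir), "-m", "pip", "install", requirement]


def create_and_install(venv_dir, requirement):
    """Create the venv with pip available, then install one requirement."""
    venv.EnvBuilder(with_pip=True).create(venv_dir)
    subprocess.run(pip_install_command(venv_dir, requirement), check=True)


if __name__ == "__main__":
    # Printed only; call create_and_install("venv", "pyarrow==11.0.0") to run it.
    print(" ".join(pip_install_command("venv", "pyarrow==11.0.0")))
```

Calling the venv's own interpreter with -m pip avoids depending on whether the venv has been activated in the current shell.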

Downloading the Git internationalization branch:

Source code:

datahub/metadata-service at feature/ing-623 · datahub-project/datahub · GitHub

Documentation:

GitHub - luizhsalazar/datahub at feature/i18n-support

https://blog.datahubproject.io/how-we-implemented-internationalization-in-datahub-d3e9f6349a6a

Start the frontend locally:

cd datahub-frontend/run && ./run-local-frontend

Install from source:

DataHub CLI | DataHub

II. Plugin installation

Installing plugins:

Check the DataHub plugins: python3 -m datahub check plugins

Install a plugin: python3 -m pip install 'acryl-datahub[postgres]'

(Note: this installs into the local environment. Plugins are also installed when the Docker containers run, but network problems easily cause timeout errors there; rerun until the installation succeeds.)

/opt/homebrew/anaconda3/bin/python3.12 -m datahub check plugins

/opt/homebrew/anaconda3/bin/python3.12 -m pip install 'acryl-datahub[postgres]'
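
Rerunning the install until it succeeds can be automated with a small retry loop. The following is a minimal sketch: plugin_requirement, retry and install_plugins are hypothetical names, and the attempt count and delay are arbitrary values to adjust.

```python
import subprocess
import sys
import time


def plugin_requirement(*plugins):
    """Build a pip requirement such as acryl-datahub[postgres]."""
    if not plugins:
        return "acryl-datahub"
    return "acryl-datahub[" + ",".join(plugins) + "]"


def retry(action, attempts=3, delay=5.0, sleep=time.sleep):
    """Call action() until it returns True; return the attempt number."""
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"still failing after {attempts} attempts")


def install_plugins(*plugins, python=sys.executable):
    """Install the given plugin extras with pip, retrying on failure."""
    cmd = [python, "-m", "pip", "install", plugin_requirement(*plugins)]
    # A non-zero exit code (for example after a network timeout) triggers a retry.
    return retry(lambda: subprocess.run(cmd).returncode == 0)


if __name__ == "__main__":
    # Printed only; call install_plugins("postgres") to actually install.
    print(plugin_requirement("postgres"))
```

The loop retries on any pip failure, not only timeouts, so a persistent error such as an incompatible Python version still ends in RuntimeError after the last attempt.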

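References:

Datahub部署 | Datahub中文社区 (DataHub deployment, DataHub Chinese community)

Plugin installation reference:

安装datahub - 编程好6博客 (Installing DataHub, 编程好6 blog)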