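I. Docker image installation (independent of the local project)
The DataHub environment is complex, so not everything needs to be installed locally.
* Note: specify the Python version explicitly; the Docker quickstart may not be compatible with Python 3.10.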
/opt/homebrew/anaconda3/bin/python3.12 -m pip install --upgrade pip wheel setuptools
/opt/homebrew/anaconda3/bin/python3.12 -m pip install --upgrade acryl-datahub
/opt/homebrew/anaconda3/bin/python3.12 -m datahub version
/opt/homebrew/anaconda3/bin/python3.12 -m datahub docker quickstart
/opt/homebrew/anaconda3/bin/python3.12 -m datahub docker ingest-sample-data
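The five commands above repeat the same interpreter path. A minimal sketch that keeps the path in one variable and runs the steps in order, stopping at the first failure. `DATAHUB_PYTHON`, `DRY_RUN`, `_run` and `datahub_quickstart` are hypothetical names introduced here; the interpreter path and the commands are the ones from these notes.

```shell
# Hypothetical variable: interpreter used for the DataHub CLI (path taken from the notes above).
DATAHUB_PYTHON="${DATAHUB_PYTHON:-/opt/homebrew/anaconda3/bin/python3.12}"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: run the quickstart steps in order; the && chain stops at the first failure.
datahub_quickstart() {
  _run "$DATAHUB_PYTHON" -m pip install --upgrade pip wheel setuptools &&
  _run "$DATAHUB_PYTHON" -m pip install --upgrade acryl-datahub &&
  _run "$DATAHUB_PYTHON" -m datahub version &&
  _run "$DATAHUB_PYTHON" -m datahub docker quickstart &&
  _run "$DATAHUB_PYTHON" -m datahub docker ingest-sample-data
}
```

Running `DRY_RUN=1 datahub_quickstart` first prints the five commands without executing them, which is a cheap way to confirm that the intended interpreter is selected.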
II. Local installation:
Main reference:
1.1 Prerequisites
- Java 17 JDK
- Python 3.10
- Docker
- Docker Compose >=2.20
- A Docker engine with at least 8 GB of memory to run the tests.
On macOS, these can be installed with Homebrew:
# Install Java
brew install openjdk@17
# Install Python (very important: with the wrong version, the later build/compile steps hit many problems and fail; the python entry in PATH must be set correctly)
brew install python@3.10 # you may need to add this to your PATH
# alternatively, you can use pyenv to manage your python versions
# Install docker and docker compose
brew install --cask docker
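The "Docker Compose >=2.20" requirement above can be checked with a small version comparison. A sketch, assuming purely numeric dotted versions; `version_ge` is a hypothetical helper, not part of Docker or DataHub.

```shell
# Hypothetical helper: succeed when dotted version $1 is >= $2.
# Assumes every field is a plain number (no "v" prefix, no "-suffix").
version_ge() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}                                  # first field of a
    y=${b%%.*}                                  # first field of b
    [ "$a" = "$x" ] && a= || a=${a#*.}          # drop the field just read
    [ "$b" = "$y" ] && b= || b=${b#*.}
    x=${x:-0}                                   # missing fields count as 0
    y=${y:-0}
    [ "$x" -gt "$y" ] && return 0
    [ "$x" -lt "$y" ] && return 1
  done
  return 0                                      # all fields equal
}
```

One possible use, assuming `docker compose version --short` prints something like `2.24.5` (some builds may add a `v` prefix or a `-desktop.N` suffix, which is stripped here): `v=$(docker compose version --short); v=${v#v}; version_ge "${v%%-*}" 2.20 || echo "Docker Compose is older than 2.20"`.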
1.2 Clone the code from Git
Main repository (upstream):
git clone https://github.com/datahub-project/datahub.git;
Internationalization (i18n) fork:
git clone https://github.com/luizhsalazar/datahub.git
Output:
(base) phoenix@phoenixdeMacBook-Pro ~ % git clone https://github.com/datahub-project/datahub.git;
Cloning into 'datahub'...
remote: Enumerating objects: 4477163, done.
remote: Counting objects: 100% (15782/15782), done.
remote: Compressing objects: 100% (1209/1209), done.
remote: Total 4477163 (delta 14145), reused 14814 (delta 13277), pack-reused 4461381 (from 2)
Receiving objects: 100% (4477163/4477163), 6.39 GiB | 17.23 MiB/s, done.
Resolving deltas: 100% (2102360/2102360), done.
Updating files: 100% (8581/8581), done.
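1.3 Build the project:
Change to the root of the repository and build the whole project with the Gradle wrapper:
cd datahub
./gradlew build
Note that this also runs the tests and a number of validations, which makes the process considerably slower.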
It is better to build only the parts of DataHub that you need:
- Build DataHub's backend GMS (Generalized Metadata Service):
./gradlew :metadata-service:war:build
- Build DataHub's frontend:
./gradlew :datahub-frontend:dist -x yarnTest -x yarnLint
- Build DataHub's command-line tool:
./gradlew :metadata-ingestion:installDev
- Build DataHub's documentation:
./gradlew :docs-website:yarnLintFix :docs-website:build -x :metadata-ingestion:runPreFlightScript
# To preview the documentation
./gradlew :docs-website:serve
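The partial builds listed above can be selected by a short component name. A sketch only: `gradle_args_for` and the names `gms`, `frontend`, `cli` and `docs` are hypothetical; the Gradle tasks and flags are copied from the list above.

```shell
# Hypothetical helper: print the Gradle arguments for one component,
# using the task names listed in these notes.
gradle_args_for() {
  case "$1" in
    gms)      echo ":metadata-service:war:build" ;;
    frontend) echo ":datahub-frontend:dist -x yarnTest -x yarnLint" ;;
    cli)      echo ":metadata-ingestion:installDev" ;;
    docs)     echo ":docs-website:yarnLintFix :docs-website:build -x :metadata-ingestion:runPreFlightScript" ;;
    *)        echo "unknown component: $1" >&2; return 1 ;;
  esac
}
```

From the repository root, `./gradlew $(gradle_args_for frontend)` would then build only the frontend. The command substitution is left unquoted on purpose so that the flags are passed as separate arguments.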
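This article explains how to install plugins:
"DataHub安装配置详细过程_datahub部署" (a detailed walkthrough of DataHub installation and configuration), CSDN blog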
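2. Install Python 3.10 as DataHub's Python environment:
2.1 Install:
brew install python@3.10
(This is important: if a newer Python version is present on the system, ./gradlew build picks up Python 3.12/3.13 by itself. See the $PATH environment variable settings.)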
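2.2 Set the default Python environment on macOS
open -e ~/.bash_profile   # open the startup file
open -e ~/.zshrc          # open the startup file
brew list python@3.10     # show the Python 3.10 install path
Edit both files above so that they have the same content: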
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/homebrew/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/opt/homebrew/anaconda3/etc/profile.d/conda.sh" ]; then
. "/opt/homebrew/anaconda3/etc/profile.d/conda.sh"
else
# export PATH="/opt/homebrew/anaconda3/bin:$PATH"
export PATH="/opt/homebrew/bin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
# python3.10
alias python3='/opt/homebrew/Cellar/python@3.10/3.10.16/bin/python3.10'
alias python=python3
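Shell aliases like the two above only affect commands typed in an interactive shell. A child process such as ./gradlew normally finds `python3` through PATH instead. A sketch of a PATH-based alternative: `path_prepend` is a hypothetical helper, and the shim directory in the comments is an assumption made for illustration (the Cellar path is the one from the alias above).

```shell
# Hypothetical helper: put a directory at the front of PATH unless it is already first.
path_prepend() {
  case "$PATH" in
    "$1"|"$1":*) ;;                        # already first: nothing to do
    *) PATH="$1:$PATH"; export PATH ;;
  esac
}

# Possible use (assumed layout, adjust to your machine):
#   mkdir -p "$HOME/.local/py310/bin"
#   ln -sf /opt/homebrew/Cellar/python@3.10/3.10.16/bin/python3.10 "$HOME/.local/py310/bin/python3"
#   path_prepend "$HOME/.local/py310/bin"
```

After this, `command -v python3` shows which interpreter a child process would resolve, regardless of any alias.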
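1.x Clean the build: the correct command is ./gradlew clean. The run below used ./gradlew clear, which fails because no task named 'clear' exists: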
Starting a Gradle Daemon, 1 busy Daemon could not be reused, use --status for details
Configuration on demand is an incubating feature.
<-------------> 1% CONFIGURING [1m 43s]
<-------------> 1% CONFIGURING [2m 42s]
<-------------> 1% CONFIGURING [2m 49s]f classpath > gradle-node-plugin-7.0.2.pom
<-------------> 1% CONFIGURING [2m 50s]f classpath > gradle-nexus-staging-plugin-0.30.0.pom
> root project > Resolve dependencies of classpath > gradle-nexus-staging-plugin-0.30.0.pom
<-------------> 1% CONFIGURING [2m 51s]
> Configure project :datahub-frontend
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :datahub-upgrade
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :docker
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :smoke-test
Root directory: /Users/phoenix/datahub
> Configure project :docker:datahub-ingestion
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :docker:datahub-ingestion-base
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :docker:elasticsearch-setup
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :docker:kafka-setup
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :docker:mysql-setup
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :docker:postgres-setup
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :metadata-jobs:mae-consumer-job
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :metadata-jobs:mce-consumer-job
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :metadata-service:configuration
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
> Configure project :metadata-service:war
fullVersion=v0.15.0rc3-62-g5946558
cliMajorVersion=0.15.0rc3
version=0.15.0rc3-SNAPSHOT
[Incubating] Problems report is available at: file:///Users/phoenix/datahub/build/reports/problems/problems-report.html
FAILURE: Build failed with an exception.
* What went wrong:
Task 'clear' not found in root project 'datahub' and its subprojects. Some candidates are: 'clean'.
* Try:
> Run gradlew tasks to get a list of available tasks.
> For more on name expansion, please refer to https://docs.gradle.org/8.11.1/userguide/command_line_interface.html#sec:name_abbreviation in the Gradle documentation.
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.
> Get more help at https://help.gradle.org.
Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.
You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.
For more on this, please refer to https://docs.gradle.org/8.11.1/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.
BUILD FAILED in 8m 4s
2 actionable tasks: 2 up-to-date
(base) phoenix@phoenixdeMacBook-Pro datahub %
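Command to start the frontend locally (the other components can run in Docker):
If you run into problems, enter the corresponding virtual environment: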
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip wheel setuptools
Install the required package on its own:
python3 -m pip install pyarrow==11.0.0
/opt/homebrew/opt/python@3.13/bin/python3.13 -m venv /Users/phoenix/datahub/metadata-ingestion/venv
/Users/phoenix/datahub/metadata-ingestion/venv/bin/pip install pyarrow==11.0.0
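Because these notes mix Python 3.10, 3.12 and 3.13, it helps to confirm which version a virtual environment really uses before installing packages into it. A sketch, with `python_minor` as a hypothetical helper that reduces the output of `python3 --version` to `major.minor`:

```shell
# Hypothetical helper: turn a string such as "Python 3.10.16" into "3.10".
python_minor() {
  v=${1#Python }          # drop the leading "Python "
  major=${v%%.*}          # text before the first dot
  rest=${v#*.}            # text after the first dot
  echo "$major.${rest%%.*}"
}
```

With the venv activated, `[ "$(python_minor "$(python3 --version)")" = "3.10" ] || echo "unexpected Python version"` reports a mismatch early.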
Downloading the i18n branch from Git:
Source code:
datahub/metadata-service at feature/ing-623 · datahub-project/datahub · GitHub
Documentation:
GitHub - luizhsalazar/datahub at feature/i18n-support
https://blog.datahubproject.io/how-we-implemented-internationalization-in-datahub-d3e9f6349a6a
Start the frontend locally:
cd datahub-frontend/run && ./run-local-frontend
Install from source:
III. Plugin installation
Check the installed DataHub plugins:
python3 -m datahub check plugins
Install a plugin (the module is pip, not pip3):
python3 -m pip install 'acryl-datahub[postgres]'
(Note: this installs into the local environment. The Docker containers also install plugins when they run, but network problems often cause timeout errors; rerun a few times until the installation succeeds.)
/opt/homebrew/anaconda3/bin/python3.12 -m datahub check plugins
/opt/homebrew/anaconda3/bin/python3.12 -m pip install 'acryl-datahub[postgres]'
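The advice to rerun the installation until it succeeds can be automated with a small retry loop. A sketch; `retry` is a hypothetical helper, and the attempt count and delay in the usage note are arbitrary choices.

```shell
# Hypothetical helper: run a command up to N times, pausing between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0                       # success: stop retrying
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 5 10 /opt/homebrew/anaconda3/bin/python3.12 -m pip install 'acryl-datahub[postgres]'` tries the install up to five times, waiting 10 seconds between attempts.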
Reference:
Plugin installation: see reference