Dockerize Apache HAWQ [Author: 陶征霖]

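This article describes how to build an Apache HAWQ container with Docker. It covers Docker's basic concepts, installation, the steps for building the Apache HAWQ image, and how to run it: creating the image from a Dockerfile, configuring the dependencies, setting the container's startup command, and running and managing the container to deploy Apache HAWQ in a Docker environment.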



Docker, a lightweight Linux container engine, has become increasingly popular alongside the continuing growth of big data and cloud computing. The main reason is the convenience Docker brings to building, shipping, and running applications. In this article, we introduce Docker and walk through the steps to dockerize Apache HAWQ.


Docker Introduction

The high-level architecture of Docker is shown in Figure 1. A Linux container is a lightweight, operating-system-level virtualization method for running multiple isolated containers on a single host. It uses namespaces for process isolation and cgroups for resource isolation. A layered filesystem combines several layers into a single Docker image, which may contain user data and applications. The process is like applying patches one by one to a base image to form a new one. From a Docker image we can start a Docker container.

It is worth noting that a Docker container is not a VM. The most important difference is that each VM needs its own guest OS on the host machine, which makes a VM slower to start and more resource-hungry than a container. Besides images and containers, the Dockerfile is the third basic element of Docker. It contains the instructions used to assemble a Docker image, so that image builds can be automated and kept under version control.


Figure 1: Docker high-level architecture
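
The isolation and layering ideas above can be observed from the command line. The following is a minimal sketch, assuming Docker is already installed and the centos:7 image can be pulled; the function names and the 256 MiB limit are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; nothing runs until a function is called.

# PID namespace: the shell started inside the container sees itself as PID 1.
show_pid_namespace() {
  docker run --rm centos:7 sh -c 'echo "PID inside container: $$"'
}

# cgroups: cap the container's memory (256 MiB is an arbitrary example value).
run_with_memory_limit() {
  docker run --rm --memory=256m centos:7 cat /etc/redhat-release
}

# Layered filesystem: list the layers that make up an image.
# Defaults to centos:7 when no image name is passed.
show_image_layers() {
  docker history "${1:-centos:7}"
}
```

Calling show_image_layers hawq:devel after the image has been built would list one entry per Dockerfile instruction that produced a layer.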


Docker Installation

We install Docker on CentOS Linux release 7.0.1406 (Core) with kernel version 3.10.0-123.el7.x86_64. First we run yum -y update to upgrade the kernel and installed software where possible. Then we run curl -sSL https://get.docker.com/ | sh to install the Docker engine. Ideally we could now start Docker with service docker start, but in our case it reported an error. After installing device-mapper-libs and device-mapper-event-libs with yum, Docker started successfully. By default the Docker daemon binds to a Unix socket owned by root, so every docker command has to be run with sudo. To avoid this, run usermod -aG docker $USER, then log out and back in so that the new group membership takes effect. Docker is now ready to go.
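
The installation steps above can be collected into one script. This is a sketch for a CentOS 7 host, not a tested installer: the function name is hypothetical, and the use of sudo assumes the steps are run by a regular user with sudo rights, which the text does not state.

```shell
#!/usr/bin/env bash
# Sketch of the installation steps described above.
# Nothing runs until install_docker_centos7 is called explicitly.
install_docker_centos7() {
  # Bring the kernel and installed packages up to date.
  sudo yum -y update

  # Install Docker Engine with the convenience script.
  curl -sSL https://get.docker.com/ | sh

  # Libraries that had to be installed before the daemon would start.
  sudo yum -y install device-mapper-libs device-mapper-event-libs

  # Start the daemon.
  sudo service docker start

  # Allow the current user to reach the daemon socket without sudo.
  # The new group membership applies from the next login.
  sudo usermod -aG docker "$USER"
}
```

After sourcing the script, run install_docker_centos7, log in again, and check the setup with docker ps.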


Docker Image Build

To build the Apache HAWQ Docker image, we use a Dockerfile. The following keywords are widely used in Dockerfiles:

  • FROM: specifies the base image you are building from

  • RUN: executes a shell command at build time

  • ENV: sets an environment variable in the container

  • ADD/COPY: imports files into the image, usually from the host

  • EXPOSE: declares a service port the container listens on (the port is published at run time with docker run -p)

  • CMD/ENTRYPOINT: the command executed when the container starts

Though both ENTRYPOINT and CMD allow you to specify the startup command for an image, there are subtle differences between them:

  1. CMD is overridden by any arguments given after the image name when starting the container, while ENTRYPOINT can only be overridden with the --entrypoint flag.

  2. When ENTRYPOINT and CMD are combined, the CMD strings are appended as arguments to ENTRYPOINT.

  3. When using ENTRYPOINT and CMD together, always use the exec form, such as ENTRYPOINT ["/bin/ping", "localhost"], not the shell form ENTRYPOINT /bin/ping localhost. The shell form wraps the command in /bin/sh -c, so the CMD strings are not appended to it.
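
The three rules above can be tried with a small throwaway image. The sketch below writes a two-instruction Dockerfile and, only if a Docker daemon is reachable, builds and runs it. The image name entrypoint-demo and the use of /bin/echo in place of ping are assumptions made for the illustration.

```shell
#!/usr/bin/env bash
# Write a minimal Dockerfile into a temporary build context.
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
FROM centos:7
# Exec form: the fixed part of the startup command.
ENTRYPOINT ["/bin/echo", "hello"]
# Exec form: default arguments, appended to ENTRYPOINT.
CMD ["world"]
EOF

# Build and run only when a Docker daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t entrypoint-demo "$workdir"
  # Rule 2: CMD is appended to ENTRYPOINT, so this prints "hello world".
  docker run --rm entrypoint-demo
  # Rule 1: arguments after the image name replace CMD, so this prints "hello HAWQ".
  docker run --rm entrypoint-demo HAWQ
  # Rule 1: ENTRYPOINT itself is replaced only via --entrypoint.
  docker run --rm --entrypoint /bin/ls entrypoint-demo /
else
  echo "docker not available; Dockerfile written to $workdir/Dockerfile"
fi
```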


Now that we have some basic knowledge of Dockerfiles, let's build the image for Apache HAWQ. We choose centos:7 from DockerHub as the base image. First we use yum to install the software whose packaged version is compatible with Apache HAWQ, such as jdk1.7, krb5, libxml2, libcurl, and snappy. Libraries whose packaged version is not compatible, or which are not in the yum repository at all, are installed from specific sources: json-c 0.9, flex 2.5.35, libhdfs3, libyarn, and so on. Once all these dependencies are installed, the Apache HAWQ development environment is complete. This is enough for devel mode. For production mode, we still need to add an entrypoint that contains the logic for building and running Apache HAWQ. To build the image, we run docker build -t hawq:devel <path to the directory containing the Dockerfile>. A pre-built Apache HAWQ Docker image has been pushed to DockerHub; see https://hub.docker.com/r/mayjojo/hawq-devel/.
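
A skeleton of such a devel-mode Dockerfile might look like the sketch below. It is not the Dockerfile behind the published image: the yum package names are assumptions for CentOS 7 (the text names the software, not the packages), the source-built libraries are left as a placeholder because their download locations are not given, and RUN_BUILD is a hypothetical switch so that the lengthy build runs only on request.

```shell
#!/usr/bin/env bash
# Write a devel-mode Dockerfile skeleton into a temporary build context.
context="$(mktemp -d)"

cat > "$context/Dockerfile" <<'EOF'
FROM centos:7

# Dependencies whose packaged versions are compatible with Apache HAWQ.
# Package names are assumptions, not taken from the article.
RUN yum install -y \
      java-1.7.0-openjdk-devel \
      krb5-devel \
      libxml2-devel \
      libcurl-devel \
      snappy-devel \
 && yum clean all

# Placeholder: json-c 0.9, flex 2.5.35, libhdfs3 and libyarn are installed
# from specific sources; add the corresponding RUN steps here.

# Devel mode: no ENTRYPOINT. The command is supplied at docker run time.
EOF

echo "Dockerfile written to $context/Dockerfile"

# Build only when explicitly requested, since it downloads many packages.
if [ "${RUN_BUILD:-0}" = "1" ]; then
  docker build -t hawq:devel "$context"
fi
```

Running the script with RUN_BUILD=1 set in the environment performs the actual build; without it, only the Dockerfile is generated for inspection.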


Docker Image Run

Run docker images to confirm that the hawq:devel image now exists locally. We also need a Hadoop image: find one on DockerHub with docker search hadoop, then fetch it with docker pull <image>. To start a HAWQ container, run docker run -d --name=hawq hawq:devel tail -f /dev/null; the tail -f /dev/null command simply keeps the container alive. docker ps should then show a container named hawq running in the background. To log in to the environment, run docker exec -it hawq /bin/bash. Now you can build your HAWQ code, run HAWQ, and do whatever else you need. If you happen to break the environment, remove the container with docker rm -f hawq (docker kill hawq alone stops it but leaves the name hawq in use) and start a new one. To persist data or share it between containers, either mount a data volume from the host with docker run -v, or create a data container with docker create -v /data --name=data <image> and start the HAWQ and Hadoop containers with docker run --volumes-from data. The latter is recommended.
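
The container lifecycle described above can be wrapped in a few shell helpers. This is a sketch under assumptions: the function names are hypothetical, centos:7 is used as the image for the data container only because it is already present as the base image, and the helpers assume hawq:devel has been built locally.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; nothing runs until a function is called.

# Start a long-lived HAWQ development container in the background.
start_hawq() {
  docker run -d --name=hawq hawq:devel tail -f /dev/null
}

# Open an interactive shell inside the running container.
enter_hawq() {
  docker exec -it hawq /bin/bash
}

# Throw away a broken environment and start a fresh one.
# "docker rm -f" stops the container and frees the name "hawq" for reuse.
reset_hawq() {
  docker rm -f hawq
  start_hawq
}

# Create a data container that only exists to own the /data volume.
create_data_container() {
  docker create -v /data --name=data centos:7 /bin/true
}

# Start HAWQ with the /data volume of the data container attached.
start_hawq_with_data() {
  docker run -d --name=hawq --volumes-from data hawq:devel tail -f /dev/null
}
```

A typical session would be create_data_container, then start_hawq_with_data, then enter_hawq. A Hadoop container started with the same --volumes-from data option sees the same /data directory.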


Docker is still evolving quickly. Many more exciting features that could be applied to Apache HAWQ are waiting for us to explore.


