This is my workpaper for the Docker course.
It accomplishes three tasks:
1.Building a cluster manager that can control multiple containers at once. (How to control Docker from Python)
2.Using multiple containers to run a program that processes 100000 numbers. (How to use Docker volumes)
3.Using containers to run a ResNet. (An AI task)
All code can be found in the APPENDIX.
0.Preparation
Docker needs a Linux kernel, so Windows users should install WSL2 to run Linux inside Windows instead of setting up a full virtual machine. The detailed technical document can be found in Install WSL | Microsoft Learn.
In fact it is very easy: open PowerShell and run the command:
wsl --install
Next, download Docker Desktop from www.docker.com (make sure that you have enabled Hyper-V in the Windows settings and virtualization technology in the BIOS).
If everything goes fine, open PowerShell and type:
docker version
You will see the detailed information of your Docker Desktop installation.
ATTENTION: If you want to use docker commands in PowerShell, MAKE SURE Docker Desktop is running! (In fact, I have frequently run into errors caused by this.)
However, Docker images are pulled from Docker Hub by default, which is hosted overseas, so the Docker Engine should be configured to pull images from a domestic mirror.
Open the Docker Engine page in Docker Desktop's settings, add the following entry to the configuration, then apply.
"registry-mirrors": [
"https://docker.1ms.run",
"https://docker.xuanyuan.me"
]
However, these mirror sites often go offline, which means you have to change the mirrors frequently.
1.Background
1.1 Containerization and Virtual Machine
In the Internet era, computing power is much more important than before and easily becomes insufficient. There are two ways to solve the problem: vertical scaling and horizontal scaling. Vertical scaling means using a faster, stronger machine. Horizontal scaling means combining multiple machines to get more computing power. However, vertical scaling has an unavoidable disadvantage: cost. Fig.1 shows the rough relationship between cost and machine size.
Fig.1 Relationship between Cost and Machine Size
Horizontal scaling can be more flexible. When more computing power is needed, a cloud computing provider can supply it through the Internet, which avoids unnecessary payment for maintaining a large machine.
In order to achieve this, cloud computing resources have to be cut into pieces. One commonly used way is the Virtual Machine. Fig.2 shows the structural difference between a Virtual Machine and a general computer.
Fig.2 Different structure of Virtual Machine (on the left) and general computer (on the right)
It is not hard to figure out the defects of Virtual Machines:
- Too much overhead; not lightweight enough
- The initialization time is long
- Each Virtual Machine needs its own redundant operating system to be configured
To avoid these drawbacks, containerization was proposed, and Docker was developed as a lightweight alternative to Virtual Machines. Compared to a Virtual Machine, a container is more lightweight and more flexible. But it also raises security concerns: containers share the host kernel, so their isolation is weaker and one container may affect another.
Fig.3 The structure of container
1.2 The cloud service models
With the development of cloud computing, several cloud service models were proposed to satisfy different requirements. They provide different layers of the computing stack, which is also the way to distinguish them.
1.Infrastructure as a Service(IaaS):
IaaS only provides the base of the computer: the physical resources and the virtual resources (Fig.4), which means the users can, and have to, set up everything else on the cloud computer by themselves. Apart from the basic infrastructure, the user has the most control. This results in more flexibility, but also demands more technical skill from the user.
Fig.4 The diagram of IaaS
2.Platform as a Service(PaaS):
Different from IaaS, PaaS also provides the operating system and middleware (Fig.5). This means the users do not need to care about how the virtual machine works and can just focus on the software they use. So it is friendlier to users who only manage applications and data. However, its portability is poor, which easily leads to vendor lock-in. If the users build their applications on the specific environment that one cloud provider offers, the software will depend on it, and moving the applications to another environment becomes much more difficult.
Fig.5 The diagram of PaaS
3.Software as a Service(SaaS):
In SaaS, the users only need to buy the software they need. The cloud provider supplies the physical resources, virtual resources, operating system, middleware and the applications (Fig.6). This is the most convenient model but also the least flexible one. If the software breaks, or the user wants to change the cloud provider, migrating the business operations will be complex.
Fig.6 The diagram of SaaS
1.3 Brief introduction to Docker
As a platform that manages containers, Docker has three important parts: the client, the docker_host and the registry. Fig.7 shows the structure of Docker. Users control containers through the client's commands; the Docker daemon receives each command and takes the corresponding action. Creating a container requires an image, and images are stored in a registry. Once an image has been downloaded, it is kept locally for later use (it can be deleted by command). Different containers live in different namespaces, which means they are isolated from each other by default. However, they can share data through a Docker volume, which keeps the data even after a container is stopped or deleted.
Fig.7 The structure of Docker
2.Create Cluster Manager through Python(Assignment #1)
2.0 Prepare
In order to control containers through Python, the Python library docker (the Docker SDK for Python) should be installed and imported correctly. Then docker.from_env() returns a client object linked to the Docker daemon, so that Python code can control the docker_host.
Fig.8 The method to connect local docker client.
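The connection step can be sketched as follows, assuming the Docker SDK for Python is installed (pip install docker). The helper name connect_to_docker is hypothetical; it returns None instead of raising when the SDK is missing or Docker Desktop is not running, which is the situation warned about in Section 0.

```python
def connect_to_docker():
    """Return a Docker client, or None if the SDK or the daemon is unavailable."""
    try:
        import docker  # Docker SDK for Python
    except ImportError:
        print("The docker package is not installed.")
        return None
    try:
        client = docker.from_env()  # reads the connection settings from the environment
        client.ping()               # fails if the daemon does not answer
    except Exception as exc:        # broad on purpose: several error types are possible here
        print(f"Cannot reach the Docker daemon: {exc}")
        return None
    return client


client = connect_to_docker()
if client is not None:
    print(client.version().get("Version"))
```

If the function prints the "Cannot reach" message, the first thing to check is whether Docker Desktop has been started.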
2.1 Action: Create Cluster
Creating a container requires the following inputs: the image (required), the container names (optional), the number of containers to be created (optional), and the volume (optional). Therefore, the function that creates the containers has 4 input parameters, 3 of which have default values (Fig.9).
Fig.9 The definition of creating containers
Parameters:
im_name(str): The name of the image
name(list or str): The names of the containers. The members of the list should be str
num_create(int): The number of containers to be created
volume(dict): The name of the volume and the path inside the container where the volume is mounted
Fig.10 shows the process of the function.
Fig.10 the process of the function
It is important to make sure the image exists; otherwise an exception from docker.errors will be raised. A function is created to check whether an image is valid (Fig.11). It first attempts to find the image locally; if it is not found, it tries to pull the image from an online registry; then it returns whether the image is valid.
Fig.11 Code for function to check the validity of image
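The local-then-remote lookup can be sketched like this. The function name ensure_image and the stand-in clients are assumptions made so that the example runs without a Docker daemon; with the real SDK the client would come from docker.from_env() and the exceptions would be docker.errors.ImageNotFound and docker.errors.APIError.

```python
from types import SimpleNamespace


def ensure_image(client, image_name):
    """Return True if the image is available locally or could be pulled."""
    try:
        client.images.get(image_name)    # 1) look for the image locally
        return True
    except Exception:                    # docker.errors.ImageNotFound with the real SDK
        print(f"Image {image_name} not found locally, trying to pull it")
    try:
        client.images.pull(image_name)   # 2) fall back to the registry
        return True
    except Exception:                    # e.g. docker.errors.APIError with the real SDK
        print(f"Image {image_name} could not be pulled, check the name")
        return False


def _missing(*_args, **_kwargs):
    raise LookupError("no such image")


# Stand-in clients (illustration only, not part of the Docker SDK).
remote_only = SimpleNamespace(images=SimpleNamespace(get=_missing, pull=lambda name: name))
nowhere = SimpleNamespace(images=SimpleNamespace(get=_missing, pull=_missing))

print(ensure_image(remote_only, "ubuntu"))
print(ensure_image(nowhere, "no-such-image"))
```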
When creating a container, Docker will name it automatically if the name parameter is None. If the name is already in use, an exception from docker.errors will be raised. So checking the validity of the name is also important.
Fig.12&13 Code for function to check the validity of name
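A possible way to handle the name parameter is sketched below. The helpers normalize_names and names_in_use are hypothetical names, and the stand-in client only imitates client.containers.list(all=True) so that the example runs without Docker.

```python
from types import SimpleNamespace


def normalize_names(name, num_create):
    """Turn None, a single string or an iterable into a list of at most num_create names."""
    if name is None:
        return []
    names = [name] if isinstance(name, str) else list(name)
    return names[:num_create]


def names_in_use(client, names):
    """Return the requested names that already belong to an existing container."""
    existing = {c.name for c in client.containers.list(all=True)}
    return [n for n in names if n in existing]


# Stand-in client with two existing containers (illustration only).
existing_containers = [SimpleNamespace(name="Container1"), SimpleNamespace(name="Container2")]
fake_client = SimpleNamespace(
    containers=SimpleNamespace(list=lambda all=False: existing_containers)
)

requested = normalize_names(["Container2", "Container5", "Container6"], num_create=2)
print(requested)
print(names_in_use(fake_client, requested))
```

Treating a single string as a one-element list avoids iterating over its characters by accident.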
The third step is to build the dictionary used to mount a volume. A new helper function is defined to make sure the volume exists. The Docker SDK requires the volumes argument to be a dictionary in a specific format, so several steps are taken to build it.
Fig.14&15 Code for function to check the validity of volume
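As a small illustration of the format, the sketch below converts the volume parameter described in 2.1 into the dictionary that the volumes argument of containers.create accepts. The function name build_volumes is hypothetical; the keys volume_name and volume_path and the "rw" mode follow the code in appendix A.

```python
def build_volumes(volume):
    """Convert {'volume_name': ..., 'volume_path': ...} into the SDK's volumes format."""
    if volume is None:
        return None
    try:
        name, path = volume["volume_name"], volume["volume_path"]
    except (KeyError, TypeError):
        raise ValueError(
            "volume must look like {'volume_name': ..., 'volume_path': ...}"
        ) from None
    # key: volume name; bind: mount point inside the container; mode: read-write
    return {name: {"bind": path, "mode": "rw"}}


print(build_volumes({"volume_name": "my_volume", "volume_path": "/volume_Data"}))
# → {'my_volume': {'bind': '/volume_Data', 'mode': 'rw'}}
```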
Finally, after these checks, every parameter is valid and the containers can be created. A for loop can be used to create multiple containers at once. (The complete code can be found in the appendix.)
Fig.16 Code for creating containers
2.2 Action: List Cluster
The Docker SDK provides a function that returns all containers in a list, which is a good way to list the cluster. Fig.17 shows how to list all existing containers and Fig.18 shows how to list the running containers. Each container is also given an index, so that when operating on a specific container only its index needs to be provided instead of a long name.
Fig.17 Code for listing all containers
Fig.18 Code for listing running containers
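The indexing idea can be sketched as follows. The function name list_containers and the stand-in objects are assumptions for illustration (the second and third IDs are made up); the index is always taken from the list of all containers, so that it stays valid when only the running ones are shown.

```python
from types import SimpleNamespace


def list_containers(client, only_running=False):
    """Print containers with a stable index and return (index, container) pairs."""
    shown = []
    for index, c in enumerate(client.containers.list(all=True)):
        if only_running and c.status != "running":
            continue
        print(f"No.{index} {c.name} ({c.short_id}) status={c.status}")
        shown.append((index, c))
    return shown


# Stand-in client (illustration only).
fake_containers = [
    SimpleNamespace(name="Container1", short_id="964a483af30c", status="running"),
    SimpleNamespace(name="Container2", short_id="0123456789ab", status="exited"),
    SimpleNamespace(name="Container3", short_id="ba9876543210", status="running"),
]
fake_client = SimpleNamespace(
    containers=SimpleNamespace(list=lambda all=False: fake_containers)
)

running = list_containers(fake_client, only_running=True)
```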
2.3 Action: Run a simple command in the cluster
The Docker SDK provides the function exec_run to run a command in a container and return the result of the command, so it is easy to control the container through this function (Fig.19). The function first finds the container with check_container, by name (a string) or by index (a number), then uses exec_run to run the command in the container.
Fig.19 Code for running command in container
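exec_run returns an exit code together with the raw output bytes, so a thin wrapper can decode the output and report failures. This is only a sketch: run_in_container and FakeContainer are hypothetical names, and the fake stands in for a real container object so the example runs without Docker.

```python
class FakeContainer:
    """Stand-in for a docker container object (illustration only)."""

    def exec_run(self, cmd, environment=None):
        # The real method returns (exit_code, output_bytes) in the same order.
        return 0, f"ran: {cmd}\n".encode()


def run_in_container(container, command, environment=None):
    """Run command inside container and return (exit_code, decoded_output)."""
    exit_code, output = container.exec_run(cmd=command, environment=environment)
    text = output.decode("utf-8", errors="replace") if output else ""
    if exit_code != 0:
        print(f"Command {command!r} failed with exit code {exit_code}")
    return exit_code, text


code, text = run_in_container(FakeContainer(), "ls /volume_Data")
print(code, text.strip())
```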
2.4 Action: Stop the Cluster
With the function check_container, any container can be found by name or index. So, as in 2.3, the function that stops a container first identifies which container is to be stopped, then calls stop() on it. Fig.20 shows how to stop a specific container and Fig.21 shows how to stop all running containers.
Fig.20&21 Code for stopping container
2.5 Action: Delete Containers
The Docker SDK provides the function prune(), which deletes all stopped containers at once, so containers should be stopped before being deleted. The process is: use the functions defined in 2.4 to stop the containers, then delete the specific container or all containers (Fig.22). Fig.23 shows the detailed code of the function.
Fig.22 The process of deleting containers
Fig.23 Code for deleting containers
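Because prune() removes every stopped container on the host, including ones that were not created by this manager, a more selective variant may sometimes be preferable. The sketch below stops and removes only the containers passed to it; delete_containers and FakeContainer are hypothetical names used for illustration.

```python
class FakeContainer:
    """Stand-in that records the calls a real container object would receive."""

    def __init__(self, name, status):
        self.name, self.status = name, status
        self.calls = []

    def stop(self):
        self.calls.append("stop")
        self.status = "exited"

    def remove(self):
        self.calls.append("remove")


def delete_containers(containers):
    """Stop (if needed) and remove exactly the given containers; return their names."""
    removed = []
    for c in containers:
        if c.status == "running":
            c.stop()          # a running container has to be stopped first
        c.remove()
        removed.append(c.name)
    return removed


a = FakeContainer("Container1", "running")
b = FakeContainer("Container2", "exited")
print(delete_containers([a, b]))
```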
3.Data Processing in Cluster Manager(Assignment #2)
3.1 Problem analysis
In order to make sure all 4 containers can read the data, they should be linked to a volume that stores the data. However, a container created from the plain Ubuntu image cannot run Python programs because it has no Python environment. This means a Dockerfile needs to be created to build a custom image based on Ubuntu that can execute Python programs. Given that each container runs the same program to process different data, the program can be stored in the volume and executed from there. An environment variable can be assigned to each container to distinguish them from one another. Each container generates its partial result and stores it in the shared volume. Subsequently, a designated container reads these partial results and aggregates them to produce the final result. The detailed process is shown in Fig.24.
Fig.24 Process of calculating
3.2 Detailed Process of Building the Cluster Manager
3.2.1 Use dockerfile to create the image
Create a text document named Dockerfile (without any file extension) and use the FROM and RUN commands within it to build the environment needed for the image (Fig.25).
Fig.25 Code in Dockerfile
Then, in PowerShell on Windows, use the docker build command to create the image (Fig.26). It reads the Dockerfile and executes the instructions in it to generate the image. In the command, the name after the -t (tag) option is the name of the image to be created.
Fig.26 Code to create image
3.2.2 Create the containers which link to the volume
The function in 2.1 can be used to create containers which link to a specific volume.
Fig.27 Code to create containers
In this scenario, four containers named Container1, Container2, Container3, and Container4 are created from the my-ubuntu image and connected to a volume called my_volume. The data stored in this volume can be accessed via the path /volume_Data. We can run the command ls in a container to check whether the volume is connected (Fig.28).
Fig.28 Checking whether the volume is linked to the container
3.2.3 Writing the Python program to calculate the answer
Each container has to read a different part of the data, which depends on the environment variable TASK_ID it receives. So in the program sum.py, a function is first defined to read a given range of lines (Fig.29).
Fig.29 Read the data for calculating
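The mapping from TASK_ID to a range of lines can be sketched as below. The function name partition_bounds is hypothetical; the numbers (100000 lines, 4 tasks, the last task taking any remainder) follow sum.py in appendix B.

```python
def partition_bounds(task_id, total_lines=100000, num_tasks=4):
    """Return the half-open line range [start, end) handled by one task."""
    size = total_lines // num_tasks
    start = task_id * size
    # The last task also takes the lines left over by the integer division.
    end = total_lines if task_id == num_tasks - 1 else start + size
    return start, end


for task_id in range(4):
    print(task_id, partition_bounds(task_id))
# → 0 (0, 25000)
# → 1 (25000, 50000)
# → 2 (50000, 75000)
# → 3 (75000, 100000)
```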
Then in the main function, the program reads the environment variable TASK_ID to determine which part of the data it has to process, then uses the function shown in Fig.29 to read the data. After reading the data, the program calculates the results and writes them to a file named output followed by the TASK_ID it received. Detailed code is shown in Fig.30.
We also have to write get_ans.py, which merges the individual outputs to calculate the final answer. The detailed code can be found in the appendix.
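Merging the partial results needs some care for the variance: the overall variance is not an average of the partition variances alone, because the spread of the partition means also contributes. The sketch below shows the general combination rule for population variances. It assumes each partition also reports its element count, which the output files in this workpaper do not store; partition_stats and combine are hypothetical names.

```python
def partition_stats(values):
    """Return (count, mean, population variance, max, min) of one partition."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return n, mean, variance, max(values), min(values)


def combine(parts):
    """Merge per-partition statistics into statistics of the whole data set."""
    total_n = sum(n for n, *_ in parts)
    total_sum = sum(n * mean for n, mean, *_ in parts)
    overall_mean = total_sum / total_n
    # within-partition variance plus the spread of the partition means
    overall_var = sum(
        n * (var + (mean - overall_mean) ** 2) for n, mean, var, *_ in parts
    ) / total_n
    return {
        "sum": total_sum,
        "mean": overall_mean,
        "variance": overall_var,
        "max": max(p[3] for p in parts),
        "min": min(p[4] for p in parts),
    }


data = [float(x) for x in range(1, 9)]
merged = combine([partition_stats(data[:4]), partition_stats(data[4:])])
direct = partition_stats(data)
print(merged["variance"], direct[2])
```

When all partitions have the same size, the rule reduces to the mean of the partition variances plus the population variance of the partition means.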
3.2.4 Upload the data and calculating program
Next, the data and the calculating program should be uploaded to the volume so that every container can access them. The command docker cp can be used for this. In Windows PowerShell, use the command docker cp numbers 964:/volume_Data to copy the file numbers into the directory /volume_Data of the container whose ID is 964a483af30c (Fig.31). In the command, you can provide just a prefix of the container ID instead of the whole ID. Since /volume_Data is the mount point of the volume, the file is copied into the volume through the container.
In the same way, the Python programs that calculate the answer can be copied to the volume.
Fig.30 Code for calculating the answer
Fig.31 Command for copying a file to the volume
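As an alternative to docker cp, the upload could probably also be done from the cluster manager itself, since container objects in the Docker SDK offer put_archive(path, data), which extracts a tar archive into a directory of the container. The sketch below is an assumption about how this might look and is not the method used in this workpaper; make_tar and copy_to_container are hypothetical helpers, and only the tar-building part is exercised here.

```python
import io
import tarfile


def make_tar(file_name, content):
    """Pack content (bytes) into an in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name=file_name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def copy_to_container(container, file_name, content, dest_dir="/volume_Data"):
    """Extract a one-file archive into dest_dir of the given container."""
    return container.put_archive(dest_dir, make_tar(file_name, content))


archive = make_tar("numbers", b"1\n2\n3\n")
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    print(tar.getnames())
```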
3.2.5 Running the python program to get the answer.
Everything is now prepared. After running the Python program in each container with its own TASK_ID environment variable, the final answer is written to the file ans. Fig.32 shows the code. However, one thing needs care: the environment argument of exec_run is passed as a dictionary here, so a correct dictionary must be built before calling it.
Fig.32 Command of running the python program to get the answer
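The per-container environment can be sketched like this. The function run_tasks and the RecordingContainer stand-in are hypothetical; the command and the TASK_ID variable follow appendix B. Environment values are strings, so the index is converted with str().

```python
class RecordingContainer:
    """Stand-in that remembers the environment it was called with (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.seen_environment = None

    def exec_run(self, cmd, environment=None):
        self.seen_environment = environment
        return 0, b""


def run_tasks(containers, command="python3 volume_Data/sum.py"):
    """Run the same command in every container, each with its own TASK_ID."""
    results = []
    for task_id, container in enumerate(containers):
        environment = {"TASK_ID": str(task_id)}  # one dictionary per container
        exit_code, _output = container.exec_run(cmd=command, environment=environment)
        results.append((container.name, exit_code))
    return results


workers = [RecordingContainer(f"Container{i}") for i in range(1, 5)]
print(run_tasks(workers))
print([w.seen_environment for w in workers])
```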
4.AI task in Container(Assignment #3)
4.1 Background
In recent years, image recognition technology has developed rapidly. Some CNN architectures, such as ResNet, are widely used. In this assignment, a ResNet was written as a Python program and then copied to the container to run. The dataset used is fashion_mnist.
4.2 ResNet Building
A ResNet uses residual blocks instead of plain convolution blocks. The difference between them is that in a residual block the input X is added to the output Y of the block (Fig.33).
Fig.33 Residual Block
So at the end of each residual block, the input X is added to the output Y before it is passed to the next block. The structure of the feature-extraction part is the same as in ordinary CNNs. The detailed code of the residual block is shown in Fig.34. (The complete code can be found in the appendix.)
Fig.34 Code of residual block
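The addition Y + X only works when both tensors have the same shape, which is why a block that halves the resolution also needs a strided 1x1 convolution on the shortcut. The plain-Python sketch below checks the spatial sizes with the usual convolution output formula; conv_out and residual_shapes are hypothetical helper names, while the 96-pixel input and the kernel, stride and padding values are taken from the code in appendix C.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def residual_shapes(size, stride):
    """Return the spatial size of the main path and of the shortcut path."""
    main = conv_out(conv_out(size, 3, stride, 1), 3, 1, 1)  # two 3x3 convolutions
    shortcut = conv_out(size, 1, stride, 0) if stride != 1 else size  # 1x1 conv or identity
    return main, shortcut


size = 96                        # images are resized to 96x96
size = conv_out(size, 7, 2, 3)   # 7x7 convolution with stride 2
size = conv_out(size, 3, 2, 1)   # 3x3 max pooling with stride 2
sizes = []
for stride in (1, 2, 2, 2, 2):   # first residual of each stage b2..b6
    main, shortcut = residual_shapes(size, stride)
    assert main == shortcut      # otherwise Y + X would fail
    size = main
    sizes.append(size)
print(sizes)
# → [24, 12, 6, 3, 2]
```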
4.3 Running in Container
We can copy the program into the volume, then run it in the cluster manager built in Chapter 3. The commands for copying and running the program were shown before. After the program finishes, it prints the loss and accuracy of the net (Fig.35).
Fig.35 The loss and accuracy of the net
APPENDIX
A. The code of Assignment #1: Create a cluster manager
#!/usr/bin/env python
# coding: utf-8
import docker
client=docker.from_env()
def create_cner(im_name,name=None,num_create=1,volume=None):
    # check that the image exists (it is pulled if necessary)
    if not check_image(im_name):
        return
    # normalize name to a list of at most num_create names
    if name is None:
        name=[]
    elif isinstance(name,str):
        name=[name]
    name=list(name)[:num_create]
    # refuse to continue if any requested name is already in use
    ck=False
    for x in name:
        if check_container(x) is not None:
            print(f'Container {x} already exists! Please use another name!')
            ck=True
    if ck:
        return
    # build the volumes argument
    volumes=None
    if volume is not None:
        try:
            volume_name=volume["volume_name"]
            volume_path=volume["volume_path"]
        except (KeyError,TypeError):
            print("The format of volume: {'volume_name': 'The name of volume', 'volume_path': 'The path of volume'}")
            return
        get_volume(volume_name)
        volumes={volume_name: {"bind": volume_path, "mode": "rw"}}
    # create the named containers first, then let Docker name the remaining ones
    for x in name:
        client.containers.create(image=im_name,name=x,tty=True,command="sh",volumes=volumes)
    for _ in range(num_create-len(name)):
        client.containers.create(image=im_name,command="sh",tty=True,volumes=volumes)
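def check_image(name):
    try:
        # look for the image locally
        client.images.get(name)
        return True
    except docker.errors.APIError:
        # not found locally (ImageNotFound is a subclass of APIError), so pull it
        print("Image ",name," doesn't exist locally. Pulling from hub!")
        try:
            client.images.pull(name)
            return True
        except docker.errors.APIError:
            print("Image ",name," doesn't exist! Please check the name of the image!")
            return False
def check_container(name):
    # an int is treated as an index into the list of all containers
    if isinstance(name,int):
        cn=client.containers.list(all=True)
        if name>=len(cn):
            return None
        else:
            return cn[name]
    # anything else is treated as a container name or id
    else:
        try:
            c1=client.containers.get(name)
            return c1
        except docker.errors.NotFound:
            return None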
def list_all_container():
    cn=client.containers.list(all=True)
    for index,x in enumerate(cn):
        print(f'No.{index} Container:',x.name)
        print("Container id:",x.id)
        print("Container status:",x.status)
    return cn
def list_running_container():
    # enumerate all containers so that the index matches check_container
    containers=client.containers.list(all=True)
    running=[]
    for index,x in enumerate(containers):
        if x.status == 'running':
            print(f'No.{index} Container:',x.name)
            print("Container id:",x.id)
            print("Container status:",x.status)
            running.append(x)
    return running
def delete_container(name):
    cn=check_container(name)
    if cn is None:
        return
    cn.stop()
    cn.remove()
    print('Successful delete!')
def delete_all_container():
    cn=client.containers.list(all=True)
    for x in cn:
        x.stop()
    # prune() removes every stopped container
    client.containers.prune()
    print("Successful delete all containers")
def delete_running_container():
    cn=client.containers.list()
    for x in cn:
        print(x.name,"is stopping")
        x.stop()
        x.remove()
        print(x.name,"successful delete")
    print("Successful delete running containers")
def get_volume(name):
    # return the volume, creating it first if it does not exist yet
    try:
        return client.volumes.get(name)
    except docker.errors.NotFound:
        return client.volumes.create(name=name)
def stop_container(name):
    c1=check_container(name)
    if c1 is None:
        return
    c1.stop()
def stop_all_container():
    cn=client.containers.list()
    for x in cn:
        x.stop()
def exec_container(name,command=None):
    c1=check_container(name)
    if c1 is None:
        return
    # return the result so that the caller can inspect the exit code and output
    return c1.exec_run(cmd=command)
def start_all_container():
    cn=client.containers.list(all=True)
    for x in cn:
        x.start()
B. The code of Assignment #2: Data Processing in Cluster Manager
Attention: It has to be run together with the code in appendix A
1.Code in cluster manager
volume={"volume_name":"my_volume",
        "volume_path":"/volume_Data"}
# a list (not a set) keeps the container names in a fixed order
names=["Container1","Container2","Container3","Container4"]
create_cner(im_name='my-ubuntu',name=names,num_create=4,volume=volume)
start_all_container()
containers=client.containers.list()
for index,x in enumerate(containers):
    # each container gets its own TASK_ID so that it processes a different part of the data
    environment = {'TASK_ID':str(index)}
    output=x.exec_run(cmd='python3 volume_Data/sum.py',environment=environment)
    print(output)
# merge the partial results in one container
ans=containers[0].exec_run(cmd='python3 volume_Data/get_ans.py')
print(ans)
2.Code of sum.py
import os
import csv
def read_csv_part(file_path, start_line, end_line):
    # read the rows start_line..end_line-1 and return the first column as floats
    data = []
    with open(file_path, 'r') as f:
        reader = csv.reader(f)
        for _ in range(start_line):
            next(reader)
        for _ in range(end_line - start_line):
            row = next(reader)
            data.append(float(row[0]))
    return data
def main():
    task_id = int(os.getenv('TASK_ID', 0))
    total_lines = 100000
    partition_size = total_lines // 4
    start_line = task_id * partition_size
    end_line = start_line + partition_size
    if task_id == 3:
        end_line = total_lines
    file_path = 'volume_Data/numbers'
    data = read_csv_part(file_path, start_line, end_line)
    print(f'There are {len(data)} numbers')
    sum_data = sum(data)
    average = sum_data / len(data)
    variance = sum((x - average) ** 2 for x in data) / len(data)
    max_value = max(data)
    min_value = min(data)
    # one value per line: sum, average, variance, max, min
    with open(f'volume_Data/output{task_id}', 'w') as file:
        file.write(f'{sum_data}\n')
        file.write(f'{average}\n')
        file.write(f'{variance}\n')
        file.write(f'{max_value}\n')
        file.write(f'{min_value}\n')
if __name__ == "__main__":
    main()
3.Code of get_ans.py
files = ["output0", "output1", "output2", "output3"]
row_data = []
for file_name in files:
    with open(f'volume_Data/{file_name}', "r") as file:
        current_rows = []
        for line in file:
            num = float(line.strip())
            current_rows.append(num)
        row_data.append(current_rows)
# each entry of row_data is [sum, average, variance, max, min] of one partition
sum_first_row = sum(row[0] for row in row_data)
# all partitions have the same size, so the overall average is the mean of the partition averages
avg_second_row = sum(row[1] for row in row_data) / len(row_data)
# overall variance = mean of the partition variances + variance of the partition averages
# (valid because all partitions have the same size)
mean_third_row = sum(row[2] for row in row_data) / len(row_data)
var_third_row = mean_third_row + sum((row[1] - avg_second_row) ** 2 for row in row_data) / len(row_data)
max_fourth_row = max(row[3] for row in row_data)
min_fifth_row = min(row[4] for row in row_data)
with open('volume_Data/ans',"w") as f:
    f.write(f'Sum:{sum_first_row}\n')
    f.write(f'Average:{avg_second_row}\n')
    f.write(f'Variance:{var_third_row}\n')
    f.write(f'Max:{max_fourth_row}\n')
    f.write(f'Min:{min_fifth_row}\n')
print(f'The Sum is {sum_first_row}')
print(f'The Average is {avg_second_row}')
print(f'The variance is {var_third_row}')
print(f'The max is {max_fourth_row}')
print(f'The min is {min_fifth_row}')
C. The code of Assignment #3: AI task in Cluster Manager
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l
class Residual(nn.Module):
    def __init__(self,input_channels,output_channels,use_1x1conv=False,strides=1):
        super().__init__()
        self.conv1=nn.Conv2d(input_channels,output_channels,kernel_size=3,padding=1,stride=strides)
        self.conv2=nn.Conv2d(output_channels,output_channels,kernel_size=3,padding=1)
        # optional 1x1 convolution so that the shortcut matches the shape of the main path
        if use_1x1conv:
            self.conv3=nn.Conv2d(input_channels,output_channels,kernel_size=1,stride=strides)
        else:
            self.conv3=None
        self.bn1=nn.BatchNorm2d(output_channels)
        self.bn2=nn.BatchNorm2d(output_channels)
    def forward(self,X):
        Y=F.relu(self.bn1(self.conv1(X)))
        Y=self.bn2(self.conv2(Y))
        if self.conv3:
            X=self.conv3(X)
        # residual connection: add the input to the output
        Y+=X
        return F.relu(Y)
b1=nn.Sequential(nn.Conv2d(1,64,kernel_size=7,stride=2,padding=3),
                 nn.BatchNorm2d(64),nn.ReLU(),
                 nn.MaxPool2d(kernel_size=3,stride=2,padding=1))
def resnet_block(input_channels,output_channels,num_residuals,first_block=False):
    blk=[]
    for i in range(num_residuals):
        # the first residual of every stage except the first halves the resolution
        if i==0 and not first_block:
            blk.append(Residual(input_channels,output_channels,use_1x1conv=True,strides=2))
        else:
            blk.append(Residual(output_channels,output_channels))
    return blk
b2=nn.Sequential(*resnet_block(64,64,4,first_block=True))
b3=nn.Sequential(*resnet_block(64,128,4))
b4=nn.Sequential(*resnet_block(128,256,8))
b5=nn.Sequential(*resnet_block(256,512,4))
b6=nn.Sequential(*resnet_block(512,1024,4))
net=nn.Sequential(b1,b2,b3,b4,b5,b6,nn.AdaptiveAvgPool2d((1,1)),nn.Flatten(),nn.Linear(1024,10))
lr,num_epochs,batch_size=0.05,10,256
train_iter,test_iter=d2l.load_data_fashion_mnist(batch_size,resize=96)
d2l.train_ch6(net,train_iter,test_iter,num_epochs,lr,d2l.try_gpu())