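LAVA Framework
LAVA is an open-source tool for automated testing on real hardware. Its architecture is as follows:
LAVA is split into two parts: the master and the worker.
The web frontend is built with Django. Through it, users can browse device types, add devices, submit jobs, and so on. Data submitted through the web UI is stored in the backend PostgreSQL server.
The scheduler periodically scans the database for queued test jobs and idle devices, and starts a job as soon as the resources it needs are available.
lava-master-daemon communicates with the workers over ZMQ (ZeroMQ).
lava-slave-daemon receives control messages from the master and sends logs and job results back to the master over ZMQ.
The dispatcher is a particularly important component of LAVA: it performs every operation on the device, driven by the submitted job definition and the device parameters.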
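To make the scheduler's role concrete, here is a minimal sketch of a single scheduling pass. It is not LAVA's scheduler code: the Job and Device classes, their fields, and schedule_once are hypothetical names made up for illustration, and in-memory lists stand in for the PostgreSQL tables.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Device:
    # Hypothetical stand-in for a device row in the database
    hostname: str
    device_type: str
    busy: bool = False


@dataclass
class Job:
    # Hypothetical stand-in for a queued test job
    job_id: int
    device_type: str
    device: Optional[str] = None  # hostname, set once the job is scheduled


def schedule_once(jobs: List[Job], devices: List[Device]) -> List[Tuple[int, str]]:
    """One pass: pair each queued job with an idle device of the requested type."""
    started = []
    for job in jobs:
        if job.device is not None:
            continue  # already scheduled in an earlier pass
        for dev in devices:
            if not dev.busy and dev.device_type == job.device_type:
                dev.busy = True
                job.device = dev.hostname
                started.append((job.job_id, dev.hostname))
                break
    return started


devices = [Device("qemu-01", "qemu"), Device("bbb-01", "beaglebone-black")]
jobs = [Job(1, "qemu"), Job(2, "qemu"), Job(3, "beaglebone-black")]
first_pass = schedule_once(jobs, devices)
print(first_pass)
```

Job 2 stays queued after this pass because the only qemu device is busy; a later pass would pick it up once that device is released.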
On my machine, once LAVA is up, the following related processes are running:
root@f4acd930a59a:/usr/bin# ps -efH | grep lava
UID PID PPID C STIME TTY TIME CMD
root 3641 727 0 03:05 pts/13 00:00:00 grep lava
root 220 1 0 Oct15 ? 00:00:03 /usr/bin/python3 /usr/bin/lava-slave --level DEBUG --master tcp://localhost:5556 --socket-addr tcp://localhost:5555
lavaser+ 232 1 0 Oct15 ? 00:00:00 /usr/bin/python3 /usr/bin/lava-server manage lava-publisher --level DEBUG
root 233 1 0 Oct15 ? 00:00:05 gunicorn: master [lava_server.wsgi]
lavaser+ 346 233 0 Oct15 ? 00:00:00 gunicorn: worker [lava_server.wsgi]
lavaser+ 348 233 0 Oct15 ? 00:00:00 gunicorn: worker [lava_server.wsgi]
lavaser+ 354 233 0 Oct15 ? 00:00:00 gunicorn: worker [lava_server.wsgi]
lavaser+ 358 233 0 Oct15 ? 00:00:00 gunicorn: worker [lava_server.wsgi]
lavaser+ 236 1 0 Oct15 ? 00:00:02 /usr/bin/python3 /usr/bin/lava-server manage lava-logs --level DEBUG
postgres 389 277 0 Oct15 ? 00:00:21 postgres: 10/main: lavaserver lavaserver 127.0.0.1(57668) idle
root 338 1 0 Oct15 ? 00:00:01 /usr/bin/python /usr/bin/lava-coordinator --loglevel=DEBUG
lavaser+ 379 1 0 Oct15 ? 00:00:59 /usr/bin/python3 /usr/bin/lava-server manage lava-master --level DEBUG
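lava-server workflow
In the listing above, lava-server is invoked with three different commands after manage: lava-publisher, lava-logs, and lava-master. In each case --level is set to DEBUG.
The lava-server script is very short. It calls a main function that parses the command-line arguments, picks the appropriate Django settings module, and then calls Django's execute_from_command_line(argv=None) to run the requested command: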
def main():
    # Is the script called from an installed packages or from a source install?
    installed = not sys.argv[0].endswith('manage.py')
    # Create the command line parser
    parser = argparse.ArgumentParser()
    if installed:
        subparser = parser.add_subparsers(title='subcommand', help='Manage LAVA')
        manage = subparser.add_parser("manage")
    else:
        manage = parser
    group = manage.add_argument_group("Server configuration")
    group.add_argument("-I", "--instance-template",
                       action="store",
                       default="/etc/lava-server/{filename}.conf",
                       help="Template used for constructing instance pathname."
                            " The default value is: %(default)s")
    manage.add_argument("command", nargs="...",
                        help="Invoke this Django management command")
    # Parse the command line
    options = parser.parse_args()
    # Choose the right Django settings
    if installed:
        settings = "lava_server.settings.distro"
    else:
        # Add the root dir to the python path
        find_sources()
        settings = "lava_server.settings.development"
    os.environ["DJANGO_SETTINGS_MODULE"] = settings
    os.environ["DJANGO_DEBIAN_SETTINGS_TEMPLATE"] = options.instance_template
    # Create and run the Django command line
    django_options = [sys.argv[0]]
    django_options.extend(options.command)
    execute_from_command_line(django_options)
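Take /usr/bin/python3 /usr/bin/lava-server manage lava-master --level DEBUG as an example.
The program first checks whether sys.argv[0], the path of the script itself, ends with manage.py. If it does, LAVA is being run from a source checkout; otherwise it is running from an installed package. Here sys.argv[0] is /usr/bin/lava-server, so installed is True.
The program therefore adds a subparser to handle the manage argument.
Everything after manage is collected into options.command, so after argument parsing the final call is:
execute_from_command_line(["/usr/bin/lava-server", "lava-master", "--level", "DEBUG"])
Django resolves this command to the file lava_server/management/commands/lava-master.py and runs the code in it.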
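The argument handling can be reproduced in isolation with a small sketch. It copies only the parser setup from main() for the installed case; build_parser and the hard-coded argv list are illustrative additions, and Django itself is left out, so the final list is printed rather than passed to execute_from_command_line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the "installed" branch of main(): a "manage" subcommand
    parser = argparse.ArgumentParser(prog="lava-server")
    subparser = parser.add_subparsers(title="subcommand", help="Manage LAVA")
    manage = subparser.add_parser("manage")
    manage.add_argument("-I", "--instance-template",
                        default="/etc/lava-server/{filename}.conf")
    # nargs="..." is argparse.REMAINDER: once the first positional is seen,
    # every remaining token is collected as-is, including option-like ones.
    manage.add_argument("command", nargs="...")
    return parser


# Hard-coded stand-in for sys.argv in the example invocation
argv = ["/usr/bin/lava-server", "manage", "lava-master", "--level", "DEBUG"]
options = build_parser().parse_args(argv[1:])

# Same construction as at the end of main()
django_options = [argv[0]] + options.command
print(django_options)
# ['/usr/bin/lava-server', 'lava-master', '--level', 'DEBUG']
```

Because of the remainder behaviour, lava-server never needs to know about --level; that option is interpreted later by the management command itself.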
lava-master.py defines a Command class with many methods, for example:
class Command(LAVADaemonCommand):
    def send_status(self, hostname):
        """
        The master crashed, send a STATUS message to get the current state of jobs
        """
        ...
    def start_job(self, job, options):
        # Load job definition to get the variables for template
        # rendering
        ...
    def start_jobs(self, options, jobs=None):
        """
        Loop on all scheduled jobs and send the START message to the slave.
        """
        ...
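The lava_server/management/commands/ directory also contains lava-logs.py and lava-publisher.py, which correspond to the other two processes.
Of these, lava-publisher mainly uses ZMQ to exchange LAVA event messages with the slave, and lava-logs mainly uses ZMQ to exchange log messages with the slave.
Note:
A Django management command is normally run as:
python manage.py <command>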
lava-server configuration
The lava-server settings live in lava_server/settings/, which contains the following files:
common.py config_file.py development.py distro.py production.py secret_key.py
Most of the settings are in common.py, for example:
ROOT_URLCONF = 'lava_server.urls'
WSGI_APPLICATION = 'lava_server.wsgi.application'
STATIC_ROOT = "/usr/share/lava-server/static"
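How exactly the master and the slave communicate requires a closer reading of the code.
The Django side is not covered any further here.
lava-slave workflow
This section walks through the lava-slave workflow, which is where a job actually starts running.
Take the process from the listing above as an example: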
/usr/bin/lava-slave --level DEBUG --master tcp://localhost:5556 --socket-addr tcp://localhost:5555
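The main workflow of lava-slave-daemon (the main() function of lava-slave) is:
1. Parse the command-line arguments.
2. Set the log level.
3. Create the ZMQ context for the connection to the master and return the tuple of objects needed later.
4. Open the local database and create the jobs table.
5. Configure ZMQ.
6. Connect to the master.
7. Enter the following loop:
   7.1 Receive a message from the master.
   7.2 Handle the message (the key function is handle()).
   7.3 Ping the master if needed.
   7.4 Check the status of the jobs.
   7.5 Remove stale resources.
The code is shown below: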
def main():
    # Parse command line
    options = setup_parser().parse_args()
    # Setup logger
    setup_logger(options.log_file, options.level)
    try:
        # Create the ZMQ context for the connection to the master and
        # return the tuple of objects needed later
        ctx, sock, poller, pipe_r, pipe_w = create_context(options)
    except Exception as exc:
        return 1
    # slave states
    master = Master()
    mkdir(SLAVE_DIR)
    # Open the database and create the jobs table
    # (/var/lib/lava/dispatcher/slave/db.sqlite3)
    jobs = JobsDB(os.path.join(SLAVE_DIR, "db.sqlite3"))
    if options.encrypt:
        zmq_config = ZMQConfig(options.socket_addr, options.master_cert,
                               options.slave_cert, options.ipv6)
    else:
        zmq_config = ZMQConfig(options.socket_addr, None, None, options.ipv6)
    ### main loop
    # NOTE: last_jobs_check and last_stale_check are used below, but their
    # initialisation is not shown in this excerpt.
    try:
        # Connect to the master
        if not connect_to_master(poller, pipe_r, sock, options.master, options.ipv6):
            return 1
        master.received_msg()
        # Receive the first message from the master
        (leaving, msg) = recv_from_master("", poller, pipe_r, sock)
        # Handle messages from the master until asked to leave
        while not leaving:
            # If the message is not empty, handle it
            if msg is not None:
                handle(msg, master, jobs, zmq_config, sock)
            # Ping the master if needed
            master.ping(sock)
            # Regular checks
            last_jobs_check = check_job_status(jobs, sock, last_jobs_check)
            last_stale_check = remove_stale_resources(jobs, last_stale_check)
            # Listen to the master
            (leaving, msg) = recv_from_master("", poller, pipe_r, sock)
    except Exception as exc:
        return 1
    finally:
        destroy_context(ctx, sock, pipe_r, pipe_w)
    return 0
The handle function used in the loop is defined as follows:
def handle(msg, master, jobs, zmq_config, sock):
    """
    Handle the master message
    :param msg: the master message (the header was removed)
    """
    # 1: identity and action
    try:
        action = u(msg[0])
    except (IndexError, TypeError):
        LOG.error("Invalid message from master: %s", msg)
        return
    # 2: handle the action
    if action == "CANCEL":
        handle_cancel(msg, jobs, sock, master)
    elif action == "END_OK":
        handle_end_ok(msg, jobs)
    elif action == "HELLO_OK":
        handle_hello_ok()
    elif action == "PONG":
        handle_pong(msg, master)
    elif action == "START":
        handle_start(msg, jobs, sock, master, zmq_config)
    elif action == "STATUS":
        handle_status(msg, jobs, sock, master)
    else:
        # Do not tag the master as alive as the message does not mean
        # anything.
        LOG.error("Unknown action: '%s', args=(%s)",
                  action, msg[1:])
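The routing idea in handle() (read the action from the first frame, then call the matching handler) can be tried out without ZMQ. The sketch below swaps the if/elif chain for a table-driven dispatcher, which is a different technique from what LAVA uses. make_dispatcher and the two lambda handlers are hypothetical, and it assumes each frame is a UTF-8 encoded bytes object.

```python
import logging

LOG = logging.getLogger("slave-sketch")


def make_dispatcher(handlers):
    """Return a handle(msg) function that routes on the first frame of msg."""
    def handle(msg):
        # 1: extract the action name from the first frame
        try:
            action = msg[0].decode("utf-8")
        except (IndexError, TypeError, AttributeError, UnicodeDecodeError):
            LOG.error("Invalid message from master: %s", msg)
            return None
        # 2: look up the handler and pass it the remaining frames
        handler = handlers.get(action)
        if handler is None:
            LOG.error("Unknown action: '%s', args=(%s)", action, msg[1:])
            return None
        return handler(msg[1:])
    return handle


# Hypothetical handlers that only echo what they received
handle = make_dispatcher({
    "PONG": lambda args: ("pong", int(args[0].decode("utf-8"))),
    "START": lambda args: ("start", args[0].decode("utf-8")),
})

pong_result = handle([b"PONG", b"20"])
start_result = handle([b"START", b"42"])
unknown_result = handle([b"REBOOT"])   # logged as unknown, returns None
invalid_result = handle(None)          # logged as invalid, returns None
```

A table keeps the set of supported actions in one place, at the cost of forcing every handler to share one signature; the explicit chain in handle() lets each handler take exactly the arguments it needs.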
The job of handle_start(msg, jobs, sock, master, zmq_config) is to start the job; if the job has already been started, it just reports the job's status back to the master. The core of the code is:
def handle_start(msg, jobs, sock, master, zmq_config):
    ...
    # Start the job, grab the pid and create it in the database
    pid = start_job(job_id, job_definition, device_definition, zmq_config,
                    dispatcher_config, env, env_dut)
    # Record the job in the local jobs database
    job = jobs.create(job_id, 0 if pid is None else pid,
                      Job.FINISHED if pid is None else Job.RUNNING)
    # Reply with a START_OK message
    job.send_start_ok(sock)
    # Mark the master as alive
    master.received_msg()
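The "start once, otherwise report the existing state" behaviour can be sketched with the standard sqlite3 module. This is an illustration under assumptions, not LAVA's JobsDB: the JobsSketch class, the table schema, the numeric status codes, and the "ALREADY_KNOWN" reply are all invented here, and an in-memory database replaces db.sqlite3.

```python
import sqlite3

RUNNING, FINISHED = 1, 2  # hypothetical status codes


class JobsSketch:
    """Hypothetical minimal jobs table; the real schema is not shown in the text."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "id INTEGER PRIMARY KEY, pid INTEGER, status INTEGER)"
        )

    def get(self, job_id):
        return self.conn.execute(
            "SELECT id, pid, status FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()

    def create(self, job_id, pid, status):
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO jobs (id, pid, status) VALUES (?, ?, ?)",
                (job_id, pid, status),
            )
        return self.get(job_id)


def handle_start_sketch(jobs, job_id, launch):
    """Start job_id once; on a repeated START only report the stored row."""
    existing = jobs.get(job_id)
    if existing is not None:
        return ("ALREADY_KNOWN", existing)
    pid = launch(job_id)  # expected to return None if the process failed to start
    row = jobs.create(job_id, 0 if pid is None else pid,
                      FINISHED if pid is None else RUNNING)
    return ("START_OK", row)


jobs = JobsSketch()
launched = []


def fake_launch(job_id):
    # Stand-in for spawning a process; records the call and returns a fake pid
    launched.append(job_id)
    return 1234


first = handle_start_sketch(jobs, 7, fake_launch)
second = handle_start_sketch(jobs, 7, fake_launch)
print(first, second, launched)
```

Keeping the job state in a local database means a duplicate START, for example after the master restarts, does not launch the same job twice.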
Starting a job spawns a child process that runs the lava-run program. start_job is defined as follows:
def start_job(job_id, definition, device_definition, zmq_config,
              dispatcher_config, env_str, env_dut_str):
    ...
    # TMP_DIR is /var/lib/lava/dispatcher/slave/tmp/
    # base_dir is TMP_DIR/<job_id>
    # Write back the job, device and dispatcher configuration
    with open(os.path.join(base_dir, "job.yaml"), "w") as f_job:
        f_job.write(definition)
    with open(os.path.join(base_dir, "device.yaml"), "w") as f_device:
        f_device.write(device_definition)
    with open(os.path.join(base_dir, "dispatcher.yaml"), "w") as f_job:
        f_job.write(dispatcher_config)
    # Dump the environment variables in the tmp file.
    if env_dut_str:
        # The mode 'w' belongs to open(), not to os.path.join()
        with open(os.path.join(base_dir, "env.dut.yaml"), 'w') as f_env:
            f_env.write(env_dut_str)
    try:
        out_file = os.path.join(base_dir, "stdout")
        err_file = os.path.join(base_dir, "stderr")
        env = create_environ(env_str)
        # Key step: the arguments used to launch lava-run
        args = [
            "lava-run",
            "--device=%s" % os.path.join(base_dir, "device.yaml"),
            "--dispatcher=%s" % os.path.join(base_dir, "dispatcher.yaml"),
            "--output-dir=%s" % base_