LAVA Source Code Reading Notes

LAVA Framework

LAVA is an open-source tool for automated hardware testing. Its architecture is shown below:
[Figure: LAVA framework]

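LAVA consists of two parts: the master and the worker.
The web front end is built with the Django framework. Through the web UI, users can browse device types, add devices, submit jobs and so on; the data submitted from the web UI is stored in the back-end PostgreSQL database.
The scheduler periodically scans the database, checks the queued test jobs and the idle, available devices, and starts a job when the required resources are available.
lava-master-daemon communicates with the workers over ZMQ (ZeroMQ).
lava-slave-daemon receives control messages from the master and sends logs and job results back to the master over ZMQ.
The dispatcher is a particularly important LAVA component: it performs every operation on the device, based on the submitted job definition and the device parameters.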

On my machine, once LAVA has started, the following related processes are running:

root@f4acd930a59a:/usr/bin# ps -efH | grep lava
UID        PID  PPID  C STIME TTY          TIME CMD
root      3641   727  0 03:05 pts/13   00:00:00   grep lava
root       220     1  0 Oct15 ?        00:00:03   /usr/bin/python3 /usr/bin/lava-slave --level DEBUG --master tcp://localhost:5556 --socket-addr tcp://localhost:5555
lavaser+   232     1  0 Oct15 ?        00:00:00   /usr/bin/python3 /usr/bin/lava-server manage lava-publisher --level DEBUG
root       233     1  0 Oct15 ?        00:00:05   gunicorn: master [lava_server.wsgi]
lavaser+   346   233  0 Oct15 ?        00:00:00     gunicorn: worker [lava_server.wsgi]
lavaser+   348   233  0 Oct15 ?        00:00:00     gunicorn: worker [lava_server.wsgi]
lavaser+   354   233  0 Oct15 ?        00:00:00     gunicorn: worker [lava_server.wsgi]
lavaser+   358   233  0 Oct15 ?        00:00:00     gunicorn: worker [lava_server.wsgi]
lavaser+   236     1  0 Oct15 ?        00:00:02   /usr/bin/python3 /usr/bin/lava-server manage lava-logs --level DEBUG
postgres   389   277  0 Oct15 ?        00:00:21     postgres: 10/main: lavaserver lavaserver 127.0.0.1(57668) idle
root       338     1  0 Oct15 ?        00:00:01   /usr/bin/python /usr/bin/lava-coordinator --loglevel=DEBUG
lavaser+   379     1  0 Oct15 ?        00:00:59   /usr/bin/python3 /usr/bin/lava-server manage lava-master --level DEBUG

lava-server Workflow

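As shown above, lava-server is invoked with manage followed by one of three commands: lava-publisher, lava-logs or lava-master, each with --level DEBUG.
The lava-server program itself is very short. Its main function parses the command-line arguments, selects the appropriate Django settings module, and then calls Django's execute_from_command_line(argv=None) to run the corresponding command: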

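import argparse
import os
import sys

from django.core.management import execute_from_command_line

# find_sources() is a helper defined elsewhere in the same lava-server script.


def main():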
    # Is the script called from an installed packages or from a source install?
    installed = not sys.argv[0].endswith('manage.py')

    # Create the command line parser
    parser = argparse.ArgumentParser()
    if installed:
        subparser = parser.add_subparsers(title='subcommand', help='Manage LAVA')
        manage = subparser.add_parser("manage")
    else:
        manage = parser

    group = manage.add_argument_group("Server configuration")

    group.add_argument("-I", "--instance-template",
                       action="store",
                       default="/etc/lava-server/{filename}.conf",
                       help="Template used for constructing instance pathname."
                            " The default value is: %(default)s")

    manage.add_argument("command", nargs="...",
                        help="Invoke this Django management command")

    # Parse the command line
    options = parser.parse_args()

    # Choose the right Django settings
    if installed:
        settings = "lava_server.settings.distro"
    else:
        # Add the root dir to the python path
        find_sources()
        settings = "lava_server.settings.development"
    os.environ["DJANGO_SETTINGS_MODULE"] = settings
    os.environ["DJANGO_DEBIAN_SETTINGS_TEMPLATE"] = options.instance_template

    # Create and run the Django command line
    django_options = [sys.argv[0]]
    django_options.extend(options.command)
    execute_from_command_line(django_options)

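Take /usr/bin/python3 /usr/bin/lava-server manage lava-master --level DEBUG as an example.
The program first checks whether the first argument (sys.argv[0]) ends with manage.py. If it does, LAVA is being run from a source tree rather than from an installed package; otherwise it is installed. Here the first argument is /usr/bin/lava-server, so installed is True.
The program therefore adds a subparser to handle the manage subcommand.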
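To check the parsing described above, here is a stdlib-only sketch that replays main()'s argument handling and returns the argv list that would be handed to Django, instead of actually calling execute_from_command_line(). The function name build_django_argv is hypothetical, and find_sources() and the environment variables are left out.

```python
import argparse
import os


def build_django_argv(argv):
    """Replay lava-server's main() parsing (sketch): return
    (installed, settings_module, argv_for_django) without calling Django."""
    installed = not argv[0].endswith("manage.py")

    parser = argparse.ArgumentParser(prog=os.path.basename(argv[0]))
    if installed:
        subparser = parser.add_subparsers(title="subcommand", help="Manage LAVA")
        manage = subparser.add_parser("manage")
    else:
        manage = parser

    group = manage.add_argument_group("Server configuration")
    group.add_argument("-I", "--instance-template", action="store",
                       default="/etc/lava-server/{filename}.conf")
    # nargs="..." (argparse.REMAINDER) keeps everything after the command
    # name, including options such as --level that this parser does not know.
    manage.add_argument("command", nargs="...")

    options = parser.parse_args(argv[1:])
    if installed:
        settings = "lava_server.settings.distro"
    else:
        settings = "lava_server.settings.development"
    return installed, settings, [argv[0]] + options.command


print(build_django_argv(
    ["/usr/bin/lava-server", "manage", "lava-master", "--level", "DEBUG"]))
# → (True, 'lava_server.settings.distro', ['/usr/bin/lava-server', 'lava-master', '--level', 'DEBUG'])
```

Note that the manage token itself is consumed by the subparser, so Django only sees the command name and its own options.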
After argument parsing, the function finally calls:
execute_from_command_line(["/usr/bin/lava-server", "lava-master", "--level", "DEBUG"])
Django then resolves this to lava_server/management/commands/lava-master.py and runs the command defined there.
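Django finds management commands by listing the modules in each app's management/commands/ directory, then importing <app>.management.commands.<name> and instantiating its Command class; this is why a file named lava-master.py becomes the lava-master command. The sketch below is a simplified, stdlib-only re-creation of that lookup, modeled loosely on find_commands() and load_command_class() in django.core.management (it is not Django's code). The package demo_server is a throwaway, hypothetical example.

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def find_commands(management_dir):
    """Return the command names found in <management_dir>/commands."""
    command_dir = os.path.join(management_dir, "commands")
    return sorted(name for _, name, is_pkg in pkgutil.iter_modules([command_dir])
                  if not is_pkg and not name.startswith("_"))


def load_command_class(app_name, name):
    """Import <app_name>.management.commands.<name> and instantiate its Command."""
    # import_module() accepts names such as "lava-master" that a plain
    # import statement cannot spell because of the hyphen
    module = importlib.import_module("%s.management.commands.%s" % (app_name, name))
    return module.Command()


def make_demo_app(root):
    """Create a hypothetical package demo_server/management/commands/lava-master.py."""
    commands_dir = os.path.join(root, "demo_server", "management", "commands")
    os.makedirs(commands_dir)
    for pkg_dir in (os.path.join(root, "demo_server"),
                    os.path.join(root, "demo_server", "management"),
                    commands_dir):
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(commands_dir, "lava-master.py"), "w") as f:
        f.write("class Command:\n"
                "    def handle(self, *args):\n"
                "        return 'lava-master called with %r' % (args,)\n")
    return os.path.join(root, "demo_server", "management")


with tempfile.TemporaryDirectory() as root:
    management_dir = make_demo_app(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        names = find_commands(management_dir)
        result = load_command_class("demo_server", "lava-master").handle("--level", "DEBUG")
    finally:
        sys.path.remove(root)

print(names)   # ['lava-master']
print(result)  # lava-master called with ('--level', 'DEBUG')
```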
lava-master.py defines a Command class with many methods, for example:

class Command(LAVADaemonCommand):
    def send_status(self, hostname):
        """
        The master crashed, send a STATUS message to get the current state of jobs
        """
        ...

    def start_job(self, job, options):
        # Load job definition to get the variables for template
        # rendering
        ...

    def start_jobs(self, options, jobs=None):
        """
        Loop on all scheduled jobs and send the START message to the slave.
        """
        ...

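The lava_server/management/commands/ directory also contains lava-logs.py and lava-publisher.py, which correspond to the other two processes.
lava-publisher mainly uses ZMQ to exchange LAVA event messages with the slave, while lava-logs mainly uses ZMQ to exchange log messages with the slave.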

Note:
A Django management command is started with:
python manage.py <command>

lava-server Configuration

The lava-server settings are located in lava_server/settings/, which contains the following files:
common.py config_file.py development.py distro.py production.py secret_key.py
Most of the configuration is in common.py, for example:

ROOT_URLCONF = 'lava_server.urls'
WSGI_APPLICATION = 'lava_server.wsgi.application'
STATIC_ROOT = "/usr/share/lava-server/static"

How exactly the master and the slave communicate needs a deeper look at the code.
The Django framework itself is not covered here.

lava-slave Workflow

Next, let's analyze the lava-slave workflow; lava-slave is where the execution of a job starts.
Take the process listed above as an example:
/usr/bin/lava-slave --level DEBUG --master tcp://localhost:5556 --socket-addr tcp://localhost:5555

The main workflow of lava-slave-daemon is as follows (lava-slave main()):

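  1. Parse the command-line arguments
  2. Set the log level
  3. Create the ZMQ context used to talk to the master dispatcher and return the required tuple of objects
  4. Open the database and create the jobs table
  5. Configure ZMQ
  6. Connect to the master
  7. Enter the following loop:
    7.1 Receive a message from the master
    7.2 Handle the message (key function: handle())
    7.3 Ping the master if needed
    7.4 Check the job status
    7.5 Remove stale resources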

The code is as follows:

import os
import time

# setup_parser, setup_logger, create_context, destroy_context, Master, mkdir,
# SLAVE_DIR, JobsDB, ZMQConfig, connect_to_master, recv_from_master, handle,
# check_job_status and remove_stale_resources are defined elsewhere in the
# lava-slave script.


def main():
    # Parse command line
    options = setup_parser().parse_args()

    # Setup logger
    setup_logger(options.log_file, options.level)

    try:
        # Create the ZMQ context for talking to the master dispatcher and
        # return the required objects as a tuple
        ctx, sock, poller, pipe_r, pipe_w = create_context(options)
    except Exception as exc:
        return 1

    # slave states
    master = Master()
    mkdir(SLAVE_DIR)
    # Open the database and create the jobs table
    # (/var/lib/lava/dispatcher/slave/db.sqlite3)
    jobs = JobsDB(os.path.join(SLAVE_DIR, "db.sqlite3"))

    if options.encrypt:
        zmq_config = ZMQConfig(options.socket_addr, options.master_cert,
                               options.slave_cert, options.ipv6)
    else:
        zmq_config = ZMQConfig(options.socket_addr, None, None, options.ipv6)

    ### main loop
    try:
        # Connect to the master
        if not connect_to_master(poller, pipe_r, sock, options.master, options.ipv6):
            return 1
        master.received_msg()
        # Timestamps of the last periodic checks: they must be initialized
        # before the loop uses them for the first time
        last_jobs_check = time.time()
        last_stale_check = time.time()
        # Receive the first message from the master
        (leaving, msg) = recv_from_master("", poller, pipe_r, sock)
        # Handle the messages coming from the master
        while not leaving:
            # If the message is not empty, handle it
            if msg is not None:
                handle(msg, master, jobs, zmq_config, sock)
            # Ping the master if needed
            master.ping(sock)
            # Regular checks
            last_jobs_check = check_job_status(jobs, sock, last_jobs_check)
            last_stale_check = remove_stale_resources(jobs, last_stale_check)
            # Listen to the master
            (leaving, msg) = recv_from_master("", poller, pipe_r, sock)
    except Exception as exc:
        return 1
    finally:
        destroy_context(ctx, sock, pipe_r, pipe_w)

    return 0
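check_job_status() and remove_stale_resources() take the time of their previous run and return a new one, which suggests they are throttled: the loop calls them on every iteration, but the real work only happens once an interval has elapsed. The sketch below shows that pattern under this assumption; run_if_due and the 10-second interval are hypothetical, and a fake clock is passed in so the behavior is deterministic.

```python
import time


def run_if_due(task, last_run, interval, now=None):
    """Call task() if at least `interval` seconds passed since `last_run`.

    Returns the timestamp to pass in on the next call, the same shape as
    last_jobs_check = check_job_status(jobs, sock, last_jobs_check).
    """
    if now is None:
        now = time.time()
    if now - last_run < interval:
        return last_run  # not due yet: keep the old timestamp
    task()
    return now


# Drive it with a fake clock so the result is deterministic.
calls = []
last = 0.0
for t in [1.0, 5.0, 11.0, 12.0, 25.0]:
    last = run_if_due(lambda: calls.append(t), last, interval=10, now=t)
print(calls)  # [11.0, 25.0]
```

Returning the timestamp instead of storing it in a global keeps each check stateless, which is why the main loop has to carry last_jobs_check and last_stale_check itself.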

The handle function is defined as follows:

def handle(msg, master, jobs, zmq_config, sock):
    """
    Handle the master message

    :param msg: the master message (the header was removed)
    """
    # 1: identity and action
    try:
        action = u(msg[0])
    except (IndexError, TypeError):
        LOG.error("Invalid message from master: %s", msg)
        return

    # 2: handle the action
    if action == "CANCEL":
        handle_cancel(msg, jobs, sock, master)
    elif action == "END_OK":
        handle_end_ok(msg, jobs)
    elif action == "HELLO_OK":
        handle_hello_ok()
    elif action == "PONG":
        handle_pong(msg, master)
    elif action == "START":
        handle_start(msg, jobs, sock, master, zmq_config)
    elif action == "STATUS":
        handle_status(msg, jobs, sock, master)
    else:
        # Do not tag the master as alive as the message does not mean
        # anything.
        LOG.error("Unknown action: '%s', args=(%s)",
                  action, msg[1:])
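handle() is a dispatcher keyed on the first frame of the message. The same control flow can be written as a dictionary lookup. In this sketch the handlers are hypothetical stand-ins for handle_cancel(), handle_pong() and the others, and a bytes frame is decoded as UTF-8 in place of LAVA's u() helper.

```python
def dispatch(msg, handlers, log=print):
    """Route a master message (header removed) to the handler for its action."""
    try:
        first = msg[0]
        action = first.decode("utf-8") if isinstance(first, bytes) else str(first)
    except (IndexError, TypeError):
        log("Invalid message from master: %r" % (msg,))
        return None
    handler = handlers.get(action)
    if handler is None:
        # Unknown action: log it and do not mark the master as alive
        log("Unknown action: %r, args=%r" % (action, msg[1:]))
        return None
    return handler(msg)


# Hypothetical handlers standing in for handle_cancel(), handle_pong(), ...
handlers = {
    "CANCEL": lambda msg: "cancel job %s" % msg[1].decode(),
    "PONG": lambda msg: "master is alive",
    "START": lambda msg: "start job %s" % msg[1].decode(),
}

logs = []
print(dispatch([b"START", b"42"], handlers, log=logs.append))  # start job 42
print(dispatch([b"PONG"], handlers, log=logs.append))          # master is alive
print(dispatch([], handlers, log=logs.append))                 # None
print(dispatch([b"BOGUS", b"1"], handlers, log=logs.append))   # None
```

The real handlers take different argument lists (handle_hello_ok() takes none, handle_start() takes five), which is a reasonable argument for the explicit if/elif chain used in lava-slave.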

handle_start(msg, jobs, sock, master, zmq_config) is mainly responsible for starting the job; if the job has already been started, it sends the job's status back to the master instead. The main code is:

def handle_start(msg, jobs, sock, master, zmq_config):
    ...
        # Start the job, grab the pid and create it in the database
        pid = start_job(job_id, job_definition, device_definition, zmq_config,
                        dispatcher_config, env, env_dut)
        # Insert a row for job_id into the jobs table
        job = jobs.create(job_id, 0 if pid is None else pid,
                          Job.FINISHED if pid is None else Job.RUNNING)
        # Reply "START_OK" to the master
        job.send_start_ok(sock)

    # Mark the master as alive
    master.received_msg()
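The excerpt omits the branch that deals with an already-known job. The idea (start a job only once, and on a repeated START report what is already recorded) can be sketched with sqlite3, since JobsDB is backed by a SQLite file. The table schema, the status codes and the returned reply tuples are assumptions for illustration, not LAVA's actual schema or protocol.

```python
import sqlite3

# Hypothetical status codes; LAVA's Job.RUNNING / Job.FINISHED values may differ.
RUNNING, FINISHED = 0, 1


def open_jobs_db(path=":memory:"):
    """Open (or create) a SQLite database with a minimal jobs table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS jobs "
                 "(id INTEGER PRIMARY KEY, pid INTEGER, status INTEGER)")
    return conn


def handle_start_sketch(conn, job_id, start_job):
    """Start job_id at most once; a repeated START only reports the stored status."""
    row = conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if row is not None:
        # Already known: do not start it again, just report its state
        return ("ALREADY_KNOWN", row[0])
    pid = start_job(job_id)  # None means the job could not be started
    status = FINISHED if pid is None else RUNNING
    with conn:  # commits the INSERT on success
        conn.execute("INSERT INTO jobs (id, pid, status) VALUES (?, ?, ?)",
                     (job_id, 0 if pid is None else pid, status))
    return ("START_OK", status)


spawned = []


def fake_start(job_id):
    """Pretend to spawn lava-run and return a made-up pid."""
    spawned.append(job_id)
    return 4242


def failing_start(job_id):
    """Pretend the spawn failed."""
    return None


conn = open_jobs_db()
first = handle_start_sketch(conn, 7, fake_start)
second = handle_start_sketch(conn, 7, fake_start)
failed = handle_start_sketch(conn, 8, failing_start)
print(first, second, failed)
```

As in the excerpt, a failed spawn is still recorded (pid 0, FINISHED), so a later STATUS request can report it instead of retrying the start.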

Starting a job spawns a child process that runs the lava-run program. start_job is defined as follows:

def start_job(job_id, definition, device_definition, zmq_config,
              dispatcher_config, env_str, env_dut_str):
    ...
    # TMP_DIR is /var/lib/lava/dispatcher/slave/tmp/
    # base_dir is TMP_DIR/<job_id>
    # Write back the job, device and dispatcher configuration
    with open(os.path.join(base_dir, "job.yaml"), "w") as f_job:
        f_job.write(definition)
    with open(os.path.join(base_dir, "device.yaml"), "w") as f_device:
        f_device.write(device_definition)
    with open(os.path.join(base_dir, "dispatcher.yaml"), "w") as f_dispatcher:
        f_dispatcher.write(dispatcher_config)
    # Dump the environment variables in the tmp file.
    if env_dut_str:
        # The mode "w" belongs to open(), not to os.path.join()
        with open(os.path.join(base_dir, "env.dut.yaml"), "w") as f_env:
            f_env.write(env_dut_str)
    try:
        out_file = os.path.join(base_dir, "stdout")
        err_file = os.path.join(base_dir, "stderr")
        env = create_environ(env_str)
        # Key part: the arguments used to launch the lava-run program
        args = [
            "lava-run",
            "--device=%s" % os.path.join(base_dir, "device.yaml"),
            "--dispatcher=%s" % os.path.join(base_dir, "dispatcher.yaml"),
            "--output-dir=%s" % base_
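The excerpt is cut off in the middle of the lava-run argument list, before the process is actually spawned. The general pattern it sets up, writing the job files into a per-job directory and starting a child process whose stdout and stderr go to files in that directory, can be sketched with subprocess.Popen. Using Popen here is an assumption, start_child is a hypothetical name, and a short python -c script stands in for lava-run.

```python
import os
import subprocess
import sys
import tempfile


def start_child(base_dir, files, args):
    """Write the given files into base_dir, then spawn args with stdout and
    stderr redirected to base_dir/stdout and base_dir/stderr."""
    os.makedirs(base_dir, exist_ok=True)
    for name, content in files.items():
        with open(os.path.join(base_dir, name), "w") as f:
            f.write(content)
    out_file = os.path.join(base_dir, "stdout")
    err_file = os.path.join(base_dir, "stderr")
    with open(out_file, "w") as f_out, open(err_file, "w") as f_err:
        # The child gets its own copies of the file descriptors, so closing
        # the parent's file objects right after Popen() is safe
        return subprocess.Popen(args, stdout=f_out, stderr=f_err, cwd=base_dir)


# Stand-in for lava-run: read job.yaml from the working directory,
# echo it to stdout and write a line to stderr.
child_script = ("import sys; print(open('job.yaml').read().strip()); "
                "print('oops', file=sys.stderr)")

with tempfile.TemporaryDirectory() as tmp:
    base_dir = os.path.join(tmp, "42")  # like TMP_DIR/<job_id>
    proc = start_child(base_dir,
                       {"job.yaml": "job_name: demo\n", "device.yaml": "device_type: qemu\n"},
                       [sys.executable, "-c", child_script])
    returncode = proc.wait()
    written = sorted(os.listdir(base_dir))
    with open(os.path.join(base_dir, "stdout")) as f:
        stdout_text = f.read()
    with open(os.path.join(base_dir, "stderr")) as f:
        stderr_text = f.read()

print(returncode, written)
print(stdout_text, stderr_text)
```

Keeping everything for one job under base_dir means the slave can later inspect or clean up a job (for example when removing stale resources) by looking at a single directory.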