Bazel: An Efficient and Precise Build Tool

Article link: https://blog.youkuaiyun.com/qq_39503148/article/details/131320299

Bazel

Basics

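Advantage: fast, precise builds

Incremental builds: a target is rebuilt whenever its inputs, outputs, or environment change, and only then.
Dependency management: dependency cycles are detected during the analysis phase, and Bazel refuses to run the build as soon as it finds one.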
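The cycle check described above can be pictured as a depth-first search over the dependency graph. The following is a minimal Python sketch of that idea, not Bazel's actual implementation; the function name find_cycle and the sample labels are hypothetical.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of labels, or None if the graph is acyclic.

    deps maps each target label to the list of labels it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the loop is closed here.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical graphs, for illustration only.
acyclic = {"//main:app": ["//greet:lib"], "//greet:lib": []}
cyclic = {"//a:a": ["//b:b"], "//b:b": ["//c:c"], "//c:c": ["//a:a"]}
```

A build tool that runs a check like this before executing anything can reject the cyclic graph without having started a single compile step.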

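Concepts

Workspace: a workspace is a directory; a WORKSPACE file must be created at its root.

WORKSPACE: defines the workspace and declares external dependencies.

Package: the unit of code organization, identified by a BUILD file.

Target: defined in a BUILD file. A target is a rule, a file, or a package group (package_group).

Label: the canonical name of a target, for example @project//my/app/main:app_binary (repository name, package name, and target name).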
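To make the parts of a label concrete, here is a small Python sketch that splits an absolute label into repository, package, and target. It is only an illustration that handles the forms shown in this article; the names Label and parse_label are hypothetical and are not part of Bazel.

```python
from typing import NamedTuple


class Label(NamedTuple):
    repo: str      # repository name, empty for the main repository
    package: str   # path of the package relative to the repository root
    target: str    # target name inside the package


def parse_label(text: str) -> Label:
    """Split an absolute label such as '@repo//pkg/path:name' into its parts."""
    repo = ""
    if text.startswith("@"):
        repo, sep, rest = text[1:].partition("//")
        if not sep:
            raise ValueError(f"missing '//' in label: {text!r}")
    elif text.startswith("//"):
        rest = text[2:]
    else:
        raise ValueError(f"not an absolute label: {text!r}")

    package, sep, target = rest.partition(":")
    if not sep:
        # '//my/app' is shorthand for '//my/app:app'.
        target = package.rsplit("/", 1)[-1]
    return Label(repo, package, target)
```

For example, parse_label("@eigen3//:eigen") yields the repository eigen3, an empty package (the repository root), and the target eigen.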

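Rule: specifies the relationship between inputs and outputs.

BUILD: describes how the project is built.

.bzl: can define macros, rules, constants, and so on; evaluated during the loading phase. Imported into BUILD files with load().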

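Name	Description
WORKSPACE	A file every project must define, located in the project's root directory. It may be empty, or it may load external dependencies.
action	A build action defined by a rule. All actions run during the execution phase.
BUILD	Present in every package; defines that package's build elements: inputs, outputs, and build behavior. It is the smallest unit of the build.
bzl	The file extension for custom rule definitions.
external rule	Rule libraries that follow Bazel conventions, written in Starlark (formerly Skylark, a dialect of Python).
rule	A Bazel build rule, instantiated in BUILD files. Each rule has inputs, outputs, and build actions. Mature rule sets exist for C/C++, Java, Golang, Python, and others, and can be fetched from GitHub; see the list of existing rules in the official documentation.
package	The set of targets defined in one BUILD file. Targets have a visibility attribute that controls what is exposed to other packages.
build graph	The build dependency graph, that is, the dependency graph mentioned earlier. It consists of the targets defined in the BUILD files.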

Building

Build command

bazel build //main:hello-world

Viewing the build output (bazel-bin is a symlink; the files actually live under .cache)

bazel-bin/main/hello-world


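Build phases

Loading phase: Bazel loads and evaluates the .bzl and BUILD files needed for the requested targets. Rules are instantiated and added to the target graph.

Analysis phase: Bazel runs the implementation function of each rule, instantiates actions, and turns the target graph from the loading phase into the build graph.

Execution phase: following the dependencies produced by the analysis phase, Bazel invokes compilers, shell commands, and so on to run the actions in the build graph.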
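The three phases can be mimicked in a few lines of Python. This is a heavily simplified sketch: it pretends each target produces exactly one action, whereas real Bazel actions are finer-grained, and the helper names analyze and execute are hypothetical.

```python
from graphlib import TopologicalSorter

# "Loading": BUILD files have been evaluated into a target graph (target -> deps).
target_graph = {
    "//main:hello-world": ["//greet:hello-greet", "//time:hello-time"],
    "//greet:hello-greet": [],
    "//time:hello-time": [],
}


def analyze(graph):
    """'Analysis': turn every target into an action that records its inputs."""
    return {t: {"command": f"build {t}", "inputs": list(deps)} for t, deps in graph.items()}


def execute(actions):
    """'Execution': order actions so that every input is produced before it is used."""
    order = TopologicalSorter({t: a["inputs"] for t, a in actions.items()}).static_order()
    return [actions[t]["command"] for t in order]


log = execute(analyze(target_graph))
```

In this model the two libraries always appear in log before the binary that depends on them, which is the ordering guarantee the execution phase relies on.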

Native rules

cc_binary: binary target

cc_binary(
    name = "hello-world",  # target name
    srcs = ["hello-world.cc"],  # source files
    copts = ["-Iexternal/eigen3"],  # compiler options
    deps = [  # libraries this target depends on
        "//greet:hello-greet",
        "//time:hello-time",
        "@eigen3//:eigen",
    ],
)

cc_library

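Headers listed in hdrs can be included directly by files in this library's hdrs/srcs, and also by the hdrs/srcs of other libraries that depend on this library.

Headers listed in srcs can only be included by this library's own hdrs/srcs.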

cc_library(
    name = "hello-greet",
    srcs = ["hello-greet.cc"],
    hdrs = ["hello-greet.h"],
    linkstatic = False,  # allow dynamic linking; both a .a and a .so can be produced
    visibility = ["//main:__pkg__"],  # targets are private to their package by default; visibility grants access
    include_prefix = "exp1/lib11/",
    strip_include_prefix = "exp1",
)

#include "lib11/hello-greet.h"  // package-managed include path

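External dependencies

Types

1. Depending on other Bazel projects

local_repository: local directory
git_repository: git repository
http_archive: downloaded over the network

2. Depending on non-Bazel projects

new_local_repository: local directory
new_git_repository: git repository
new_http_archive: downloaded over the network (removed in newer Bazel releases; http_archive with build_file covers this case)

3. Depending on external packages

Maven repositories

Caching external dependencies

All external dependencies are downloaded into a directory, reachable through a symlink, whose location you can get from the command line: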

ls $(bazel info output_base)/external

bazel clean does not delete external dependencies. To delete them as well, run:

bazel clean --expunge

http_archive(
    name = "eigen3",
    build_file = "//repo:eigen.BUILD",  # BUILD file used to build this repository
    sha256 = EIGEN_SHA256,
    strip_prefix = "eigen-3.4.0",  # must match the top-level directory inside the archive
    urls = [
        "https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz",
    ],
)
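The sha256 attribute pins the exact bytes of the archive, so a changed or corrupted download is rejected. The Python sketch below shows the kind of check this implies; it illustrates the idea rather than Bazel's own code, and verify_sha256 and the sample bytes are hypothetical.

```python
import hashlib


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if the SHA-256 digest of data differs from expected_hex."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual}")


# Stand-in for downloaded archive bytes; a real archive would be read from disk.
archive = b"example archive contents"
pinned = hashlib.sha256(archive).hexdigest()

verify_sha256(archive, pinned)  # passes silently because the bytes are unchanged
```

If even one byte of the archive differs from what was pinned, the digest changes and the check raises.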

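Toolchains

This part summarizes the toolchain concept and gives a C++ toolchain example.

toolchain

A toolchain is a way to decouple rule logic from platform-specific tool selection.

Suppose you are writing rules to support your own programming language, bar. You define the rule bar_binary, which compiles *.bar files with the barc compiler. How would you do this without toolchains?

First attempt:

Users of bar_binary should not have to specify the compiler, so you can make it an implicit dependency in the rule definition.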

BarcInfo = provider(
    doc = "Information about how to invoke the barc compiler.",
    # In the real world, compiler_path and system_lib might hold File objects,
    # but for simplicity they are strings for this example. arch_flags is a list
    # of strings.
    fields = ["compiler_path", "system_lib", "arch_flags"],
)

def _bar_binary_impl(ctx):
    ...
    info = ctx.attr._compiler[BarcInfo]
    command = "%s -l %s %s" % (
        info.compiler_path,
        info.system_lib,
        " ".join(info.arch_flags),
    )
    ...
    
bar_binary = rule(
    implementation = _bar_binary_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        ...
        "_compiler": attr.label(
            default = "//bar_tools:barc_linux",  # the compiler running on linux
            providers = [BarcInfo],
        ),
    },
)

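//bar_tools:barc_linux is now a dependency of every bar_binary target, so it is built before any bar_binary target.

The code above relies on the following Bazel concept.

provider

A provider is a constructor for simple value objects, which are called provider instances.

The provider value serves two purposes:

It can be called to produce struct-like values: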

DataInfo = provider()
d = DataInfo(x = 2, y = 3)
print(d.x + d.y) # prints 5

It serves as the key for accessing a provider instance on a target:

DataInfo = provider()
def _rule_impl(ctx):
  ... ctx.attr.dep[DataInfo]
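The two roles of a provider, constructor and lookup key, can be imitated in plain Python. This is only a toy model to show why one object can play both roles; the provider function and the Target class here are hypothetical stand-ins, not Bazel's API.

```python
from types import SimpleNamespace


def provider():
    """Return a callable that builds instances tagged with the provider that made them."""
    def make(**fields):
        return SimpleNamespace(_provider=make, **fields)
    return make


class Target:
    """A built target that holds at most one instance per provider."""

    def __init__(self, *instances):
        self._by_provider = {inst._provider: inst for inst in instances}

    def __getitem__(self, prov):
        # The provider itself is the key, mirroring dep[DataInfo].
        return self._by_provider[prov]


DataInfo = provider()
dep = Target(DataInfo(x=2, y=3))
total = dep[DataInfo].x + dep[DataInfo].y
```

Because the lookup key is the provider object itself, two rule sets cannot collide by accidentally choosing the same field name.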

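The problem with the first attempt is that the compiler's label is hard-coded into bar_binary, yet different targets may need different compilers depending on the target platform and the execution platform they are built for. The rule author does not even necessarily know all available tools and platforms, so hard-coding them in the rule definition is not feasible.

Second attempt

One solution is to let users set the compiler by making it an explicit attribute. Individual targets can then hard-code it in order to build for one platform or another.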

bar_binary(
    name = "myprog_on_linux",
    srcs = ["mysrc.bar"],
    compiler = "//bar_tools:barc_linux",
)

bar_binary(
    name = "myprog_on_windows",
    srcs = ["mysrc.bar"],
    compiler = "//bar_tools:barc_windows",
)

This can be improved by using select to choose the compiler based on the platform:

config_setting(
    name = "on_linux",
    constraint_values = [
        "@platforms//os:linux",
    ],
)

config_setting(
    name = "on_windows",
    constraint_values = [
        "@platforms//os:windows",
    ],
)

bar_binary(
    name = "myprog",
    srcs = ["mysrc.bar"],
    compiler = select({
        ":on_linux": "//bar_tools:barc_linux",
        ":on_windows": "//bar_tools:barc_windows",
    }),
)

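This involves the following Bazel concepts.

config_setting is one of the general rules. It matches an expected configuration state (expressed as Bazel flags or platform constraints) in order to trigger configurable attributes.

For example, the following matches any Bazel invocation that specifies --compilation_mode=opt or -c opt (either on the command line or through .bazelrc):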

  config_setting(
      name = "simple",
      values = {"compilation_mode": "opt"}
  )

The following matches any Bazel invocation that builds for ARM and applies the custom define FOO=bar (for example, bazel build --cpu=arm --define FOO=bar ...):

  config_setting( 
      name = "two_conditions", 
      values = { 
          "cpu": "arm", 
          "define": "FOO=bar" 
      } 
  )

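The following matches any Bazel invocation that builds for a platform with the x86_64 architecture and glibc version 2.25, assuming a constraint_value with the label //example:glibc_2_25 exists. Note that a platform still matches if it defines additional constraint values beyond these two.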

  config_setting( 
      name = "64bit_glibc_2_25", 
      constraint_values = [ 
          "@platforms//cpu:x86_64", 
          "//example:glibc_2_25", 
      ] 
  )
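The matching rule in the three examples above boils down to "every listed condition must hold, and extra platform constraints are ignored". The Python sketch below models that under simplifying assumptions: flags are a plain dictionary (so multi-valued flags such as --define are reduced to a single value), and the function name matches is hypothetical.

```python
def matches(setting: dict, flags: dict, platform_constraints: set) -> bool:
    """Return True when every condition in the setting holds for this build.

    setting may contain 'values' (flag name -> expected value) and
    'constraint_values' (labels the target platform must carry).
    """
    flag_ok = all(flags.get(k) == v for k, v in setting.get("values", {}).items())
    # Subset test: constraints the platform has beyond the listed ones do not matter.
    constraint_ok = set(setting.get("constraint_values", [])) <= platform_constraints
    return flag_ok and constraint_ok


glibc_setting = {
    "constraint_values": ["@platforms//cpu:x86_64", "//example:glibc_2_25"],
}
platform = {"@platforms//cpu:x86_64", "@platforms//os:linux", "//example:glibc_2_25"}
```

Here matches(glibc_setting, {}, platform) is true even though the platform also carries @platforms//os:linux, which is the behavior the note about additional constraint values describes.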

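select is the helper function that makes a rule attribute configurable.

For example, you can use it to define platform-specific dependencies, or to embed different resources depending on whether a rule is built in "developer" or "release" mode.

Basic usage is as follows: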

sh_binary( 
    name = "mytarget", 
    srcs = select({
        ":conditionA": ["mytarget_a.sh"],
        ":conditionB": ["mytarget_b.sh"],
        "//conditions:default": ["mytarget_default.sh"],
    }),
)
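How a select() collapses to a single value can be sketched in Python as follows. This simplified model treats more than one matching condition as an error and ignores Bazel's rule that a more specialized condition may win; resolve_select and the active set are hypothetical names.

```python
DEFAULT = "//conditions:default"


def resolve_select(branches: dict, active: set):
    """Pick the value of the single matching branch, falling back to the default."""
    hits = [cond for cond in branches if cond != DEFAULT and cond in active]
    if len(hits) == 1:
        return branches[hits[0]]
    if not hits and DEFAULT in branches:
        return branches[DEFAULT]
    raise ValueError(f"expected exactly one matching condition, got {hits}")


srcs = {
    ":conditionA": ["mytarget_a.sh"],
    ":conditionB": ["mytarget_b.sh"],
    "//conditions:default": ["mytarget_default.sh"],
}
```

With only :conditionA active, the attribute resolves to ["mytarget_a.sh"]; with no condition active, the default branch is used.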

Configurable attribute

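Most attributes are "configurable", meaning their values may change when the target is built in different ways. Specifically, configurable attributes may vary based on the flags passed to the Bazel command line, or on which downstream dependency requests the target. This can be used, for example, to customize a target for multiple platforms or compilation modes.

The following example declares different sources for different target architectures. Running bazel build :multiplatform_lib --cpu x86 builds the target using x86_impl.cc, while substituting --cpu arm makes it use arm_impl.cc.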

cc_library(
    name = "multiplatform_lib",
    srcs = select({
        ":x86_mode": ["x86_impl.cc"],
        ":arm_mode": ["arm_impl.cc"]
    })
)
config_setting(
    name = "x86_mode",
    values = { "cpu": "x86" }
)
config_setting(
    name = "arm_mode",
    values = { "cpu": "arm" }
)

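Bazel evaluates configurable attributes after processing macros and before processing rules (between the loading and analysis phases). Any processing that happens before select() is evaluated does not know which branch select() will choose.

Writing rules that use toolchains

Under the toolchain framework, a rule does not depend directly on tools; it depends on a toolchain_type instead.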

Toolchain_type

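A toolchain_type is a simple target that represents a class of tools that serve the same role on different platforms. For example, you can declare a type that represents the bar compiler: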

toolchain_type(name = "toolchain_type")

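The rule definition from the previous section has to change: instead of taking the compiler as an attribute, the rule declares that it consumes a //bar_tools:toolchain_type toolchain.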

bar_binary = rule(
    implementation = _bar_binary_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        ...
        # No `_compiler` attribute anymore.
    },
    toolchains = ["//bar_tools:toolchain_type"]
)

The implementation function now accesses this dependency under ctx.toolchains instead of ctx.attr, using the toolchain type as the key.

def _bar_binary_impl(ctx):
    ...
    info = ctx.toolchains["//bar_tools:toolchain_type"].barcinfo
    # The rest is unchanged.
    command = "%s -l %s %s" % (
        info.compiler_path,
        info.system_lib,
        " ".join(info.arch_flags),
    )
    ...

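Defining toolchains

To define toolchains for a given toolchain_type, you need three things:

A language-specific rule that represents the kind of tool or tool suite. By convention, the name of this rule ends in _toolchain.

Note: the _toolchain rule cannot create any build actions. It collects artifacts from other rules and forwards them to the rule that uses the toolchain, and that rule is responsible for creating all build actions.
Several targets of this rule type, representing versions of the tool or tool suite for different platforms.
For each such target, an associated target of the generic toolchain rule, which provides the metadata used by the toolchain framework. This toolchain target also refers to the toolchain_type associated with the toolchain. This means that a given _toolchain rule could be associated with any toolchain_type; it is only in a toolchain instance that uses this _toolchain rule that the rule becomes associated with a toolchain_type.

For our running example, here is a definition of a bar_toolchain rule. Our example has only a compiler, but other tools, such as a linker, could also be grouped underneath it.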

def _bar_toolchain_impl(ctx):
    toolchain_info = platform_common.ToolchainInfo(
        barcinfo = BarcInfo(
            compiler_path = ctx.attr.compiler_path,
            system_lib = ctx.attr.system_lib,
            arch_flags = ctx.attr.arch_flags,
        ),
    )
    return [toolchain_info]

bar_toolchain = rule(
    implementation = _bar_toolchain_impl,
    attrs = {
        "compiler_path": attr.string(),
        "system_lib": attr.string(),
        "arch_flags": attr.string_list(),
    },
)

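bar_toolchain must return a ToolchainInfo provider. This provider becomes the object that the consuming rule retrieves using ctx.toolchains and the label of the toolchain type.

Like struct, ToolchainInfo can hold arbitrary key/value pairs.

Now you can define targets for specific barc compilers.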

bar_toolchain(
    name = "barc_linux",
    arch_flags = [
        "--arch=Linux",
        "--debug_everything",
    ],
    compiler_path = "/path/to/barc/on/linux",
    system_lib = "/usr/lib/libbarc.so",
)

bar_toolchain(
    name = "barc_windows",
    arch_flags = [
        "--arch=Windows",
        # Different flags, no debug support on windows.
    ],
    compiler_path = "C:\\path\\on\\windows\\barc.exe",
    system_lib = "C:\\path\\on\\windows\\barclib.dll",
)

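Finally, create toolchain definitions for the two bar_toolchain targets. These definitions link the language-specific targets to the toolchain type and provide the constraint information that tells Bazel when the toolchain is appropriate for a given platform.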

toolchain(
    name = "barc_linux_toolchain",
    exec_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    target_compatible_with = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    toolchain = ":barc_linux",
    toolchain_type = ":toolchain_type",
)

toolchain(
    name = "barc_windows_toolchain",
    exec_compatible_with = [
        "@platforms//os:windows",
        "@platforms//cpu:x86_64",
    ],
    target_compatible_with = [
        "@platforms//os:windows",
        "@platforms//cpu:x86_64",
    ],
    toolchain = ":barc_windows",
    toolchain_type = ":toolchain_type",
)

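Registering and building with toolchains

At this point all the building blocks are assembled, and you only need to make the toolchains available to Bazel's resolution procedure. This is done by registering the toolchains, either in the WORKSPACE file with register_toolchains(), or by passing the toolchains' labels on the command line with the --extra_toolchains flag.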

register_toolchains(
    "//bar_tools:barc_linux_toolchain",
    "//bar_tools:barc_windows_toolchain",
    # Target patterns are also permitted, so you could have also written:
    # "//bar_tools:all",
)

Now, when you build a target that depends on a toolchain type, an appropriate toolchain is selected based on the target and execution platforms.

# my_pkg/BUILD

platform(
    name = "my_target_platform",
    constraint_values = [
        "@platforms//os:linux",
    ],
)

bar_binary(
    name = "my_bar_binary",
    ...
)
bazel build //my_pkg:my_bar_binary --platforms=//my_pkg:my_target_platform

Bazel sees that //my_pkg:my_bar_binary is being built for a platform that has @platforms//os:linux, and therefore resolves the //bar_tools:toolchain_type reference to //bar_tools:barc_linux_toolchain. This ends up building //bar_tools:barc_linux but not //bar_tools:barc_windows.

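Toolchain resolution

For each target that uses toolchains, Bazel's toolchain resolution procedure determines the target's concrete toolchain dependencies. The procedure takes as input the required toolchain types, the target platform, the list of available execution platforms, and the list of available toolchains. Its outputs are a selected toolchain for each toolchain type and a selected execution platform for the current target.

The available execution platforms and toolchains are gathered from the WORKSPACE file through register_execution_platforms and register_toolchains. Additional execution platforms and toolchains can also be specified on the command line with --extra_execution_platforms and --extra_toolchains. The host platform is automatically included as an available execution platform. Available platforms and toolchains are tracked as ordered lists for determinism, with preference given to earlier entries.

The resolution steps are as follows.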

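A target_compatible_with or exec_compatible_with clause matches a platform if and only if, for each constraint_value in its list, the platform also has that constraint_value (either explicitly or as a default).

If the platform has constraint_values from constraint_settings that the clause does not reference, they do not affect matching.
If the target being built specifies the exec_compatible_with attribute (or its rule definition specifies the exec_compatible_with argument), the list of available execution platforms is filtered to remove any that do not match the execution constraints.
For each available execution platform, each toolchain type is associated with the first available toolchain, if any, that is compatible with this execution platform and the target platform.
Any execution platform that fails to find a compatible toolchain for one of its toolchain types is ruled out. Of the remaining platforms, the first one becomes the current target's execution platform, and its associated toolchains become dependencies of the target.

The chosen execution platform is used to run all actions that the target generates.

When the same target can be built in multiple configurations (such as for different CPUs) within the same build, the resolution procedure is applied independently to each version of the target.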
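The resolution steps above can be sketched in Python. This is a simplified model, not Bazel's implementation: it skips the filtering by a target's own exec_compatible_with, represents platforms as plain lists of constraint labels, and uses hypothetical names (clause_matches, resolve, linux_host).

```python
def clause_matches(clause, platform):
    """A clause matches when the platform carries every constraint it lists."""
    return set(clause) <= set(platform)


def resolve(toolchain_types, target_platform, exec_platforms, toolchains):
    """Return (exec platform name, {toolchain_type: toolchain name}) or None.

    exec_platforms is an ordered list of (name, constraints); toolchains is an
    ordered list of dicts. Earlier entries take priority in both lists.
    """
    for exec_name, exec_constraints in exec_platforms:
        chosen = {}
        for ttype in toolchain_types:
            for tc in toolchains:
                if (tc["type"] == ttype
                        and clause_matches(tc["exec_compatible_with"], exec_constraints)
                        and clause_matches(tc["target_compatible_with"], target_platform)):
                    chosen[ttype] = tc["name"]
                    break  # first compatible toolchain wins for this type
        if len(chosen) == len(toolchain_types):
            return exec_name, chosen  # first platform with a full set wins
    return None


LINUX = ["@platforms//os:linux", "@platforms//cpu:x86_64"]
WINDOWS = ["@platforms//os:windows", "@platforms//cpu:x86_64"]
BAR = "//bar_tools:toolchain_type"

toolchains = [
    {"name": "//bar_tools:barc_linux_toolchain", "type": BAR,
     "exec_compatible_with": LINUX, "target_compatible_with": LINUX},
    {"name": "//bar_tools:barc_windows_toolchain", "type": BAR,
     "exec_compatible_with": WINDOWS, "target_compatible_with": WINDOWS},
]

result = resolve([BAR], LINUX, [("linux_host", LINUX)], toolchains)
```

In this model, building for the Linux platform on a Linux host selects //bar_tools:barc_linux_toolchain, while asking for a Windows target with only a Linux execution platform available returns None, because neither registered toolchain is compatible with both platforms.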

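C++ toolchain configuration

To invoke the compiler with the right options, Bazel needs some knowledge of the compiler's internals, such as include directories and important flags. In other words, Bazel needs a simplified model of the compiler to understand how it works.

Bazel needs to know the following:

Whether the compiler supports thinLTO, modules, dynamic linking, or PIC (position independent code).
Paths to the required tools, such as gcc, ld, ar, objcopy, and so on.
The built-in system include directories. Bazel needs these to validate that all headers included in the source files were properly declared in the BUILD file.
The default sysroot.
Which flags to use for compilation, linking, and archiving.
Which flags to use for the supported compilation modes (opt, dbg, fastbuild).
Make variables specifically required by the compiler.

If the compiler supports multiple architectures, Bazel needs to configure them separately.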