Build in the Cloud: How the Build System works

This article describes in detail how Google uses its cloud platform for large-scale building and testing of software: source code is partitioned into small packages, BUILD files describe the dependencies between them, and rules rather than files express those dependencies, which improves build efficiency. Google's build system organizes and executes build steps intelligently and stays efficient even under a large amount of parallelism.

This is the second in a four-part series describing how we use the cloud to scale building and testing of software at Google. The series elaborates on a presentation given during the Pre-GTAC 2010 event in Hyderabad. Please see the first post in the series for a description of how we access our code repository in a scalable way. To get a sense of the scale, and for details on the types of problems we are solving in Engineering Tools at Google, take a look at this post.


As you learned in the previous post, at Google every product is constantly built from head. In this article we go into a little more detail on how we tell the build system about the source code and its dependencies. In a later article we will explain how we take advantage of this exact knowledge to distribute builds across a large cluster of machines and to share build results between developers.

From some of the feedback we got on our first posts it is clear that you want to see how we actually go about describing the dependencies that drive our build and test support.

At Google we partition our source code into smaller entities that we call packages. You can think of a package as a directory containing a number of source files together with a description of how those source files should be translated into build outputs. The description files are all named BUILD, and the presence of a BUILD file in a directory turns that directory into a package.

All packages reside in a single file tree and the relative path from the root of that tree to the directory containing the BUILD file is used as a globally unique identifier for the package. This means that there is a 1:1 relationship between package names and directory names.

Within a BUILD file we have a number of named rules describing the build outputs of the package. The combination of package name and rule name uniquely identifies the rule. We call this combination a label and we use labels to describe dependencies across rules.

Let’s look at an example to make this a little more concrete:

/search/BUILD:

cc_binary(name = 'google_search_page',
          deps = [':search',
                  ':show_results'])

cc_library(name = 'search',
           srcs = ['search.h', 'search.cc'],
           deps = ['//index:query'])

/index/BUILD:

cc_library(name = 'query',
           srcs = ['query.h', 'query.cc', 'query_util.cc'],
           deps = [':ranking',
                   ':index'])

cc_library(name = 'ranking',
           srcs = ['ranking.h', 'ranking.cc'],
           deps = [':index',
                   '//storage/database:query'])

cc_library(name = 'index',
           srcs = ['index.h', 'index.cc'],
           deps = ['//storage/database:query'])

This example shows two BUILD files. The first one describes a binary and a library in the package //search, and the second one describes several libraries in the package //index. Every rule is named using its name attribute, and the dependencies are expressed in the deps attribute of each rule. The ':' separates the package name from the rule name. Dependencies on rules in a different package start with '//', while dependencies on rules in the same package can omit the package name and start directly with ':'. If you look carefully you will notice that several rules depend on the same rule //storage/database:query, which means that the dependencies form a directed graph. We require that this graph be acyclic, so that we can derive an order in which to build the individual targets.
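
Deriving such a build order from an acyclic label graph is a standard topological sort. Here is a minimal sketch in Python; the hand-built deps map mirrors the example BUILD files above and is an illustration, not the real BUILD parser:

```python
# Hypothetical dependency graph keyed by label, mirroring the example
# BUILD files above.
deps = {
    "//search:google_search_page": ["//search:search", "//search:show_results"],
    "//search:search": ["//index:query"],
    "//search:show_results": [],
    "//index:query": ["//index:ranking", "//index:index"],
    "//index:ranking": ["//index:index", "//storage/database:query"],
    "//index:index": ["//storage/database:query"],
    "//storage/database:query": [],
}

def build_order(deps):
    """Return labels in an order where every rule follows its deps.

    Raises ValueError if the graph contains a cycle.
    """
    order, state = [], {}          # state: 1 = visiting, 2 = done
    def visit(label):
        if state.get(label) == 1:
            raise ValueError("dependency cycle at " + label)
        if state.get(label) == 2:
            return
        state[label] = 1
        for d in deps.get(label, []):
            visit(d)
        state[label] = 2
        order.append(label)
    for label in deps:
        visit(label)
    return order

order = build_order(deps)
# Every dependency appears before the rule that needs it, so
# //storage/database:query comes first and the binary comes last.
assert order.index("//storage/database:query") < order.index("//index:index")
```

If a BUILD file accidentally introduced a cycle, the sort would fail with an error naming a label on the cycle, which is exactly why the acyclicity requirement matters.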

Here is a graphical representation of the dependencies described above:

You can also see that instead of referring to concrete build outputs, as typically happens in make-based build systems, we refer to abstract entities. In fact a rule like cc_library does not necessarily produce any output at all; it is simply a way to organize your source files logically. This has an important advantage that our build system uses to make builds faster: even though there are dependencies between the different cc_library rules, we have the freedom to compile all of the source files in arbitrary order. All we need to compile query.cc and query_util.cc are the header files of the dependencies, ranking.h and index.h. In practice we compile all these source files at the same time, which is always possible unless there is a dependency on an actual output file of another rule.
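
To illustrate why all the compiles can run concurrently, here is a toy sketch. The rules table, with its explicit split of srcs into headers and sources, is an assumption for illustration only; the point is that each per-source compile action needs only the header files of its dependencies, never their object files:

```python
# Hypothetical rule table based on the //index example; hdrs/srcs split
# is an illustrative simplification, not the real data model.
rules = {
    "//index:query":   {"hdrs": ["query.h"],   "srcs": ["query.cc", "query_util.cc"],
                        "deps": ["//index:ranking", "//index:index"]},
    "//index:ranking": {"hdrs": ["ranking.h"], "srcs": ["ranking.cc"],
                        "deps": ["//index:index"]},
    "//index:index":   {"hdrs": ["index.h"],   "srcs": ["index.cc"], "deps": []},
}

def compile_actions(label):
    """One compile action per source file; its inputs are the file
    itself plus the headers of the rule and its transitive deps."""
    rule = rules[label]
    hdrs = set(rule["hdrs"])
    todo = list(rule["deps"])
    while todo:                         # collect transitive headers
        dep = rules[todo.pop()]
        hdrs.update(dep["hdrs"])
        todo.extend(dep["deps"])
    return [{"compile": src, "inputs": sorted(hdrs | {src})}
            for src in rule["srcs"]]

actions = compile_actions("//index:query")
# Both compiles depend only on headers, so they can run in parallel
# with the compiles of //index:ranking and //index:index.
```

Since no compile action lists another rule's object file among its inputs, nothing forces the compiles of query, ranking and index to wait for each other.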


In order to actually perform the necessary build steps we decompose each rule into one or more discrete steps called actions. You can think of an action as a command line plus a list of input and output files. Output files of one action can be input files to another action, so all the actions in a build form a bipartite graph consisting of actions and files. All we have to do to build our target is walk this graph from the leaves - the source files that no action produces - to the root, executing the actions in that order. This guarantees that all the input files of an action are in place before we execute it.
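
That walk can be sketched as a loop that repeatedly runs every action whose inputs already exist. This is a toy model with made-up file names, not the real scheduler, but it shows how the graph shape determines parallelism: each "wave" is a set of actions that could run at the same time:

```python
# A toy bipartite action graph: each action lists its input and output
# files. The file names are illustrative only.
actions = [
    {"cmd": "compile index.cc", "inputs": ["index.cc", "index.h"], "outputs": ["index.o"]},
    {"cmd": "compile query.cc", "inputs": ["query.cc", "index.h"], "outputs": ["query.o"]},
    {"cmd": "link libindex",    "inputs": ["index.o", "query.o"],  "outputs": ["libindex.a"]},
]

def execute(actions, sources):
    """Run actions in waves: a wave contains every not-yet-run action
    whose inputs all exist already. Returns the list of waves."""
    done, waves = set(sources), []
    pending = list(actions)
    while pending:
        wave = [a for a in pending if all(f in done for f in a["inputs"])]
        if not wave:
            raise ValueError("unsatisfiable inputs (cycle or missing source)")
        for a in wave:
            done.update(a["outputs"])   # "run" the action
        pending = [a for a in pending if a not in wave]
        waves.append([a["cmd"] for a in wave])
    return waves

waves = execute(actions, sources=["index.cc", "query.cc", "index.h"])
# Both compiles land in the first wave; the link must wait for wave two.
```

The number of waves corresponds to the height of the action graph: with enough machines, that height - not the total number of actions - bounds the build time.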

An example of the bipartite action graph for a small number of targets. Actions are shown in green and files in yellow color:


We also make sure that the command-line arguments of every action contain only relative file paths - for source files, the path relative to the root of our directory hierarchy. This, together with the fact that we know all input and output files of an action, allows us to easily execute any action remotely: we just copy all of the input files to the remote machine, execute the command on that machine, and copy the output files back to the user's machine. We will talk about some of the ways we make remote builds more efficient in a later post.
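
Because the command line is self-contained and relative, the same three steps work in any scratch directory, local or remote. Here is a minimal local simulation of that idea, a sketch rather than Google's actual remote-execution machinery:

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

def run_action_sandboxed(cmd, inputs, outputs, src_root, out_root):
    """Run one action in a fresh scratch directory, simulating remote
    execution: stage the declared inputs, run the relative-path
    command, then copy the declared outputs back."""
    with tempfile.TemporaryDirectory() as scratch:
        scratch = Path(scratch)
        for rel in inputs:                            # 1. ship inputs over
            dst = scratch / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(Path(src_root) / rel, dst)
        subprocess.run(cmd, cwd=scratch, check=True)  # 2. execute remotely
        for rel in outputs:                           # 3. ship outputs back
            dst = Path(out_root) / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(scratch / rel, dst)

# Demo with a trivial action that upper-cases a file; the action never
# sees an absolute path, only names relative to its working directory.
src_root, out_root = tempfile.mkdtemp(), tempfile.mkdtemp()
Path(src_root, "a.txt").write_text("hello")
run_action_sandboxed(
    [sys.executable, "-c",
     "open('b.txt', 'w').write(open('a.txt').read().upper())"],
    inputs=["a.txt"], outputs=["b.txt"],
    src_root=src_root, out_root=out_root)
```

Nothing in the action refers to the machine it runs on, which is exactly what makes swapping the scratch directory for a remote worker straightforward.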

All of this is not much different from what happens in make. The important difference lies in the fact that we don’t specify the dependencies between rules in terms of files, which gives the build system a much larger degree of freedom to choose the order in which to execute the actions. The more parallelism we have in a build (i.e. the wider the action graph) the faster we can deliver the end results to the user. At Google we usually build our solutions in a way that can scale simply by adding more machines. If we have a sufficiently large number of machines in our datacenter, the time it takes to perform a build should be dominated only by the height of the action graph.

What I described above is how a clean build works at Google. In practice most builds during development are incremental, meaning a developer changes only a small number of source files between builds. Rebuilding the whole action graph would be wasteful in this case, so we skip executing an action unless one or more of its input files have changed since the previous build. To do that we record the content digest of each input file whenever we execute an action. As mentioned in the previous blog post, we track source files by their content digests, and the FUSE filesystem described there exposes the digest of each file as an extended attribute. This makes determining the digest of a file very cheap for the build system and allows us to skip actions whose input files have not changed.
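
The skip decision boils down to comparing a fingerprint of the action's inputs against the fingerprint recorded on the previous run. A toy stand-in for that digest tracking (the real system reads digests cheaply from the FUSE filesystem instead of hashing file contents itself):

```python
import hashlib

def digest(blob):
    """Content digest of a file's bytes."""
    return hashlib.sha256(blob).hexdigest()

class ActionCache:
    """Skip re-running an action when none of its inputs changed.

    Simplified sketch: the fingerprint is recorded at check time, and
    failures are not handled.
    """
    def __init__(self):
        self._last = {}   # action key -> tuple of input digests

    def needs_run(self, cmd, input_blobs):
        key = tuple(cmd)
        fingerprint = tuple(digest(b) for b in input_blobs)
        if self._last.get(key) == fingerprint:
            return False                 # nothing changed: skip the action
        self._last[key] = fingerprint
        return True

cache = ActionCache()
assert cache.needs_run(["cc", "query.cc"], [b"int main(){}"])      # first build
assert not cache.needs_run(["cc", "query.cc"], [b"int main(){}"])  # unchanged: skip
assert cache.needs_run(["cc", "query.cc"], [b"int main(){ }"])     # edited: rebuild
```

Because the key is the content digest rather than a timestamp, touching a file without changing it still results in a skip.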

In this post we explained how the build system works and some of the important concepts that allow us to make builds faster. Stay tuned: part three in this series describes how we made remote execution really efficient and how it can be used to improve build times even more!

- Christian Kemper
