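While building the toolchain, I became curious about how many ingredients a compilation toolchain actually needs: both the source tarballs I downloaded in advance into the tarballs directory (which speeds up the build), and the parts I did not download myself, which crosstool-NG fetches from the network automatically.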
First, let me sort out the steps of compilation.
Take C as an example.
.c --(preprocessing)--> .i --(compilation)--> .s --(assembly)--> .o --(linking)--> executable
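Each of the four stages can also be run separately with gcc. Here is a minimal Python sketch that only builds the command lines (it does not run them), to show how the output file of one stage becomes the input of the next. The file stem hello and the function name stage_commands are hypothetical; -E, -S, -c and -o are standard gcc options.

```python
# Build the gcc command line for each compilation stage.
# Each stage reads the file produced by the previous one.
STAGES = [
    # (stage name, gcc flag, input suffix, output suffix)
    ("preprocess", "-E", ".c", ".i"),
    ("compile", "-S", ".i", ".s"),
    ("assemble", "-c", ".s", ".o"),
    ("link", None, ".o", ""),  # no flag: the gcc driver invokes the linker
]


def stage_commands(stem, cc="gcc"):
    """Return (stage, argv) pairs that turn <stem>.c into the executable <stem>."""
    commands = []
    for name, flag, in_suffix, out_suffix in STAGES:
        argv = [cc]
        if flag is not None:
            argv.append(flag)
        argv += [stem + in_suffix, "-o", stem + out_suffix]
        commands.append((name, argv))
    return commands


if __name__ == "__main__":
    for name, argv in stage_commands("hello"):
        print(f"{name:10} {' '.join(argv)}")
```

Passing cc="arm-linux-gcc" yields the same chain for a cross compiler; only the driver name changes.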
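GCC and a C library are clearly required, since programs call library functions. But why is the Linux source needed? Below I look up the role of each source package one by one. Some, such as GCC and glibc, are fairly clear; for the others I was unsure, so I checked what they are used for.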
Reference: http://crosstool-ng.github.io/docs/toolchain-construction/
1: What is a cross-compilation toolchain
The tools are arranged in a way that they are chained, in a kind of cascade, where the output from one becomes the input to another one, to ultimately produce the actual binary code that runs on a machine.
These tools work together like the links of a chain: the output of one tool becomes the input of the next, finally producing a binary that can run on a machine. Hence the name toolchain.
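Why cross-compile?
When a toolchain is meant to generate code for a machine different from the machine it runs on, this is called a cross-toolchain.
It produces binary code that does not run on the current machine, hence "cross".
If the target board is powerful enough, compilation can be done on the board itself. I once tried running arm-linux-gcc on a development board, and it worked for small programs; in that case there is no "cross" at all.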
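A minimal Python sketch of the "cross" distinction, under simplifying assumptions: a toolchain counts as cross when its target CPU differs from the host CPU, and cross tools carry the target tuple as a name prefix (as in arm-linux-gcc). Real build systems compare full build/host/target tuples and normalize architecture names (for example armv7l versus arm), which this sketch does not do; the function names are hypothetical.

```python
import platform


def target_arch(triplet):
    """Return the CPU field of a GNU-style tuple, e.g. 'arm' from 'arm-unknown-linux-gnueabi'."""
    return triplet.split("-", 1)[0]


def is_cross(target_triplet, host_arch=None):
    """Simplified rule: cross when the target CPU differs from the host CPU.

    No normalization is done, so 'armv7l' (a typical platform.machine()
    value on 32-bit ARM) and 'arm' would count as different here.
    """
    if host_arch is None:
        host_arch = platform.machine()
    return target_arch(target_triplet) != host_arch


def tool_names(target_triplet, tools=("gcc", "as", "ld")):
    """Cross tools are conventionally named <tuple>-<tool>."""
    return [f"{target_triplet}-{tool}" for tool in tools]


if __name__ == "__main__":
    t = "arm-unknown-linux-gnueabi"
    print(is_cross(t, host_arch="x86_64"), tool_names(t))
```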
2: Components of the toolchain
1>The components that play a role in the toolchain are first and foremost the compiler itself. The compiler turns source code (in C, C++, whatever) into assembly code. The compiler of choice is the GNU compiler collection, well known as gcc. (The compiler: generates assembly code.)
2>The assembly code is interpreted by the assembler to generate object code. This is done by the binary utilities, such as the GNU binutils. (The assembler: produces object code.)
3>Once the different object code files have been generated, they got to get aggregated together to form the final executable binary. This is called linking, and is achieved with the use of a linker. The GNU binutils also come with a linker. (The linker combines multiple object files into one executable.)
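To illustrate what "aggregating" object files means, here is a toy linker in Python. The object-file format (a dict with size, defines and refs) is invented for this sketch; a real linker such as GNU ld also handles sections, relocations and libraries. The sketch only shows the two core jobs: laying the objects out one after another and resolving every referenced symbol to an address.

```python
def link(objects):
    """Toy linker: place objects one after another and resolve symbols.

    Each object is a hypothetical dict with:
      'size'    : number of bytes of code,
      'defines' : {symbol: offset inside this object},
      'refs'    : list of symbols this object uses.
    Returns {symbol: absolute address}; raises on undefined or duplicate symbols.
    """
    symbols = {}
    base = 0
    # Pass 1: lay out objects sequentially and record symbol addresses.
    for obj in objects:
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"multiple definition of '{name}'")
            symbols[name] = base + offset
        base += obj["size"]
    # Pass 2: every referenced symbol must be defined somewhere.
    for obj in objects:
        for name in obj["refs"]:
            if name not in symbols:
                raise ValueError(f"undefined reference to '{name}'")
    return symbols


MAIN_O = {"size": 32, "defines": {"main": 0}, "refs": ["add"]}
MATH_O = {"size": 16, "defines": {"add": 0}, "refs": []}

if __name__ == "__main__":
    print(link([MAIN_O, MATH_O]))  # {'main': 0, 'add': 32}
```

Dropping MATH_O from the list makes link fail with an undefined-reference error, the same kind of error a real linker reports when an object file or library is missing.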
The next part is the key one, and it cleared up my confusion: why are Linux-related sources needed?
So far, we get a complete toolchain that is capable of turning source code into actual executable code.
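It depends: if an operating system such as Linux has been ported to the board, the corresponding sources are needed.
The C library: on Linux, typically glibc. On systems without an operating system, newlib or dietlibc can be used, or no C library at all. The C library provides a standard abstraction layer for basic tasks such as allocating memory, printing output to a terminal, and managing file access. The original English description follows for reference.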
Depending on the Operating System, or the lack thereof, running on the target, we also need the C library. The C library provides a standard abstraction layer that performs basic tasks (such as allocating memory, printing output on a terminal, managing file access…). There are many C libraries, each targeted to different systems. For the Linux desktop, there is glibc or eglibc or even uClibc; for embedded Linux, you have a choice of eglibc or uClibc, while for systems without an operating system, you may use newlib, dietlibc, or even none at all. There are a few other C libraries, but they are not as widely used, and/or are targeted to very specific needs (e.g., klibc is a very small subset of the C library aimed at building constrained initial ramdisks).
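The abstraction layer described above is, concretely, a shared library of ordinary functions. As a minimal sketch, assuming a glibc-based Linux host, Python's standard ctypes module can load libc and call a few of them directly: strlen (pure user-space code), malloc/free (heap management) and getpid (a thin wrapper around a system call). The function name libc_demo is hypothetical.

```python
import ctypes
import ctypes.util
import os

# Assumes a Linux host with glibc, where find_library("c") usually
# returns "libc.so.6".
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototypes so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None
libc.getpid.argtypes = []
libc.getpid.restype = ctypes.c_int


def libc_demo():
    # strlen: pure user-space string code inside the C library.
    length = libc.strlen(b"toolchain")
    # malloc/free: the C library manages the heap and asks the
    # kernel for more memory when it needs it.
    buf = libc.malloc(64)
    if buf is None:
        raise MemoryError("malloc failed")
    ctypes.memset(buf, 0, 64)
    libc.free(buf)
    # getpid: a thin wrapper around the getpid system call.
    return length, libc.getpid()


if __name__ == "__main__":
    length, pid = libc_demo()
    print(length, pid == os.getpid())
```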
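Under Linux, the C library needs to know the API the kernel exposes, and that API is provided by the kernel headers. That is why linux-glibc-headers is among the required sources.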
Under Linux, the C library needs to know the API to the kernel to decide what features are present, and if needed, what emulation to include for missing features. That API is provided by the kernel headers. Note: this is Linux-specific (and potentially a very few others); C libraries on other OSes do not need the kernel headers.
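One concrete reason the C library depends on the kernel headers: system calls are identified by numbers, and those numbers are defined per architecture in the kernel's asm/unistd.h. The Python sketch below assumes a Linux host; the getpid numbers in the table come from the kernel headers for each architecture, and the table keys are the values platform.machine() typically reports. It calls the same function through libc's generic syscall() wrapper, by number. The names SYS_GETPID and raw_getpid are hypothetical.

```python
import ctypes
import ctypes.util
import os
import platform
import sys

# getpid's system call number per architecture, as defined in the Linux
# kernel headers (asm/unistd.h). Keys are typical platform.machine() values.
SYS_GETPID = {
    "x86_64": 39,
    "i686": 20,     # 32-bit x86
    "armv7l": 20,   # 32-bit ARM (EABI)
    "aarch64": 172, # 64-bit ARM: generic syscall table
    "riscv64": 172, # 64-bit RISC-V: generic syscall table
}


def raw_getpid():
    """Invoke getpid by number through libc's generic syscall() wrapper."""
    if not sys.platform.startswith("linux"):
        # Other operating systems use their own numbering.
        raise NotImplementedError("these syscall numbers are Linux-specific")
    nr = SYS_GETPID.get(platform.machine())
    if nr is None:
        raise NotImplementedError(f"no entry for {platform.machine()}")
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(nr))


if __name__ == "__main__":
    print(raw_getpid() == os.getpid())
```

The same source-level call (getpid) maps to different numbers on different CPUs, so a C library built for one target cannot simply be reused on another; it has to be compiled against that target's kernel headers.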
3: How are all these components chained together?
To build the final compiler we want (the one that knows how to use the C library), we need the C library; but building the C library itself requires a compiler.
So far, all major components have been covered, but yet there is a specific order they need to be built. Here we see what the dependencies are, starting with the compiler we want to ultimately use. We call that compiler the final compiler.
- the final compiler needs the C library, to know how to use it, but:
- building the C library requires a compiler
A needs B which needs A. This is the classic chicken’n’egg problem… This is solved by building a stripped-down compiler that does not need the C library, but is capable of building it. We call it a bootstrap, initial, or core compiler. So here is the new dependency list:
- the final compiler needs the C library, to know how to use it,
- building the C library requires a core compiler but:
- the core compiler needs the C library headers and start files, to know how to use the C library
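The dependency list above can be modeled as a graph and ordered with a topological sort. Here is a minimal Python sketch using the standard graphlib module (Python 3.9+). The step names are a simplified reading of the build order described in the crosstool-NG documentation, not crosstool-NG's own step identifiers. The naive two-node graph (final compiler needs the C library, the C library needs the final compiler) raises CycleError, which is exactly the chicken-and-egg problem; splitting the compiler into core and final stages breaks the cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Each step maps to the set of steps that must be finished first.
BUILD_DEPS = {
    "binutils": set(),
    "kernel headers": set(),
    "core compiler (pass 1)": {"binutils"},
    "C library headers + start files": {"core compiler (pass 1)", "kernel headers"},
    "core compiler (pass 2)": {"C library headers + start files"},
    "complete C library": {"core compiler (pass 2)"},
    "final compiler": {"complete C library"},
}

# The naive model without a core compiler: a cycle, so no valid order exists.
NAIVE_DEPS = {
    "final compiler": {"C library"},
    "C library": {"final compiler"},
}


def build_order(deps):
    """Return one valid build order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step in build_order(BUILD_DEPS):
        print(step)
    try:
        build_order(NAIVE_DEPS)
    except CycleError:
        print("naive model: chicken-and-egg cycle detected")
```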
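Finally, the author discusses some very low-level libraries such as GMP (the GNU Multiple Precision Arithmetic Library), which GCC itself depends on. I found that part a bit confusing, but one thing is clear: the more fundamental a library is, the simpler its job. Taken to the extreme, this approaches the essence of a computer: computation.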
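As an illustration of how simple such a fundamental job can be, here is a toy Python sketch of multiple-precision addition: a large number is stored as a list of fixed-size "limbs" and added limb by limb with a carry, the way you add on paper. The 32-bit limb size and all function names are assumptions for this sketch; GMP itself uses machine-word limbs with heavily optimized code, and this is not GMP's implementation. Python's built-in int is already arbitrary-precision and is used here only to check the result.

```python
BASE = 2**32  # one limb = one 32-bit word (an assumption for this sketch)


def to_limbs(n):
    """Split a non-negative int into little-endian base-2**32 limbs."""
    limbs = []
    while True:
        n, r = divmod(n, BASE)
        limbs.append(r)
        if n == 0:
            return limbs


def from_limbs(limbs):
    """Reassemble little-endian limbs into a Python int."""
    return sum(limb * BASE**i for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Schoolbook addition: add limb by limb, propagating the carry."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        carry, limb = divmod(s, BASE)  # carry is 0 or 1
        result.append(limb)
    if carry:
        result.append(carry)
    return result


if __name__ == "__main__":
    x, y = 2**100 + 12345, 2**64 - 1
    print(from_limbs(add_limbs(to_limbs(x), to_limbs(y))) == x + y)
```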
The rest of this post has nothing to do with the original English document.
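Computers were invented in the first place to free people from huge amounts of calculation. Early analog computers, and the even earlier mechanical gear-driven calculators, were all built to compute, so at first there was no such thing as compilation. The first problem was how to represent numbers; the second was how to compute with them. Take decimal as an example: represent 1 with 1 V and 2 with 2 V. As long as there is a circuit that can perform the operation, and the output voltage is measured with an instrument to read the result, you have a simple computer. Of course there is a difference between full-featured and simple computers, but the complex ones evolved step by step from the simple ones.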
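So on the early, more mature computers, programs entered their different input values through switches and wires.
The drawback of wiring was speed: the machine itself was fast, but people working switches were slow, and time was wasted. Then punched cards appeared.
Cards could be prepared before the machine ran, fed in all at once, and reused. Reuse is, I suppose, the idea behind library functions.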
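After punched cards, Grace Hopper came up with an idea that became the prototype of the first compiler.
Grace wanted to design a program that would let people write down what they wanted to do in an English-like syntax, and then translate that English into the machine's syntax and hand it to the machine to execute. This idea is what we now call a compiler.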
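A-0 worked like this: the compiler program was loaded into the computer from punched cards. Then the program to be compiled was fed in, and the computer punched out another set of cards containing machine code. That second set of cards was loaded into the computer, which could then execute the new program.
Two programs are involved here: one is the program that compiles the program to be executed next, and the other is the compiled program itself.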
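This works in principle because of encoding. English letters are first encoded as patterns of 0s and 1s; then the program, following a fixed format description, generates the corresponding binary code and punches it onto cards as its output.
This translation is simply a transformation from one state to another, which is feasible in theory, although the actual circuitry is very complex.
Everything is encoding, and computation turns one encoding into another.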
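The idea of "encode first, then transform according to a fixed format" can be shown with a toy in Python. The sketch assumes ASCII for encoding letters and an invented three-instruction machine (LOAD, ADD, PRINT); it is not A-0, only an illustration of the two programs described above: a translator that turns English-like lines into machine words, and a machine that executes those words. All names and opcodes are hypothetical.

```python
# Step 1: everything is encoding. Letters become bit patterns (ASCII here).
def to_bits(text):
    return " ".join(f"{ord(ch):08b}" for ch in text)


# Step 2: a toy translator. It reads English-like lines and emits machine
# words (the "second deck of cards"). The instruction set is invented.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "PRINT": 0b0011}


def compile_program(source):
    words = []
    for line in source.strip().splitlines():
        op, *args = line.split()
        operand = int(args[0]) if args else 0
        if not 0 <= operand <= 0xFF:
            raise ValueError("operand must fit in 8 bits")
        # Opcode in the high bits, 8-bit operand in the low bits.
        words.append((OPCODES[op.upper()] << 8) | operand)
    return words


# Step 3: a toy machine that executes the translated words.
def run(words):
    acc, output = 0, []
    for word in words:
        op, operand = word >> 8, word & 0xFF
        if op == OPCODES["LOAD"]:
            acc = operand
        elif op == OPCODES["ADD"]:
            acc += operand
        elif op == OPCODES["PRINT"]:
            output.append(acc)
    return output


if __name__ == "__main__":
    print(to_bits("A"))                         # 01000001
    program = compile_program("load 2\nadd 3\nprint")
    print([f"{w:012b}" for w in program])
    print(run(program))                         # [5]
```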
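It took me four years after graduation to really take in how compilation works. In college, the teacher lectured with great enthusiasm while I sat there completely lost: I simply could not understand how a pile of programs written in English letters could be recognized and executed by a computer.
My thanks go to the book Code: The Hidden Language of Computer Hardware and Software (《编码 隐匿在计算机软硬件背后的语言》).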