GCC vs. Clang/LLVM_gcc vs cling-优快云博客

本文链接：https://blog.youkuaiyun.com/mapleay/article/details/140404937

GCC vs. Clang/LLVM: An In-Depth Comparison of C/C++ Compilers

Background

Visual C++, GNU Compiler Collection (GCC), and Clang/Low Level Virtual Machine (LLVM) are three mainstream C/C++ compilers in the industry. Visual C++ provides graphical user interfaces (GUIs) and is easy to debug, but it is not suitable for Linux platforms. Therefore, this document mainly compares GCC with Clang/LLVM.

GCC is a program language compiler developed by GNU. It is a set of free software released under the GNU General Public License (GPL) and GNU Lesser General Public License (LGPL). It is an official compiler for the GNU and Linux systems, and a main compiler for compiling and creating other UNIX operating systems.

LLVM contains a series of modularized compiler components and tool chains. It can optimize program languages and links during compilation, runtime, and idle time and generate code. LLVM can serve as a background for compilers in multiple languages. Clang is a C, C++, Objective-C, or Objective-C++ compiler that is compiled in C++ based on LLVM and released under the Apache 2.0 license. Clang is mainly used to provide performance superior to that of GCC.

Through long-term development and iteration, GCC, Clang, and LLVM have become mature compilers in the industry. Then, which compiler is better? Which one should we use to compile and build programs and systems?

Significance of a Good Compiler

Modern processors all have superscalar and long pipelines, and complex internal structures, and they support vector extension units in the Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) architecture. For many programs that contain general computing-intensive kernels, programmers can use vector extension commands to greatly improve program execution performance. For example, in matrix and vector operations, the combined multiplication and addition commands are used to improve performance and accuracy. Bit mask commands are used for branch processing in vector operations. However, to achieve the highest performance, programmers and compilers still need to expend a lot of effort to handle tasks with complex memory access modes and non-standard kernels.

Additionally, standards of modern advanced languages constantly abstract the details of underlying hardware and data structures to generate general code that is more logical and mathematical, instead of specific operation instructions and memory access paths. C++ standards are increasingly expressive and abstract. Python is popular because it is more readable and expressive, even at the cost of a lower running speed. Higher expressiveness increases the burden of the compiler to generate good assembly code from the complex structures compiled by programmers. The compiler must be smarter and work harder to maximize performance by using the code. Not all compilers can do this. When selecting a compiler, we must first consider whether the same code segment can generate more efficient assembly commands.

In addition to generating high-performance executable programs, modern compilers must also have high performance themselves. A large-sized software project in C++ may contain hundreds to thousands of individual translation units. Each translation unit may contain thousands of lines of code. C++ code can also use a large number of template-based programming technologies. These technologies require the compiler to transfer relevant information multiple times to generate a target file. The compilation of large-sized C++ projects may take several hours, and multiple mutually dependent changes must be submitted concurrently during development. Each submission requires developers to recompile most of the code libraries. Therefore, faster compilers (build tools) are critical for achieving high productivity for large teams.

In terms of language extension, modern computing systems with multiple kernels, vector processing capabilities, and accelerators provide capabilities superior to the natural capabilities of common programming languages. Therefore, specific high-performance computing (HPC) frameworks, such as OpenMP and OpenACC, can fill this gap. These frameworks provide application program interfaces (APIs) that programmers can use to express parallelism in code. The compiler and the corresponding runtime library must map the parallel code to the processor architecture. Many HPC projects depend on OpenMP and OpenACC standards, which are being extended by developers and hardware manufacturers. Therefore, the compilers must keep up with the development of language extension standards.

In conclusion, a good compiler allows us to focus on the process of programming, rather than fighting against its shortcomings. It can support the latest language standards, generate optimized commands from the most abstract code, and compile the source code in less time.

GCC Development History

Before learning GCC, you need to first understand the GNU Project. Richard Stallman launched the GNU Project in 1984 to build a UNIX-like open source software system. The GNU operating system has not evolved extensively over time. However, it has incubated many excellent and useful open source software tools, such as Make, Sed, Emacs, Glibc, GDB, and GCC as well. These GNU open source software and Linux kernels together constitute the GNU/Linux system. In the beginning, GCC provided stable and reliable compilers, based on the C programming language, for the GNU system. Its full name is GNU C Compiler. Later, more languages (such as Fortran, Obj-C, and Ada) were supported, and the full name of GCC changed to GNU Compiler Collection.

GCC-1.0 was released by Richard Stallman in 1987, more than thirty years ago. From the software perspective it is very old. Someone collected the GCC development records from between 1989 and 2012, and produced a thirty-minute animated video (GNU Compiler Collection dev history 1989–2012), intuitively demonstrating the development process of GCC. We can learn about the development history of GCC from its versions:

GCC-1.0: released by Richard Stallman in 1987.
GCC-2.0: released in 1992 and supported C++. Later, the GCC community was split because Richard Stallman defined GCC as a reliable C compiler of the GNU system and thought that GCC at that time was sufficient for the GNU system and the development focus should be shifted from GCC to the GNU system itself. Other major developers hoped to continue improving GCC and make more radical developments and improvements in various aspects. These active developers left the GCC community in 1997 and developed the EGCS fork.
GCC-3.0: Obviously, developers generally had a strong desire for good compilers. The EGCS fork developed smoothly and became recognized by more and more developers. Eventually, EGCS was used as the new GCC backbone and GCC-3.0 was released in 2001. The split community was re-merged again, but Richard Stallman’s influence had been weakened to a certain extent. Additionally, the GCC Industrial Committee had begun to decide the development direction of GCC.
GCC-4.0: released in 2005. This version was integrated into Tree Serial Storage Architecture (SSA), and GCC evolved to be a modern compiler.
GCC-5.0: released in 2015. Later, the GCC version policy was adjusted and a major version was released each year. An unexpected benefit is that the version number corresponds with the year. For example, GCC-7 was released in 2017, and GCC-9 was released in 2019.

Now, GCC development has entered the “modern chronicle”. Facing the competitive pressure of LLVM, the GCC community has actively made many adjustments, such as accelerating compilation and improving the compilation warning information. Over the past 30 years, GCC has evolved from a challenger in the compiler industry to a mainstream compiler for Linux systems, and now faces the challenge of LLVM. Fortunately, the GCC community is making adjustments to accelerate the development of GCC. We can expect that the competition between the two compilation technologies will continue to provide software developers with better compilers.

Development History of Clang and LLVM

LLVM

LLVM was originated from the research by Chris Lattner on UUIC in 2000. Chris Lattner wanted to create a dynamic compilation technology for all static and dynamic languages. LLVM is a type of open source software developed under the BSD License. The initial version 1.0 was released in 2003. In 2005, Apple Inc. hired Chris Lattner and his team to develop programming languages and compilers for Apple computers, after which the development of LLVM entered the fast lane. Starting from LLVM 2.5, two minor LLVM versions were released every year (generally in March and September). In November 2011, LLVM 3.0 was released to become the default XCode compiler. XCode 5 started to use Clang and LLVM 5.0 by default. The version policy was adjusted for LLVM 5.0 and later versions, and two major versions are released every year. The current stable version is 8.0.

The name of LLVM was first abbreviated from Low Level Virtual Machine. As this project is not limited to the creation of a virtual machine, the abbreviation LLVM is often questioned. After LLVM was developed, it became a collective term for many compilation tools and low-level tool technologies, making the name less appropriate. Developers decided to abandon the meaning behind this abbreviation. Now LLVM has become the official brand name, applicable to all projects under LLVM, including LLVM Intermediate Representation (LLVM IR), LLVM debugging tools, and LLVM C++ standard libraries. LLVM can be used as a traditional compiler, JIT compiler, assembler, debugger, static analysis tool, and for other functions related to programming languages.

In 2012, LLVM won the software system award of Association for Computing Machinery (ACM), along with traditional systems such as UNIX, WWW, TCP/IP, TeX, and Java. LLVM greatly simplifies the implementation of new programming language tool chains. In recent years, many new programming languages such as Swift, Rust, and Julia have used LLVM as their compilation framework. Additionally, LLVM has become the default compiler for Mac OS X, iOS, FreeBSD and Android systems.

Clang

Clang is designed to provide a frontend compiler that can replace GCC. Apple Inc. (including NeXT later) has been using GCC as the official compiler. GCC has always performed well as a standard compiler in the open source community. However, Apple Inc. has its own requirements for compilation tools. On the one hand, Apple Inc. added many new features for the Objective-C language (or even, later, the C language). However, GCC developers did not accept these features and assigned low priority to support for these features. Later, they were simply divided into two branches for separate development, and consequently the GCC version released by Apple Inc. is far earlier than the official version. On the other hand, the GCC code is highly coupled and hard to be developed separately. Additionally, in later versions, the code quality continues to decrease. However, many functions required by Apple Inc. (such as improved Integrated Development Environment (IDE) support) must call GCC as a module, but GCC never provides such support. Moreover, the GCC Runtime Library Exemption fundamentally limits the development of LLVM GCC. Also limited by the license, Apple Inc. cannot use LLVM to further improve the code generation quality based on GCC. Therefore, Apple Inc. decided to write the frontend Clang of C, C++, and Objective-C languages from scratch to completely replace GCC.

As the name suggests, Clang only supports C, C++, and Objective-C. Development started in 2007 and the C compiler was first completed. Clang for Objective-C cloud be fully used for the production environment in 2009. Support for C++ also progressed quickly. Clang 3.3 fully supported C++ 11, Clang 3.4 fully supported C++ 14, and Clang 5 fully supported C++ 17, and all were significantly ahead of GCC at that time.

Clang/LLVM and GCC Community

GCC Community

Like other open source software communities, the GCC community is dominated by free software enthusiasts and hackers. In the process of development, the GCC community management and participation mechanisms are gradually formed today. Currently, the GCC community is a relatively stable and well-defined acquaintance society in which each person has clear roles and duties:

Richard Stallman and Free Software Foundation (FSF): Although seldom involved in GCC community management, Richard Stallman and FSF are still detached in license and legal affairs.
GCC Industrial Committee: It manages the GCC community affairs, technology-independent GCC development topics, and the appointment and announcement of reviewers and maintainers. It currently has 13 members.
Global maintainers: They dominate GCC development activities. To some extent, they determine the development trend of GCC. Currently, there are 13 global maintainers, who do not all hold office in GCC Industrial Committee.
Frontend, middle-end, and backend maintainers: They are the maintainers of frontend, backend, and other modules. They are responsible for the code of the corresponding GCC module, and many of them are the main contributors to the module code. It is worth noting that reviewers are generally classified into this group. The difference is that reviewers cannot approve their own patch, while maintainers can submit their own modifications within their scope of responsibility without approval from reviewers.
Contributors: They are the most extensive developer groups in the GCC community. After signing the copyright agreement, any developers can apply for the Write after Approval permission from the community, and then submit the code by themselves.

Like other open source communities, the mature GCC community is no longer dominated by hackers. Commercial companies began to play important roles in the community, such as recruiting developers and sponsoring development meetings. Currently, the GCC community is dominated by the following types of commercial companies:

System vendors, mainly including RedHat and SUSE.
Chip vendors, mainly including Intel, ARM, AMD, and IBM (PowerPC).
Specialized vendors, such as CodeSourcery and tool chain service providers like AdaCore based on the Ada language. CodeSourcery had a brilliant history and recruited many famous developers, but declined after it was acquired by Mentor.

In the current GCC community, chip vendors dominate backend development, while system vendors guide other development areas. In terms of community development, the GCC code is currently hosted on its own SVN server. A Git API is provided to facilitate development and submission. Patch review is similar to that in the Linux Kernel community and uses the Mailing List form. As mentioned above, the GCC community is a relatively stable (or closed) acquaintance society. The community basically has 150 to 200 active contributors each year and holds a developer conference in September every year. In September 2019, the developer conference will be held in Montreal, Canada.

LLVM Community

The LLVM community is a noob-friendly compiler community. It quickly responds to the questions of new users and patch reviews. This is also the basis and source for subsequent LLVM Foundation discussions and the adoption of the LLVM Community Code of Conduct, and causes a series of politically correct discussions.

All LLVM projects and problems are discussed through the DevExpress email list, and code submission is notified through the commits email list. All bugs and feature modifications are tracked through the bugs list. The submitted patches are recommended for the master branches. The style complies with LLVM Coding Standards and code review is performed through Phabricator. Currently, the LLVM code repository has been migrated to GitHub.

Unlike the GCC community, the LLVM community only has the LLVM Foundation. The LLVM Foundation has eight members. In addition to managing LLVM community affairs, each member of the LLVM Foundation has to guide LLVM development issues related to technology. Currently, the president is Tanya Lattner, the wife of Chris Lattner. Chris Lattner himself is also a member of the foundation and has strong control over the LLVM community and the development direction of LLVM.

The code review policy in the LLVM community is basically the same as that in the GCC community. The difference is that, due to the fast development of LLVM, many contributors do not have commit access permission, and have to submit their code through the maintainers. Currently, the Clang and LLVM communities have more than 1,000 contributors each year. Generally, developer conferences are held in April and October yearly. The developer conference in October 2019, will be held in San Jose, USA.

The LLVM license is changed from UIUC License to Apache 2.0 License with LLVM exceptions. It is mainly used to solve the problem that the LLVM runtime library is based on MIT License and the patent authorization required for the project is too extensive. Under this license, LLVM allows anyone to derive commercial products from LLVM without any restrictions, and does not require that any derivatives provide open source code, thus promoting the extensive use of LLVM, including:

The downloading or use of LLVM in whole or in part for personal, internal, or commercial purposes. The ability to modify LLVM code without contributing it back to the project.
The creation of a package or release version containing LLVM. The association of LLVM with code authorized by all other major open source licenses (including BSD, MIT, GPLv2, and GPLv3).
When distributing LLVM again, you must retain the copyright notice. You cannot delete or replace the copyright header. The binary file containing LLVM must contain the copyright notice.

Performance Comparison between GCC and LLVM

Concluding Remarks

From the benchmarking tests above, we can see that Clang offers more advantages for the construction of large projects while GCC is always advantageous in performance optimization. The bla depends on your specific application

In addition to performance comparison, I would like to share the advantages and disadvantages of GCC and Clang and LLVM:

Advantages of GCC

GCC supports more traditional languages than Clang and LLVM, such as Ada, Fortran, and Go.
GCC supports more less-popular architectures, and supported RISC-V earlier than Clang and LLVM.
GCC supports more language extensions and more assembly language features than Clang and LLVM. GCC is still the only option for compiling the Linux kernel. Although research on kernel compilation by using Clang and LLVM is also reported in the industry, the kernel cannot be compiled without modifying the source code and compilation parameters.

Advantages of Clang and LLVM

Emerging languages are using the LLVM frameworks, such as Swift, Rust, Julia, and Ruby.
Clang and LLVM comply with the C and C ++ standards more strictly than GCC. GNU Inline and other problems during GCC upgrade do not occur.
Clang also supports some extensions, such as attributes for thread security check.
Clang provides additional useful tools, such as scan-build and clang static analyzer for static analysis, clang-format and clang-tidy for syntax analysis, as well as the editor plug-in Clangd.
Clang provides more accurate and friendly diagnostic information, and highlights error messages, error lines, error line prompts, and repair suggestions. Clang regards the diagnostic information as a feature. The diagnostic information began to be improved only from GCC 5.0, and became mature in GCC 8.

这里来自另一篇博文 https://www.incredibuild.com/blog/gcc-vs-clang-battle-of-the-behemoths

GCC Vs Clang

Let us now get into the GCC Vs Clang head-to-head comparisons between these two behemoths.

FOSS software – It is true that both GCC and Clang are free and open-source software. But their licenses differ quite a bit. GCC is licensed under GPL (GNU public license) whereas Clang/LLVM is licensed under Apache License. GCC Vs Clang comparison under the head of the license should be best left to the lawyers.
Supported platforms – Both GCC and Clang support almost all the modern platforms. Clang/LLVM compiles natively on Windows whereas GCC needs a subsystem like MinGW to work under Windows. Clang Vs GCC comparison here wouldn’t be fair as GCC has no plans to natively support Windows.
Code complexity – GCC is a very complicated software with more than 15 million lines of code. Although it has well-defined frontend/backend stages, the software in itself is more monolithic in nature. Let us now see how Clang Vs GCC would look. Clang is more of modular architecture with well-defined extension points.
Standardization support – GCC has experimental support for C++20, the next version of the C++ standard. It has the full support of C++17 as well as C17, the latest C standard. Clang has the full support of the C++17 standard and is fast catching up with the C++20 standard.
Efficient Code generation – Space and time complexity of generated code in both Clang and GCC are comparable. GCC Vs Clang comparison doesn’t make sense under this head as both optimizers are based on a tight Single Static Assignment form.
Language Independent Type System – Clang Vs GCC makes perfect sense under this heading. Since Clang/LLVM uses a language-independent type system for all supported languages, the exact semantics of instruction can be determined. GCC does not have the design goal of a language-independent Type system.
Parser frontend – GCC used to have Bison based LR parser. It later moved to a handwritten recursive descent parser. Clang always used a deterministic recursive descent hand-written parser with backtracking.
Linker backend – GCC Vs Clang is the most prominent here. GCC uses ld as the linker with support for ld-gold. Clang uses lld as its linker. With some benchmarks, it is seen that lld is faster than ld and even the newer ld-gold.
Build tool – Perhaps another place where Clang Vs GCC difference comes to the forefront. GCC uses autotools and Make as its build tools whereas Clang/LLVM uses CMake as its build tool.
Debugger Support – GCC has an excellent GDB debugger. It is time-tested and works well. Clang has the LLDB debugger build as a set of reusable components over LLVM.

An easier comparison format is tabular as given below:

Criteria	GCC	Clang/LLVM
License	GNU GPL	Apache 2.0
Code modularity	Monolithic	Modular
Supported platforms	*inx, Windows (MinGW)	*inx, Natively in Windows
Supported language standards	C++20 in experimental stage, C++17 fully complaint	C++17 support available. C++20 underway
Generated Code Characteristics	Efficient with a lot of compiler options to play around with	Efficient due to the SSA form used by LLVM backend
Language independent type system	No	Yes (One of the design goal for LLVM)
Build tool	Make based	CMake
Parser	Previously Bison LR. Now recursive descent.	Hand-written recursive descent
Linker	LD	lld
Debugger	GDB	LLDB

Summary

This blog post should help you understand the major differences considering GCC Vs Clang. GCC is a mature compiler with support for many languages. As can be seen from the name Clang supports mostly C, C++, and Objective-C. But the framework underlying Clang called LLVM is extensible enough to support newer languages like Julia and Swift. From the perspective of C++, both are excellent compilers supporting the C++17 standard.

From Incredibuild perspective, both are great tools. That’s why we support both Clang and GCC. Incredibuild seamlessly distributes Clang and GCC compilation processes across hundreds of remote cores in the network, resulting in a drastic acceleration of compilation time for both Clang and GCC.