Posted on August 18, 2021 by Niall Cooling
Introduction
One of the headline changes of the C++20 standard is the inclusion of modules. Modules promise to significantly change the structure of C++ codebases and possibly signal headers’ ultimate demise (but probably not in my lifetime). It also opens the door to potentially have a unified build system and package manager, similar to Rust’s Cargo package manager; though I imaging standardising a unified build system would be one bloody battle.
Pre-C++20 builds
If you want to start a heated debate on any C++ forum/channel, just state that one particular build system (e.g. Meson, CMake, Bazal, etc.) is better than the others; or that your way of using that build system is the “one, and only one, correct way”. If you are unfamiliar with build systems, I’d recommend reading this post first to understand the challenges.
Start with the Why
There have already been several articles written about Modules (significantly by Microsoft ). But my experience, in reading these, is that they focus ‘how’ modules work in C++20 and seem to miss the ‘Why’. Maybe the authors consider it obvious, but I think it depends on your background. In addition, all I have read use Microsoft MSVC, due to this having the fullest support for Modules among the mainstream toolchains.
First and foremost, when discussing modules, surely, we should be discussing modularity. We already have one form of modularity in C++ with the object/class model. But this is modularity ‘in the small’; modules are addressing ‘modularity in the large’, i.e., program-wide modularity.
So What problem are we trying to solve by adding modules?
Let’s be honest, “Headers are a mess” – they can (and have for many decades) been used effectively, but so often, I see very poorly constructed headers (IMHO). A well-crafted application will, typically, have pairs of files to “mimic” a module, e.g., file.h and file.cpp. But there is no enforcement of this approach; we also need to understand external- and internal-linkage rules to build a modular architecture safely.
The root of the problem with headers is they only exist up to and including pre-processing:
Headers do not exist during the compilation phase
We could happily (okay, maybe not happily) write a complete C++ application without any headers. There would be a lot of code duplication (declarations), and it would be a maintenance nightmare, but that’s how the current build model works (it all stems from the definition of a translation unit).
Modularity (in the large) is typically closely related to application architecture and construction – the files that make up our build.
Before we go on, I need to stress one important aspect – Modules and Namespaces are entirely orthogonal and co-exist as independent aspects (more on this later).
Other languages
Many modern languages tend to build the semantics of a module around all the code for the module existing within a single file, e.g. Java and Python
They have the concepts of exporting and importing types and behaviours from other modules. It is very similar to the public/private semantics of the class but at the file scope.
Interestingly, older languages, such as Ada and Modula-2, designed in the 1980s, around the same time as the original C++, use a two-file structure for defining modules (or packages in Ada’s case). These designs separate the module interface from the implementation.
The significant benefits of the interface/implementation file structure can be:
- Improved build times
- Simplified integration and testing
Though, of course, this is another hotly debated subject.
C++20 Module File Structure
Dare I say it, but C++ being C++, rather than a straightforward way of structuring models (a.la. Java), we have been given the suiss-army-knife approach to module construction. There are numerous ways of doing the same thing and lots of special cases. This, initially, caused me a lot of problems as my mental model (based on other language paradigms) wasn’t aligning with what I was being introduced to.
There is no one way of correctly using C++20 modules
I’m sure, over time, we will come up with new idioms regarding the use of modules, but for now, I can see three obvious uses of modules (think 80:20 rule)
- Single file module – the Java/Python model or a complete module
- A separate Interface file and Implementation file for a module – the Ada model
- Multiple separate files (partitions) combining to define a single module concept – the C++20 model
In C++20, any file containing the module syntax is referred to as a Module Unit. Therefore a Named Module may be made up of one or more Module Units.
Single-file (Complete) Module
Pre-C++20 code
Let’s start with the obligatory “hello, world!” example, splitting the behaviour across two files.
Remember
At compilation headers don’t exist
We have two files, func.cpp
and main.cpp
// func.cpp
#include <iostream>
void func() { // definition
std::cout << "hello, world!\n";
}
// main.cpp
void func(); // declaration
int main(){
func();
}
We can go ahead and build and run the application:
$ g++ -c func.cpp
$ g++ -c main.cpp
$ g++ -o App main.o func.o
$ ./App
hello, world!
This, of course, builds successfully as the function func
has, by default, external linkage (often referred to as global scope). So as long as main
has a valid declaration, the main.cpp
file can be compiled, and the linker resolves the exported/imported symbols.
C++20 Module
The early proposals for module support, P1103R3, uses the term complete module where
A complete module can be defined in a single source file.
I quite like this term, and therefore I’m going to use it for single-file modules (until something better comes along).
First, we need to create our Named module. A complete module file will, typically, have two, possibly, three sections (called fragments)
- A global module fragment – this is where we include things we need (optional)
- The main module purview – Where we can export types and behaviour
- A private fragment – this ends the portion of the module interface that can affect the behaviour of other translation units (optional).
The private module fragment can only appear in single-file modules. The current version of GCC (gcc version 11.1.0) does not support private fragments, so I’m not going to ignore them for this post.
The C++ standard does not define file extensions; this is toolchain specific. With GCC, the file name suffix determines how the file is treated for any given input file.
GCC interprets the following file extensions as C++ source code which must be preprocessed:
- file.cc
- file.cp
- file.cxx
- file.cpp
- file.c++
- file.C
We have always preferred the .cpp
extension for C++ source files in our projects and when teaching C++. So to that end, in the following examples, I will use .cpp
for regular C++ source files and .cxx
for module files. This is nothing more than a personal preference.
Notably, Microsoft has chosen to use the extension .ixx
for module interfaces (see link). We could use file.ixx
but, with GCC, would need to use the -x c++ file.ixx
directive to specify the file should be treated as a C++ file. Rather than use the additional complication, using .cxx
means it will be treated as a standard C++ file by GCC.
To make the original file func.cpp
into a module (func.cxx
), we add the line
export module MODULE-NAME;
e.g.
// func.cxx
#include <iostream>
export module mod;
void func() {
std::cout << "hello, world!\n";
}
However, this will not yet compile. The include
statement needs to be in the global fragment. The global fragment must precede the main purview
and is simply introduced using the keyword module
, e.g.
// func.cxx
module;
#include <iostream>
export module mod;
void func() {
std::cout << "hello, world!\n";
}
We can now import the module mod
into main:
// main.cpp
import mod;
int main(){
func();
}
Next we can compile func.cxx
$ g++ -c -std=c++20 -fmodules-ts func.cxx
Note, in GCC C++20, modules are, currently, not enabled by just specifying c++20
; you must also supply the directives-fmodules-ts
.
As expected, the compilation generates an object file func.o
. However, you will also notice that a subdirectory, gcm.cache
is created, with the file mod.gcm
. This is the generated module interface file used by the compile.
If we go ahead and compile main.cpp
$ g++ -c -std=c++20 -fmodules-ts main.cpp
main.cpp: In function 'int main()':
main.cpp:5:5: error: 'func' was not declared in this scope
5 | func();
| ^~~~
We get the error that func
was not declared. If we tried to declare it in main.cpp
(as before) it would build but fail to link.
So this gives us our first significant change:
In modules, all declarations and definitions are private unless exported
Officially they have Module Linkage, which differs from internal linkage. This only becomes apparent when using a multi-file module and partitions (cover in the follow-on post).
To fix this, we export
the function, e.g.
// func.cxx
module;
#include <iostream>
export module mod;
export void func() {
std::cout << "hello, world!\n";
}
The project now successfully compiles and links:
$ g++ -c -std=c++20 -fmodules-ts func.cxx
$ g++ -c -std=c++20 -fmodules-ts main.cpp
$ g++ main.o func.o -o App
$ ./App
hello, world!
One final detail, we still can separate out a declaration from a definition, e.g.
// func.cxx
module;
#include <iostream>
export module mod;
export void func();
void func() {
std::cout << "hello, world!\n";
}
I’m not sure this is of much benefit, but I guess comes down to style.
Separate Interface and Implementation files
At some point, or because of personal preference, we may choose to split our module into multiple files to make it more manageable and help rebuild times.
Each file pertaining to a module is called a Module Unit. We are going to create two units:
- A Primary Module Interface Unit (PMIU)
- A Module Implementation Unit
Primary Module Interface Unit
Each named module must have one, and only one, Primary Module Interface Unit. This is the revised file func.cxx
that contains the statement:
// func.cxx
export module MODULE-NAME;
and our other export
statements, e.g.
export module mod;
export void func();
That’s all we need; we have named a module mod
that exports a single function func
. We can go ahead and compile this unit:
$ g++ -c -std=c++20 -fmodules-ts func.cxx
As before, this generates func.o
and gmc.cache\mod.gmc
.
Module Implementation Unit
There currently is no idiomatic naming conversion, so I have gone with func_impl.cxx
, but it could be any filename/extension you prefer. You cannot use func.ixx
as it will also generate an object file func.o
which will overwrite that func.cxx
generated object file.
An implementation unit contains the line:
module MODULE-NAME;
Note it does not have the export
keyword. This implicitly makes anything declared/defined in the PMIU available in the implementation unit (the opposite is not true). Note, implementation units cannot have any export statements.
// func_impl.cxx
module;
#include <iostream>
module mod;
void func() {
std::cout << "hello, world!\n";
}
And there we have it. This implementation unit can now be compiled:
$ g++ -c -std=c++20 -fmodules-ts func_impl.cxx
Which generates func_impl.o
– we have two module object files as part of our linkage, but of course, this also means the changes to the module implementation do not require a recompile of clients (mimicking the behaviour of included headers).
$ g++ main.o func.o func_impl.o -o App
$ ./App
hello, world!
With GCC, the interface unit must be compiled before the implementation unit.
export
So far, we have only exported a single function. Assuming we have multiple functions, we want to export, the standard allows for several options.
Export per function
For each function we want to export, we simply prepend the export
keyword to add it to the interface.
// func.cxx
export module mod;
export void func();
export void func(int);
Export block
Alternatively, we can group many declarations into an export block
, e.g.
// func.cxx
export module mod;
export {
void func();
void func(int);
}
Namespace
As mentioned earlier, C++ namespaces are orthogonal to modules. Again, I will admit this initially also caused me some confusion. Not, in so much, as the syntax, more the general philosophy of using namespaces and modules; where each one fits in an architectural structure. This is possibly skewed by my early background in Ada, where the two (the package) very much align.
From a practical perspective, namespaces behave as before, so there is no real change to their use, e.g.
// func.cxx
export module mod;
namespace X {
export void func();
export void func(int);
}
// func_impl.cxx
module;
#include <iostream>
module mod;
namespace X {
void func() {
std::cout << "hello, world!\n";
}
void func(int p) {
std::cout << "hello, " << p << '\n';
}
}
// main.cpp
import mod;
int main(){
X::func();
X::func(42);
}
Export namespace
Alternatively, if we export a namespace, all declarations within that namespace are automatically included in the module’s interface, e.g.
// func.cxx
export module mod;
export namespace X {
void func();
void func(int);
}
Exporting Types, etc.
Anything we require as part of the module interface must be exported. For example, if function takes an object reference as a parameter, the normal type definition visibility rules apply, e.g.
// func.cxx
export module mod;
export class S {
public:
S() = default;
explicit S(int p):val{p}{}
int get_val() const;
private:
int val{};
};
export void func(const S&);
Or
// func.cxx
export module mod;
export {
class S {
public:
S() = default;
explicit S(int p):val{p}{}
int get_val() const;
private:
int val{};
};
void func(const S&);
}
// func_impl.cxx
module;
#include <iostream>
module mod; // implicitly import everything in PMIU
void func(const S& ptr) {
std::cout << "hello, " << ptr.get_val() << '\n';
}
int S::get_val() const {
return val;
}
// main.cpp
import mod;
int main(){
S s{10};
func(s);
}
There are a whole host of rules and exceptions regarding exporting items, such as templates and ADL evaluation. I’m not going to get into those here as it becomes very use-specific.
Includes
In the example, we can see we’ve used the traditional preprocessor directive #include
to include the standard library header iostream
. The standard permits the following:
import <iostream>;
import "header.h";
This is supported by GCC11, but there are some hoops you have to jump through to first make a user-defined header importable (see the -fmodule-header
directive).
If you have a header-only file, e.g.
// header.h
#ifndef _HEADER_
#define _HEADER_
constexpr int life = 42;
#endif
and you want to import; you first need to compile it, e.g.
$ g++ -c -std=c++20 -fmodule-header header.h
This generates a header.h.gcm
file. The header can now be imported using the directive
import "header.h";
note the ;
In addition, Microsoft has already wrapped the standard library up in a module structure, so you may see the following:
import std.core
in Microsoft specific examples.
Summary
Hopefully, this will give you a feel for the foundations of C++20 modules and enough to go and experiment. I believe that the single-file and interface/implementation models will accommodate most people initial uses of modules.
In the follow-up post, I will cover the more complex ability to split a module’s interface into multiple files (called partitions). Using partitions, on the surface looks straightforward, but appears to open a pandoras’ box of fun and games.
My initial reaction to modules is that they are overly complex, but I put that down to my background and mental model having used module concepts in other languages. In addition, much of the coverage of modules does delve down into the complications of using partitions without laying out the basics.
I’m sure over the next couple of years, as support for modules improves, we’ll find an idiomatic approach to using modules; even if we can just get a consistent file naming convention.
In the deeply embedded space, we have only recently seen the release of GCC10 for Arm, so I can imagine it may be sometime before GCC11 can be used in our target project. Until then, I will continue to experiment with modules and partitions on the host.
The example code can be found here
Next post: C++20 Module Partitions