C++20 modules with GCC11

Posted on August 18, 2021 by Niall Cooling

Introduction

One of the headline changes of the C++20 standard is the inclusion of modules. Modules promise to significantly change the structure of C++ codebases and possibly signal headers’ ultimate demise (but probably not in my lifetime). They also open the door to a unified build system and package manager, similar to Rust’s Cargo; though I imagine standardising a unified build system would be one bloody battle.

Pre-C++20 builds

If you want to start a heated debate on any C++ forum/channel, just state that one particular build system (e.g. Meson, CMake, Bazel, etc.) is better than the others; or that your way of using that build system is the “one, and only one, correct way”. If you are unfamiliar with build systems, I’d recommend reading this post first to understand the challenges.

Start with the Why

There have already been several articles written about modules (notably by Microsoft). But my experience, in reading these, is that they focus on ‘how’ modules work in C++20 and seem to miss the ‘why’. Maybe the authors consider it obvious, but I think it depends on your background. In addition, all those I have read use Microsoft’s MSVC, as it has the fullest support for modules among the mainstream toolchains.

First and foremost, when discussing modules, surely, we should be discussing modularity. We already have one form of modularity in C++ with the object/class model. But this is modularity ‘in the small’; modules are addressing ‘modularity in the large’, i.e., program-wide modularity.

So what problem are we trying to solve by adding modules?

Let’s be honest, “headers are a mess”. They can be (and for many decades have been) used effectively, but so often I see very poorly constructed headers (IMHO). A well-crafted application will, typically, have pairs of files to “mimic” a module, e.g., file.h and file.cpp. But there is no enforcement of this approach; we also need to understand external- and internal-linkage rules to build a modular architecture safely.

The root of the problem with headers is they only exist up to and including pre-processing:

Headers do not exist during the compilation phase

We could happily (okay, maybe not happily) write a complete C++ application without any headers. There would be a lot of code duplication (declarations), and it would be a maintenance nightmare, but that’s how the current build model works (it all stems from the definition of a translation unit).

Modularity (in the large) is typically closely related to application architecture and construction – the files that make up our build.

Before we go on, I need to stress one important aspect – Modules and Namespaces are entirely orthogonal and co-exist as independent aspects (more on this later).

Other languages

Many modern languages tend to build the semantics of a module around all the code for the module existing within a single file, e.g. Java and Python.

They have the concepts of exporting and importing types and behaviours from other modules. It is very similar to the public/private semantics of the class but at the file scope.

Interestingly, older languages, such as Ada and Modula-2, designed in the 1980s, around the same time as the original C++, use a two-file structure for defining modules (or packages in Ada’s case). These designs separate the module interface from the implementation.

The significant benefits of the interface/implementation file structure can be:

  • Improved build times
  • Simplified integration and testing

Though, of course, this is another hotly debated subject.

C++20 Module File Structure

Dare I say it, but C++ being C++, rather than a straightforward way of structuring modules (à la Java), we have been given the Swiss-army-knife approach to module construction. There are numerous ways of doing the same thing and lots of special cases. This, initially, caused me a lot of problems as my mental model (based on other language paradigms) wasn’t aligning with what I was being introduced to.

There is no one way of correctly using C++20 modules

I’m sure, over time, we will come up with new idioms regarding the use of modules, but for now, I can see three obvious uses of modules (think 80:20 rule):

  1. Single file module – the Java/Python model or a complete module
  2. A separate Interface file and Implementation file for a module – the Ada model
  3. Multiple separate files (partitions) combining to define a single module concept – the C++20 model

In C++20, any file containing the module syntax is referred to as a Module Unit. Therefore a Named Module may be made up of one or more Module Units.

Single-file (Complete) Module

Pre-C++20 code

Let’s start with the obligatory “hello, world!” example, splitting the behaviour across two files.

Remember

At compilation headers don’t exist

We have two files, func.cpp and main.cpp

// func.cpp
#include <iostream>

void func() {  // definition
    std::cout << "hello, world!\n";
}

// main.cpp
void func();  // declaration

int main(){
    func();
}

We can go ahead and build and run the application:

$ g++ -c func.cpp 
$ g++ -c main.cpp 
$ g++ -o App main.o func.o
$ ./App 
hello, world!

This, of course, builds successfully as the function func has, by default, external linkage (often referred to as global scope). So as long as main has a valid declaration, the main.cpp file can be compiled, and the linker resolves the exported/imported symbols.
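To make the linkage point concrete, here is a minimal single-file sketch. The names greeting, decorate and call_count are hypothetical and not part of the example project; the listing only assumes the standard rules that namespace-scope functions have external linkage by default, while names in an unnamed namespace or marked static have internal linkage.

```cpp
#include <string>

// External linkage (the default for a namespace-scope function):
// another translation unit can declare `std::string greeting();`
// and the linker will resolve the call to the definition below.
std::string greeting();

namespace {
    // Internal linkage: the name is local to this translation unit.
    // A matching declaration in another .cpp file would compile,
    // but the call would fail to link.
    std::string decorate(const std::string& text) {
        return text + "!";
    }
}

// `static` at namespace scope also gives internal linkage.
static int call_count = 0;

std::string greeting() {
    ++call_count;                      // only visible inside this file
    return decorate("hello, world");   // internal helper used freely here
}
```

Calling greeting() from a main in a separate file works in the same way as func() above. Doing the same with decorate ends in an undefined reference at link time, which is the only “privacy” the pre-C++20 build model offers.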

C++20 Module

One of the early proposals for module support, P1103R3, uses the term complete module, where

A complete module can be defined in a single source file.

I quite like this term, and therefore I’m going to use it for single-file modules (until something better comes along).

First, we need to create our Named Module. A complete module file will, typically, have two, possibly three, sections (called fragments):

  • A global module fragment – this is where we include things we need (optional)
  • The main module purview – Where we can export types and behaviour
  • A private fragment – this ends the portion of the module interface that can affect the behaviour of other translation units (optional).

The private module fragment can only appear in single-file modules. The current version of GCC (gcc version 11.1.0) does not support private fragments, so I’m going to ignore them for this post.

The C++ standard does not define file extensions; this is toolchain specific. With GCC, the file name suffix of each input file determines how that file is treated.

GCC interprets the following file extensions as C++ source code which must be preprocessed:

  • file.cc
  • file.cp
  • file.cxx
  • file.cpp
  • file.c++
  • file.C

We have always preferred the .cpp extension for C++ source files in our projects and when teaching C++. So to that end, in the following examples, I will use .cpp for regular C++ source files and .cxx for module files. This is nothing more than a personal preference.

Notably, Microsoft has chosen to use the extension .ixx for module interfaces (see link). We could use file.ixx but, with GCC, we would then need to pass the option -x c++ ahead of file.ixx to specify that the file should be treated as C++ source. Rather than add that complication, I use .cxx, which GCC already treats as a standard C++ file.

To make the original file func.cpp into a module (func.cxx), we add the line

export module MODULE-NAME;

e.g.

// func.cxx
#include <iostream>

export module mod;

void func() {
    std::cout << "hello, world!\n";
}

However, this will not yet compile. The #include directive needs to be in the global module fragment. The global module fragment must precede the module purview and is introduced by the keyword module on a line of its own, e.g.

// func.cxx
module;

#include <iostream>

export module mod;

void func() {
    std::cout << "hello, world!\n";
}

We can now import the module mod into main:

// main.cpp
import mod;

int main(){
    func();
}

Next we can compile func.cxx

$ g++ -c -std=c++20 -fmodules-ts func.cxx 

Note that, in GCC, C++20 modules are currently not enabled just by specifying -std=c++20; you must also supply the option -fmodules-ts.

As expected, the compilation generates an object file func.o. However, you will also notice that a subdirectory, gcm.cache, is created, containing the file mod.gcm. This is the generated module interface file used by the compiler.

If we go ahead and compile main.cpp

$ g++ -c -std=c++20 -fmodules-ts  main.cpp 
main.cpp: In function 'int main()':
main.cpp:5:5: error: 'func' was not declared in this scope
    5 |     func();
      |     ^~~~

We get the error that func was not declared. If we tried to declare it in main.cpp (as before) it would build but fail to link.

So this gives us our first significant change:

In modules, all declarations and definitions are private unless exported

Officially they have Module Linkage, which differs from internal linkage. This only becomes apparent when using a multi-file module and partitions (covered in the follow-on post).

To fix this, we export the function, e.g.

// func.cxx
module;

#include <iostream>

export module mod;

export void func() {
    std::cout << "hello, world!\n";
}

The project now successfully compiles and links:

$ g++ -c -std=c++20 -fmodules-ts func.cxx
$ g++ -c -std=c++20 -fmodules-ts main.cpp 
$ g++ main.o func.o -o App
$ ./App 
hello, world!

One final detail: we can still separate a declaration from its definition, e.g.

// func.cxx
module;

#include <iostream>

export module mod;

export void func();

void func() {
    std::cout << "hello, world!\n";
}

I’m not sure this is of much benefit, but I guess it comes down to style.

Separate Interface and Implementation files

At some point, or because of personal preference, we may choose to split our module into multiple files to make it more manageable and help rebuild times.

Each file pertaining to a module is called a Module Unit. We are going to create two units:

  • A Primary Module Interface Unit (PMIU)
  • A Module Implementation Unit

Primary Module Interface Unit

Each named module must have one, and only one, Primary Module Interface Unit. This is the revised file func.cxx that contains the statement:

export module MODULE-NAME;

and our other export statements, e.g.

// func.cxx
export module mod;

export void func();

That’s all we need; we have named a module mod that exports a single function func. We can go ahead and compile this unit:

$ g++ -c -std=c++20 -fmodules-ts func.cxx

As before, this generates func.o and gcm.cache/mod.gcm.

Module Implementation Unit

There is currently no idiomatic naming convention, so I have gone with func_impl.cxx, but it could be any filename/extension you prefer. You cannot use func.ixx, as it would also generate an object file func.o, which would overwrite the object file generated from func.cxx.

An implementation unit contains the line:

module MODULE-NAME;

Note it does not have the export keyword. This implicitly makes anything declared/defined in the PMIU available in the implementation unit (the opposite is not true). Note, implementation units cannot have any export statements.

// func_impl.cxx
module;

#include <iostream>

module mod;

void func() {
    std::cout << "hello, world!\n";
}

And there we have it. This implementation unit can now be compiled:

$ g++ -c -std=c++20 -fmodules-ts func_impl.cxx 

This generates func_impl.o. We now have two module object files to link, but, of course, this also means that changes to the module implementation do not require a recompile of clients (mimicking the behaviour of included headers).

$ g++ main.o func.o func_impl.o -o App
$ ./App 
hello, world!

With GCC, the interface unit must be compiled before the implementation unit.
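As a hedged sketch of what “available in the implementation unit” means in practice, the variation below adds a helper, greeting (a hypothetical name, not part of the original example), that is declared in the PMIU without export. Under the rules described above it should be usable from func_impl.cxx but not from main.cpp. The three files are shown in one listing; they would be compiled separately, in the same order and with the same options as before.

```cpp
// func.cxx -- primary module interface unit
export module mod;

export void func();        // exported: visible to importers of mod

const char* greeting();    // not exported: module linkage, visible only
                           // to the units that belong to module mod

// func_impl.cxx -- module implementation unit
module;

#include <iostream>

module mod;                // implicitly imports the interface of mod

const char* greeting() {
    return "hello, world!";
}

void func() {
    std::cout << greeting() << '\n';   // fine: same module
}

// main.cpp -- a client of the module
import mod;

int main() {
    func();                // fine: func is exported
    // greeting();         // error if uncommented: not exported by mod
}
```

This is the practical difference between module linkage and internal linkage: a function marked static in func.cxx could not be defined or called from func_impl.cxx, whereas greeting can.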

export

So far, we have only exported a single function. Assuming we have multiple functions we want to export, the standard allows for several options.

Export per function

For each function we want to export, we simply prepend the export keyword to add it to the interface.

// func.cxx

export module mod;

export void func();
export void func(int);

Export block

Alternatively, we can group many declarations into an export block, e.g.

// func.cxx

export module mod;

export {
    void func();
    void func(int);
}

Namespace

As mentioned earlier, C++ namespaces are orthogonal to modules. Again, I will admit this initially caused me some confusion; not so much the syntax as the general philosophy of using namespaces and modules, and where each one fits in an architectural structure. This is possibly skewed by my early background in Ada, where the two (in the form of the package) very much align.

From a practical perspective, namespaces behave as before, so there is no real change to their use, e.g.

// func.cxx

export module mod;

namespace X {
    export void func();
    export void func(int);
}

// func_impl.cxx
module;

#include <iostream>

module mod;

namespace X {
    void func() {
        std::cout << "hello, world!\n";
    }

    void func(int p) {
        std::cout << "hello, " << p << '\n';
    }
}

// main.cpp
import mod;

int main(){
    X::func();
    X::func(42);
}

Export namespace

Alternatively, if we export a namespace, all declarations within that namespace are automatically included in the module’s interface, e.g.

// func.cxx

export module mod;

export namespace X {
    void func();
    void func(int);
}

Exporting Types, etc.

Anything we require as part of the module interface must be exported. For example, if a function takes an object reference as a parameter, the normal type definition visibility rules apply, e.g.

// func.cxx
export module mod;

export class S {
public:
    S() = default;
    explicit S(int p):val{p}{}
    int get_val() const;
private:
    int val{};
};

export void func(const S&);

Or

// func.cxx
export module mod;

export  {
    class S {
    public:
        S() = default;
        explicit S(int p):val{p}{}
        int get_val() const;
    private:
        int val{};
    };

    void func(const S&);
}

// func_impl.cxx
module;

#include <iostream>

module mod;  // implicitly import everything in PMIU

void func(const S& ptr) {
    std::cout << "hello, " << ptr.get_val() << '\n';
}

int S::get_val() const {
    return val;
}

// main.cpp
import mod;

int main(){
    S s{10};
    func(s);
}

There are a whole host of rules and exceptions regarding exporting items, such as templates and ADL evaluation. I’m not going to get into those here as it becomes very use-specific.

Includes

In the example, we can see we’ve used the traditional preprocessor directive #include to include the standard library header iostream. The standard permits the following:

import <iostream>;
import "header.h";

This is supported by GCC11, but there are some hoops you have to jump through to first make a user-defined header importable (see the -fmodule-header directive).

If you have a header-only file, e.g.

// header.h
#ifndef HEADER_H
#define HEADER_H

constexpr int life = 42;

#endif

and you want to import it, you first need to compile it, e.g.

$ g++ -c -std=c++20 -fmodule-header header.h 

This generates a header.h.gcm file. The header can now be imported using the directive

import "header.h";

Note the trailing semicolon.

In addition, Microsoft has already wrapped the standard library up in a module structure, so you may see the following:

import std.core;

in Microsoft specific examples.

Summary

Hopefully, this will give you a feel for the foundations of C++20 modules and enough to go and experiment. I believe that the single-file and interface/implementation models will accommodate most people’s initial uses of modules.

In the follow-up post, I will cover the more complex ability to split a module’s interface into multiple files (called partitions). Using partitions looks straightforward on the surface, but appears to open a Pandora’s box of fun and games.

My initial reaction to modules is that they are overly complex, but I put that down to my background and mental model having used module concepts in other languages. In addition, much of the coverage of modules does delve down into the complications of using partitions without laying out the basics.

I’m sure over the next couple of years, as support for modules improves, we’ll find an idiomatic approach to using modules; even if it is just a consistent file naming convention.

In the deeply embedded space, we have only recently seen the release of GCC10 for Arm, so I can imagine it may be some time before GCC11 can be used in our target projects. Until then, I will continue to experiment with modules and partitions on the host.

The example code can be found here

Next post: C++20 Module Partitions
