Pimp My Pimpl
This is a translation of a two-part article that originally appeared in Heise Developer. You can find the originals here:
- Part One: http://www.heise.de/developer/artikel/C-Vor-und-Nachteile-des-d-Zeiger-Idioms-Teil-1-1097781.html
- Part Two: http://www.heise.de/developer/artikel/C-Vor-und-Nachteile-des-d-Zeiger-Idioms-Teil-2-1136104.html
Much has been written about this funnily named idiom, alternatively known as d-pointer, compiler firewall, or Cheshire Cat. Heise Developer highlights some aspects of this practical construct that go beyond the classic technique.
Part One
This article first recapitulates the classic Pimpl idiom (pointer-to-implementation), points out its advantages, and goes on to develop further idioms based on it. The second part will concentrate on how to mitigate the disadvantages that inevitably arise from using Pimpl.
The Classic Idiom
Every C++ programmer has probably stumbled across a class definition akin to the following:
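A minimal sketch of such a header, assuming the usual layout. The constructor and destructor declarations are illustrative additions:

```cpp
// class.h (sketch): everything a client of Class gets to see
class Class {
public:
    Class();
    ~Class();
    // ... public methods: declared here, defined in class.cpp ...

private:
    class Private;  // forward declaration only: the contents stay hidden
    Private * d;    // the "d-pointer", the class's only data member
};
```

Whatever Class::Private ends up containing, a Class object stays exactly one pointer in size.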
Here, the programmer moved the data fields of his class Class into a nested class Class::Private. Instances of Class merely contain a pointer d to their Class::Private instance.
To understand why the class author used this indirection, we need to take a step back and look at the C++ module system. In contrast to many other languages, C++, being descended from C, has no built-in support for modules (such support was proposed for C++0x, but did not make it into the final standard). Instead, one factors module function declarations (but usually not their definitions) into header files and makes them available to other modules using the #include preprocessor directive. This, however, leaves header files with a double role: on the one hand, they serve as the module interface; on the other, they are the declaration site for potentially internal implementation details.
In the days of C, this worked well: the split into declaration and definition encapsulated the implementation details of functions perfectly, and one could either merely forward-declare structs (in which case they were private) or define them directly in the header file (in which case they were public). In “object-oriented C”, class Class from above might look like the following:
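The following sketch is written in the common subset of C and C++, so it compiles either way. The function names Class_new, Class_value, Class_setValue and Class_delete are hypothetical, and the header part and implementation part are shown in a single listing:

```cpp
#include <stdlib.h>

/* --- class.h: the public interface --- */
struct Class;                                  /* forward declaration: layout stays private */
struct Class * Class_new(int value);
void Class_delete(struct Class * c);
int  Class_value(const struct Class * c);
void Class_setValue(struct Class * c, int value);

/* --- class.c: the only place that knows the layout --- */
struct Class {
    int value;
};

struct Class * Class_new(int value) {
    struct Class * c = (struct Class *)malloc(sizeof *c);
    if (c)
        c->value = value;
    return c;                                  /* NULL if allocation failed */
}

void Class_delete(struct Class * c) { free(c); }

int Class_value(const struct Class * c) { return c->value; }

void Class_setValue(struct Class * c, int value) { c->value = value; }
```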
Unfortunately, that doesn’t work in C++. Methods must be declared inside the class, and since classes without methods are rather boring, class definitions usually appear in C++ header files. Because classes, unlike namespaces, cannot be reopened, the header file must contain declarations of all data fields as well as all methods:
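For illustration, here is a sketch of such a header. The members m_value and m_history and the helper remember() are hypothetical. The definitions that would normally live in class.cpp are appended so that the listing compiles on its own:

```cpp
#include <string>

// --- class.h: without Pimpl, all implementation details end up here ---
class Class {
public:
    Class();
    int value() const;
    void setValue(int v);

private:
    void remember(const std::string & what); // private method, yet visible to every client
    int m_value;                              // data fields, likewise
    std::string m_history;                    // forces every client to #include <string>
};

// --- class.cpp ---
Class::Class() : m_value(0) {}

int Class::value() const { return m_value; }

void Class::setValue(int v) {
    m_value = v;
    remember("setValue");
}

void Class::remember(const std::string & what) { m_history += what + ';'; }
```

Changing any of the private members above changes class.h and therefore forces every client to recompile.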
The problem is evident: the module interface (header file) necessarily contains implementation details, which is always a bad idea. That is why one resorts to a rather ugly trick and simply factors all implementation details (data fields as well as private methods) out into a separate class:
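The following sketch shows the resulting split, again with hypothetical members. Everything below the class.cpp marker would live in a separate source file. Copy assignment is deleted here because an implicitly generated one would copy the pointer and cause a double-delete. A transactional version can be built with the Copy-Swap Idiom.

```cpp
#include <string>   // needed by class.cpp only, no longer by class.h

// --- class.h: the interface, free of implementation details ---
class Class {
public:
    Class();
    Class(const Class & other);
    ~Class();
    Class & operator=(const Class &) = delete; // see copy-swap for a real one

    int value() const;
    void setValue(int v);

private:
    class Private;   // used "in name only": a forward declaration suffices
    Private * d;
};

// --- class.cpp: the only file that sees Class::Private ---
class Class::Private {
public:
    void remember(const std::string & what) { history += what + ';'; } // private method
    int value = 0;            // data fields now live here ...
    std::string history;      // ... together with their #includes
};

Class::Class() : d(new Private) {}
Class::Class(const Class & other) : d(new Private(*other.d)) {} // deep copy
Class::~Class() { delete d; }

int Class::value() const { return d->value; }

void Class::setValue(int v) {
    d->value = v;             // every access goes through d
    d->remember("setValue");
}
```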
Since Class::Private is used only in the declaration of a pointer variable, i.e. “in name only” (Lakos) and not “in size”, a forward declaration suffices, as in the C case. All methods of Class now access the private methods and data members of Class::Private only through d.
In this way, one gains the convenience of a fully encapsulating module system in C++, too. Because of the indirection, the developer pays for these benefits with an additional memory allocation (new Class::Private), an extra indirection on every access to data fields and private methods, and the complete loss of (at least public) inline methods. As the second part will show, the semantics of const methods also change.
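The following minimal sketch shows why const behaves differently. The class Counter and its member reads are hypothetical. In a const method, d itself becomes a Private * const, but the Private object it points to does not become const, so the compiler accepts modifications through d:

```cpp
class Counter {
public:
    Counter();
    ~Counter();
    Counter(const Counter &) = delete;
    Counter & operator=(const Counter &) = delete;

    int peek() const;   // declared const ...

private:
    class Private;
    Private * d;
};

class Counter::Private {
public:
    int reads = 0;
};

Counter::Counter() : d(new Private) {}
Counter::~Counter() { delete d; }

int Counter::peek() const {
    return ++d->reads;  // ... yet modifies state: constness does not propagate through d
}
```

With the data fields as direct members of Counter, the same increment would be rejected inside a const method.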
Before the second part of this article addresses the issue of how to rectify, or at least mitigate, the above downsides, the remainder of this article will first shed some light on the idiom’s benefits.
Benefits of the Pimpl Idiom
They are substantial. Encapsulating all implementation details yields a slim and long-term stable interface (header file). The former leads to more readable class definitions; the latter helps maintain binary compatibility even across extensive changes to the implementation.
For instance, Nokia’s “Qt Development Frameworks” department (formerly Trolltech) has carried out profound changes to the widget rendering at least twice during the development of their “Qt 4” class library without the need to so much as relink programs using Qt 4.
Particularly in larger projects, the tendency of the Pimpl Idiom to dramatically speed up builds should not be underestimated. This is accomplished both by reducing the number of #include directives in header files and through the considerably lower frequency of changes to the header files of Pimpl classes in general. In “Exceptional C++”, Herb Sutter reports that compilation speed regularly doubles; John Lakos even claims build speed-ups of two orders of magnitude.
Another virtue of the design: classes with d-pointers are well suited to transactional, exception-safe code. For instance, the developer may use the Copy-Swap Idiom (Sutter/Alexandrescu: C++ Coding Standards, Item 56) to create a transactional (all-or-nothing) copy assignment operator:
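A minimal sketch of such an operator follows. It assumes a Private holding a single hypothetical std::string member. The swap member merely exchanges the d-pointers, which cannot throw:

```cpp
#include <string>
#include <utility>

class Class {
public:
    Class();
    explicit Class(const std::string & name);
    Class(const Class & other);
    ~Class();

    // Both can be defined inline without exposing Class::Private:
    void swap(Class & other) noexcept { std::swap(d, other.d); }
    Class & operator=(const Class & other) {
        Class copy(other); // may throw; if it does, *this is left unchanged
        swap(copy);        // cannot throw: commits the new state to *this
        return *this;      // copy's destructor releases the old state
    }

    std::string name() const;

private:
    class Private;
    Private * d;
};

// --- class.cpp ---
class Class::Private {
public:
    std::string name;
};

Class::Class() : d(new Private) {}
Class::Class(const std::string & name) : d(new Private{name}) {}
Class::Class(const Class & other) : d(new Private(*other.d)) {}
Class::~Class() { delete d; }

std::string Class::name() const { return d->name; }
```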
Implementation of C++0x move operations is trivial as well (and, in particular, identical across all Pimpl classes):
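The following sketch assumes that moved-from objects are left with a null d-pointer. That is a design choice, not the only possible one. Copying is deleted only to keep the listing short:

```cpp
#include <utility>

class Class {
public:
    Class();
    ~Class();
    Class(const Class &) = delete;            // copying omitted in this sketch
    Class & operator=(const Class &) = delete;

    // Moves only shuffle the d-pointer, so they read the same in every Pimpl class:
    Class(Class && other) noexcept : d(other.d) { other.d = nullptr; }
    Class & operator=(Class && other) noexcept {
        std::swap(d, other.d);  // other's destructor frees our old Private
        return *this;
    }

    bool isValid() const { return d != nullptr; } // false for moved-from objects
    int value() const;
    void setValue(int v);

private:
    class Private;
    Private * d;
};

// --- class.cpp ---
class Class::Private {
public:
    int value = 0;
};

Class::Class() : d(new Private) {}
Class::~Class() { delete d; }  // deleting a null pointer is a no-op
int Class::value() const { return d->value; }
void Class::setValue(int v) { d->value = v; }
```

Since delete on a null pointer does nothing, a moved-from object can still be destroyed or assigned to. In this sketch, however, calling value() on it would dereference a null pointer.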
Both the member swap and the assignment operators may be implemented inline in this model without compromising the class’s encapsulation; developers should make good use of this fact.
Extended Means of Composition
A final benefit worth mentioning is the option to cut down on some of the extra dynamic memory allocations by aggregating data fields directly. Without Pimpl, aggregation would customarily be done through a pointer in order to decouple classes from each other (a kind of Pimpl per data field). Once the whole class is “pimpled”, there is no longer any need to hold private data fields of complex type through pointers.
For instance, the idiomatic Qt dialog class
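Qt itself is not needed to see the structure. The sketch below replaces Qt widgets with two hypothetical stand-in classes, LineEdit and CheckBox (not the Qt API). It shows the customary aggregation by pointer:

```cpp
#include <string>

class LineEdit;   // forward declarations keep the header light ...
class CheckBox;

// --- finddialog.h: classic style, every child held through its own pointer ---
class FindDialog {
public:
    FindDialog();
    ~FindDialog();
    FindDialog(const FindDialog &) = delete;
    FindDialog & operator=(const FindDialog &) = delete;
    std::string searchText() const;

private:
    LineEdit * lineEdit;        // ... at the price of one allocation per child
    CheckBox * caseSensitive;
};

// --- finddialog.cpp (hypothetical stand-ins for Qt widgets) ---
class LineEdit { public: std::string text = "needle"; };
class CheckBox { public: bool checked = false; };

// Note: leaks lineEdit if the second new throws; in Qt, parent/child
// ownership would normally handle cleanup instead.
FindDialog::FindDialog() : lineEdit(new LineEdit), caseSensitive(new CheckBox) {}
FindDialog::~FindDialog() { delete caseSensitive; delete lineEdit; }
std::string FindDialog::searchText() const { return lineEdit->text; }
```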
turns into
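With the same hypothetical stand-ins, the children are now aggregated by value inside FindDialog::Private, so they all share the single Private allocation:

```cpp
#include <string>

// --- finddialog.h: one d-pointer replaces all per-child pointers ---
class FindDialog {
public:
    FindDialog();
    ~FindDialog();
    FindDialog(const FindDialog &) = delete;
    FindDialog & operator=(const FindDialog &) = delete;
    std::string searchText() const;

private:
    class Private;
    Private * d;
};

// --- finddialog.cpp (hypothetical stand-ins for Qt widgets) ---
class LineEdit { public: std::string text = "needle"; };
class CheckBox { public: bool checked = false; };

class FindDialog::Private {
public:
    LineEdit lineEdit;        // aggregated directly: no allocation of its own
    CheckBox caseSensitive;   // lives inside the one Private allocation
};

FindDialog::FindDialog() : d(new Private) {}
FindDialog::~FindDialog() { delete d; }
std::string FindDialog::searchText() const { return d->lineEdit.text; }
```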
Qt aficionados may object that the QDialog destructor already destroys the child widgets, so direct aggregation would trigger a double-delete. Indeed, this technique carries the risk of allocation sequence errors (double-delete, use-after-free, etc.), particularly if data fields are also owned by the class, and vice versa. The transformation shown is safe here, however, since Qt always allows children to be deleted before their parents.
This approach is especially effective if the data fields aggregated this way are themselves instances of “pimpled” classes. That is the case in the example shown: using the Pimpl Idiom saves four dynamic memory allocations of size sizeof(void*) while incurring only one additional (larger) allocation. This can lead to more efficient use of the heap, since small allocations typically carry a disproportionately high overhead in the allocator.
In addition, the compiler is much more likely to “de-virtualise” calls to virtual functions in this scenario, i.e. to remove the double indirection caused by the virtual function call, because the dynamic type of a directly aggregated member is known at compile time. With aggregation by pointer, this requires interprocedural optimisation. Whether this actually amounts to a runtime win, given the extra indirection through the d-pointer, has to be checked case by case by profiling concrete classes.
If profiling shows that the dynamic memory allocation turns into a bottleneck, the “Fast Pimpl” Idiom (Exceptional C++, Item 30) may provide relief. In this variant, a fast allocator, e.g. boost::singleton_pool, is used to create Private instances instead of the global operator new().
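The following sketch implements the idea with class-specific operator new/delete on Private. It substitutes the C++17 standard pool std::pmr::unsynchronized_pool_resource for the boost::singleton_pool named above, and as written it is not thread-safe:

```cpp
#include <cstddef>
#include <memory_resource>

class Class {
public:
    Class();
    ~Class();
    Class(const Class &) = delete;
    Class & operator=(const Class &) = delete;
    int value() const;

private:
    class Private;
    Private * d;
};

// --- class.cpp ---
class Class::Private {
public:
    int value = 42;

    // Class-specific allocation functions: "new Private" and "delete d"
    // now use the pool instead of the global operator new()/delete.
    static void * operator new(std::size_t size) {
        return pool().allocate(size, alignof(Private));
    }
    static void operator delete(void * p, std::size_t size) noexcept {
        pool().deallocate(p, size, alignof(Private));
    }

private:
    static std::pmr::unsynchronized_pool_resource & pool() {
        static std::pmr::unsynchronized_pool_resource instance; // single-threaded use only
        return instance;
    }
};

Class::Class() : d(new Private) {}
Class::~Class() { delete d; }
int Class::value() const { return d->value; }
```

Clients of Class see no difference; only class.cpp decides how Private instances are allocated.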
Interim Findings
As a well-known C++ idiom, Pimpl allows class authors to separate class interface and implementation to an extent not directly provided for by C++. As positive side-effects, d-pointers speed up compilation runs, ease the implementation of transaction semantics and, through extended means of composition, allow potentially more runtime-efficient implementations.
Not everything is shiny when using d-pointers, though: in addition to the extra Private class and its dynamic memory allocation, the modified semantics of const methods and potential allocation sequence errors are cause for concern.
The author will show solutions for some of these in the second part of the article. Complexity will increase further, though, so for each concrete case one has to verify anew that the benefits of the idiom outweigh the downsides; if in doubt, class by class. As usual, there can be no blanket judgements.
Outlook
The second and last part of this article will take a closer look under the hood of Pimpl, uncover the rust-streaked areas, and pimp the idiom using a whole array of accessories.