In recent columns for MSJ (June 1999), I've discussed COM type libraries and database access layers such as ActiveX® Data Objects (ADO) and OLE DB. Longtime readers of my MSJ writings (both of them) probably think I've gone soft. To redeem myself, this month I'll tour part of the Windows NT® loader code where the operating system and your code come together. I'll also demonstrate a nifty trick for getting status information out of the loader, and a related trick you can use in the Developer Studio® debugger.
Consider what you know about EXEs, DLLs, and how they're loaded and initialized. You probably know that when a C++ DLL is loaded, its DllMain function is called. Think about what happens when your EXE implicitly links to some set of DLLs (for example, KERNEL32.DLL and USER32.DLL). In what order will those DLLs be initialized? Is it possible for one of your DLLs to be initialized before another DLL that you depend on? The Platform SDK has this to say under the "Dynamic-Link Library Entry-Point Function" section:
Your function should perform only simple initialization tasks, such as setting up thread local storage (TLS), creating synchronization objects, and opening files. It must not call the LoadLibrary function, because this may create dependency loops in the DLL load order. This can result in a DLL being used before the system has executed its initialization code. Similarly, you must not call the FreeLibrary function in the entry-point function, because this can result in a DLL being used after the system has executed its termination code. Calling Win32® functions other than TLS, synchronization, and file functions may also result in problems that are difficult to diagnose. For example, calling User, Shell, and COM functions can cause access violation errors, because some functions in their DLLs call LoadLibrary to load other system components.
Something I've learned firsthand is that the above documentation is still way too vague. For example, reading a registry key is a natural thing you'd want to do inside your DllMain function. It certainly qualifies as initialization. Unfortunately, in the right circumstances ADVAPI32.DLL isn't initialized before your DllMain code, and the registry APIs will just fail.
Given the stern warning about using LoadLibrary in the documentation, it's especially interesting that the Windows NT USER32.DLL explicitly ignores the preceding advice. You may be aware of a Windows NT-only registry key called AppInit_Dlls that loads a list of DLLs into each process. It turns out that the actual loading of these DLLs occurs as part of USER32's initialization. USER32 looks at this registry key and calls LoadLibrary for these DLLs in its DllMain code. A little thought here reveals that the AppInit_Dlls trick doesn't work if your app doesn't use USER32.DLL. But I digress.
My point in bringing this up is that DLL loading and initialization is still a gray area. In most cases, a simplified view of how the OS loader works is sufficient. In the oddball 5 percent of cases, however, you can go nuts unless you have a more detailed working model of how the OS loader behaves.
Load 'er Up!
What most programmers think of as module loading is actually two distinct steps. Step one is to map the EXE or DLL into memory. As this occurs, the loader looks at the Import Address Table (IAT) of the module and determines whether the module depends on additional DLLs. If the DLLs aren't already loaded in that process, the loader maps them in as well. This procedure recurses until all of the dependent modules have been mapped into memory. A great way to see all the implicitly dependent DLLs for a given executable is the DEPENDS program from the Platform SDK.
Step two of module loading is to initialize all of the DLLs. Stop and ponder this. While the OS loader is mapping the EXE and/or DLLs into memory in step one, it's not calling the initialization routines. The initialization routines are called after all the modules have been mapped into memory. Key point: the order in which DLLs are mapped into memory is not necessarily the same as the order in which the DLLs are initialized. I've seen people look at the DLL mapping notifications as they appear in the Developer Studio debugger and mistakenly assume that the DLLs were initialized in that same order.
In Windows NT, the routine that invokes the entry point of EXEs and DLLs is called LdrpRunInitializeRoutines, and it's worth taking a look at here. In my own work, I've stepped through the assembler code for LdrpRunInitializeRoutines many times. However, looking at a ream of assembler code isn't the best way to understand it. Therefore, I rewrote LdrpRunInitializeRoutines from Windows NT 4.0 SP3 in C++-like pseudocode, with the results shown in Figure 1. To be completely accurate, in NTDLL.DBG the routine name is __stdcall mangled to _LdrpRunInitializeRoutines@4. Also, in my pseudocode, unless a variable or structure name is prefixed with an underscore, it was a name I made up.
LdrpRunInitializeRoutines is the final stop in the Windows NT loader code before an EXE's or DLL's specified entry point is called.
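To make the key point about mapping order versus initialization order concrete, here is a small portable simulation. It is only a sketch, not the loader's actual algorithm: the names ImportGraph, LoadTrace, and simulateLoad are made up for illustration, and the policy of queuing a module for initialization only after its imports have been queued is an assumption chosen to show how the two orders can diverge.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical import graph: module name -> DLLs it implicitly links to.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

struct LoadTrace {
    std::vector<std::string> mapOrder;   // order in which modules were mapped
    std::vector<std::string> initOrder;  // order in which entry points would run
};

// Step one: map the module, then recurse into its imports.
// Bookkeeping for step two: a module is queued for initialization only
// after everything it imports has been queued (an assumed policy).
static void mapModule(const ImportGraph& graph, const std::string& name,
                      std::set<std::string>& mapped, LoadTrace& trace) {
    if (!mapped.insert(name).second) {
        return;  // already mapped into this (simulated) process
    }
    trace.mapOrder.push_back(name);
    auto it = graph.find(name);
    if (it != graph.end()) {
        for (const auto& dep : it->second) {
            mapModule(graph, dep, mapped, trace);
        }
    }
    trace.initOrder.push_back(name);
}

// Simulates loading `root` and everything it implicitly depends on.
LoadTrace simulateLoad(const ImportGraph& graph, const std::string& root) {
    std::set<std::string> mapped;
    LoadTrace trace;
    mapModule(graph, root, mapped, trace);
    return trace;
}
```

With a graph in which FOO.DLL imports BAR.DLL and BAZ.DLL, simulateLoad records FOO, BAR, BAZ as the mapping order but BAR, BAZ, FOO as the initialization order under the assumed policy. That is the kind of mismatch that misleads people who read the debugger's mapping notifications as an initialization log.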
(In the following discussion, I'll use "entry point" and "initialization routine" interchangeably.) This loader code executes in the process context that loaded the DLL; that is, it's not part of some special loader process. LdrpRunInitializeRoutines is called at least once during process startup to handle implicitly loaded DLLs. LdrpRunInitializeRoutines is also called every time one or more DLLs are dynamically loaded, usually because of a call to LoadLibrary. Each time LdrpRunInitializeRoutines executes, it seeks out and calls the entry point of all DLLs that have been mapped into memory, but not yet initialized.
In examining the pseudocode, take note of all the extra code that provides trace output, even in the non-checked builds of Windows NT. I'm referring to all the code that uses the _ShowSnaps variable and the _DbgPrint function. I'll come back to these players later.
At a high level, the function breaks up into four distinct sections. The first portion of the code calls _LdrpClearLoadInProgress. This NTDLL function returns the number of DLLs that have just been mapped into memory. For example, if you called LoadLibrary on FOO.DLL and FOO had implicit links to BAR.DLL and BAZ.DLL, _LdrpClearLoadInProgress would return 3 since three DLLs were mapped into memory. After the number of DLLs to be concerned with is known, LdrpRunInitializeRoutines calls _RtlAllocateHeap (also known as HeapAlloc) to get memory for an array of pointers. In the pseudocode I've called this array pInitNodeArray. Each pointer in pInitNodeArray will eventually point to a structure containing information about the newly loaded (but not yet initialized) DLL.
In the second part of LdrpRunInitializeRoutines, the code digs into internal process data structures to obtain a linked list containing each of the newly loaded DLLs. As the code iterates through the linked list, it checks to see if the loader has somehow seen this DLL before (not likely).
It also checks to ensure that the DLL has an entry point. If both tests are passed, the code appends the module information pointer to the pInitNodeArray. The pseudocode refers to the module information as pModuleLoaderInfo. Note that it's entirely possible for a DLL to not have an entry point (for example, a resource-only DLL). Thus, the number of entries in pInitNodeArray may be fewer than the value returned earlier by _LdrpClearLoadInProgress.
The third (and largest) section of LdrpRunInitializeRoutines is where things really start to happen. The code's mission here is to enumerate through each element in pInitNodeArray and call the entry point. Because of the very real possibility that a DLL's initialization code may fault, the entire third section of code is surrounded by a __try block. This is why a dynamically loaded DLL can fault in its DllMain without bringing the whole process down.
Iterating through an array and calling an entry point for each node should be a small task. However, some relatively obscure features of Windows NT add to the complexity. For starters, consider whether the process is being debugged by a Win32 debugger such as MSDEV.EXE. Windows NT has an option that allows you to suspend a process and send control to the debugger before a DLL is initialized. This feature is on a per-DLL basis, and is enabled by adding a string value (BreakOnDllLoad) to a registry key with the name of the DLL (for instance, FOO.DLL). See the pseudocode comment above the call to _LdrQueryImageFileExecutionOptions in Figure 1 for more information.
Another bit of extra code that may execute before a DLL's entry point invocation is the TLS initialization. When you declare TLS variables using __declspec(thread), the linker includes data that causes this condition to be triggered. Right before the DLL's entry point is called, LdrpRunInitializeRoutines checks to see if a TLS initialization is necessary and, if so, calls _LdrpCallTlsInitializers. More on this later.
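The first three sections described above can be modeled in a few lines of portable C++. This is a rough sketch rather than a transcription of Figure 1: ModuleInfo, InitResult, and runInitializeRoutines are hypothetical names, a std::vector stands in for the loader's linked list and heap-allocated array, and a C++ try/catch stands in for the structured exception handling __try block.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the loader's per-module record.
struct ModuleInfo {
    std::string name;
    std::function<void()> entryPoint;  // empty for, say, a resource-only DLL
    bool alreadyProcessed;             // "the loader has seen this DLL before"
};

struct InitResult {
    std::size_t mappedCount = 0;     // plays the role of the newly mapped count
    std::size_t candidateCount = 0;  // entries that made it into the init array
    std::vector<std::string> initialized;
    bool faulted = false;            // an entry point threw inside the guard
};

InitResult runInitializeRoutines(std::vector<ModuleInfo>& newlyMapped) {
    InitResult result;

    // Section 1: learn how many modules were just mapped, size the array.
    result.mappedCount = newlyMapped.size();
    std::vector<ModuleInfo*> initNodes;
    initNodes.reserve(result.mappedCount);

    // Section 2: keep only unseen modules that actually have an entry point.
    for (auto& module : newlyMapped) {
        if (!module.alreadyProcessed && module.entryPoint) {
            initNodes.push_back(&module);
        }
    }
    result.candidateCount = initNodes.size();

    // Section 3: one guard around the whole loop, so a faulting entry point
    // stops the pass without taking the caller down with it.
    try {
        for (ModuleInfo* module : initNodes) {
            module->alreadyProcessed = true;
            module->entryPoint();
            result.initialized.push_back(module->name);
        }
    } catch (const std::exception&) {
        result.faulted = true;
    }
    return result;
}
```

Note how candidateCount can be smaller than mappedCount when a module has no entry point, which mirrors the observation that pInitNodeArray may hold fewer entries than _LdrpClearLoadInProgress reported. The debugger break and TLS steps are deliberately left out of this sketch.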
The moment of truth finally comes when LdrpRunInitializeRoutines calls the DLL's entry point. I deliberately left this part of the pseudocode in assembly language. You'll see why later. The crucial instruction is CALL EDI. Here, EDI points to the DLL's entry point, which is specified in the DLL's PE header. When CALL EDI returns, the DLL in question has completed its initialization. For DLLs written in C++, this means that the DllMain code has executed its DLL_PROCESS_ATTACH code. Also, note the third parameter to the entry point, normally referred to as pvReserved. In truth, this parameter is nonzero for DLLs that the EXE implicitly links to, either directly or through another DLL. The third parameter is zero for all other DLLs (that is, DLLs loaded as a result of a LoadLibrary call).
After the DLL entry point is invoked, LdrpRunInitializeRoutines does a sanity check to make sure the DLL entry point code was defined properly. The loader code looks at the stack pointer (ESP) value from before and after the entry point call. If they're different, something's wrong with the DLL's initialization function. Since most programmers never define the real DLL entry point function, this scenario rarely happens. However, when it does, you're informed of the problem via an onerous dialog (see Figure 2). I had to use a debugger and modify a register value at just the right spot to produce this dialog.
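The behavior of the third parameter can be put to use inside an entry point. The following sketch shows the idea without depending on the Windows headers: describeAttach is a hypothetical helper with the same parameter shape as a DLL entry point, and kProcessAttach is a local stand-in for DLL_PROCESS_ATTACH (which has the value 1 in the Win32 headers). In a real DLL the same test would sit inside DllMain.

```cpp
#include <string>

// Local stand-in for DLL_PROCESS_ATTACH so the sketch builds without <windows.h>.
constexpr unsigned long kProcessAttach = 1;

// Hypothetical helper mirroring the three entry point parameters.
// On process attach, a nonzero third parameter means the DLL was pulled in
// through implicit linking; zero means it arrived via a LoadLibrary call.
std::string describeAttach(void* moduleHandle, unsigned long reason, void* reserved) {
    (void)moduleHandle;  // unused in this sketch
    if (reason != kProcessAttach) {
        return "not a process attach";
    }
    return reserved != nullptr ? "implicitly linked" : "loaded dynamically";
}
```

A DLL that needs to behave differently when it is loaded on demand (for example, by deferring work it can't safely do from inside a LoadLibrary call) could branch on this value during DLL_PROCESS_ATTACH.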