==Phrack Inc.==
Volume 0x0b, Issue 0x3e, Phile #0x0d of 0x10
|=--=[ Using Process Infection to Bypass Windows Software Firewalls ]=--=|
|=-----------------------------------------------------------------------=|
|=---------------------------=[ rattle ]=--------------------------------=|
-[0x00] :: Table Of Contents ---------------------------------------------
[0x01] introduction
[0x02] how software firewalls work
[0x03] process Infection without external .dll
[0x04] problems of this implementation
[0x05] how to implement it
[0x06] limits of this implementation
[0x07] workaround: another infection method
[0x08] conclusion
[0x09] last words
[0x0A] references
[0x0B] injector source code
[0x0C] Tiny bypass source code
[0x0D] binaries (base64)
-[0x01] :: introduction --------------------------------------------------
This entire document refers to a feature of software firewalls
available for Windows OS, which is called "outbound detection".
This feature has nothing to do with the original idea of a
firewall, blocking incoming packets from the net: The outbound
detection mechanism is meant to protect the user from malicious
programs that run on his own computer - programs attempting to
communicate with a remote host on the Internet and thereby
leaking sensitive information. In general, the outbound detection
controls the communication of local applications with the
Internet.
In a world with an increasing number of trojan horses, worms
and virii spreading in the wild, this is actually a very handy
feature and certainly, it is of good use. However, ever since
I have known about software firewalls, I have been wondering
whether they could actually provide a certain level of security
at all: After all, they are just software supposed to protect you
against other software, and this sounds like a bad idea to me.
To make a long story short, this outbound detection can be
bypassed, and that's what will be discussed in this paper.
I moreover believe that if it is possible to bypass this one
restriction, it is somehow possible to bypass other restrictions
as well. Personal firewalls are software, trying to control
another piece of software. It should in any case be possible
to turn this around by 180 degrees, and create a piece of
software that controls the software firewall.
Also, how to achieve this in practice is part of the discussion
that will follow: I will not just keep on talking about abstract
theory. It will be explained and illustrated with sample source
code how to bypass a software firewall by injecting code to a
trusted process. It might be interesting to you that the method
of runtime process infection that will be presented and explained
does not require an external DLL - the bypass can be performed
by a stand-alone and tiny executable.
Thus, this paper is also about coding, especially Win32 coding.
To understand the sample code, you should be familiar with
Windows, the Win32 API and basic x86 Assembler. It would also be
good to know something about the PE format and related things,
but it is not necessary, as far as I can see. I will try to
explain everything else as precisely as possible.
Note: If you find numbers enclosed in normal brackets within
the document, these numbers are references to further sources.
See [0x0A] for more details.
-[0x02] :: how software firewalls work -----------------------------------
Of course, I can only speak about the software firewalls I have
seen and tested so far, but I am sure that these applications
are among the most widely used ones. Since all of them work in a
very similar way, I assume that the concept is a general concept
of software firewalls.
Almost every modern software firewall provides features that
simulate the behaviour of hardware firewalls by allowing the
user to block certain ports. I have not had a close look at
these features and once more I want to emphasize that breaking
these restrictions is outside the scope of this paper.
Another important feature of most personal firewalls is the
concept of giving privileges and different levels of trust to
different processes that run on the local machine to provide a
measure of outbound detection. Once a certain executable creates
a process attempting to access the network, the executable file
is checksummed by the software firewall and the user is prompted
whether or not he wants to trust the respective process.
To perform this task, the software firewall is most probably
installing kernel mode drivers and hooks to monitor and intercept
calls to low level networking routines provided by the Windows OS
core. Appropriately, the user can trust a process to connect() to
another host on the Internet, to listen() for connections or to
perform any other familiar networking task. The main point is: As
soon as the user gives trust to an executable, he also gives
trust to any process that has been created from that executable.
However, once we change the executable, its checksum would no
longer match and the firewall would be alerted.
So, we know that the firewall trusts a certain process as long as
the executable that created it remains the same. We also know that
in most cases, a user will trust his web browser and his email
client.
-[0x03] :: process Infection without external .dll -----------------------
The software firewall will only calculate and analyze the checksum
for an executable upon process creation. After the process has
been loaded into memory, it is assumed to remain the same until it
terminates.
And since I have already spoken about runtime process infection,
you certainly have guessed what will follow. If we cannot alter
the executable, we will directly go for the process and inject
our code to its memory, run it from there and bypass the firewall
restriction.
If this was a bit too fast for you, no problem. A process is
loaded into random access memory (RAM) by the Windows OS as soon
as a binary, executable file is executed. Simplified, a process
is a chunk of binary data that has been placed at a certain
address in memory. In fact, there is more to it. Windows does a
lot more than just writing binary data to some place in memory.
For making the following considerations, none of that should
bother you, though.
For all of you who are already familiar with means of runtime
process infection - I really dislike DLL injection for this
purpose, simply because there is definitely no option that could
be considered less elegant or less stealthy.
In practice, DLL injection means that the executable that
performs the bypass somehow carries the additional DLL it
requires. Not only does this heavily increase the size of the
entire code, but this DLL also has to be written to HD on the
affected system to perform the bypass. And to be honest - if
you are really going to write some sort of program that needs
a working software firewall bypass, you want to avoid exactly
this sort of flaw. Therefore, the presented method of runtime
process infection will work completely without the need of any
external DLL and is written in pure x86 Assembly.
To sum it all up: All that is important to us now is the ability
to get access to a process' memory, copy our own code into that
memory and execute the code remotely in the context of that
process.
Sounds hard? Not at all. If you have a well-founded knowledge of
the Win32 API, you will also know that Windows gives a programmer
everything he needs to perform such a task. The most important
API call that comes to mind probably is CreateRemoteThread().
Quoting MSDN (1):
The CreateRemoteThread function creates a thread that
runs in the address space of another process.
HANDLE CreateRemoteThread(
HANDLE hProcess,
LPSECURITY_ATTRIBUTES lpThreadAttributes,
DWORD dwStackSize,
LPTHREAD_START_ROUTINE lpStartAddress,
LPVOID lpParameter,
DWORD dwCreationFlags,
LPDWORD lpThreadId
);
Great, we can execute code at a certain memory address inside
another process and we can even pass one DWORD of information as
a parameter to it. Moreover, we will need the following 2 API
calls:
VirtualAllocEx()
WriteProcessMemory()
They give us the power to inject our own arbitrary code to the
address space of another process - and once it is there, we will
create a thread remotely to execute it.
To sum everything up: We will create a binary executable that
carries the injection code as well as the code that has to be
injected in order to bypass the software firewall. Or, speaking
in high-level programming terms: We will create an exe file that
holds two functions, one to inject code to a trusted process
and one function to be injected.
-[0x04] :: problems of this implementation -------------------------------
It all sounds pretty easy now, but it actually is not. For
instance, you will hardly be able to write an application in C
that properly injects another (static) C function to a remote
process. In fact, I can almost guarantee you that the remote
process will crash. Although you can call the relevant API calls
from C, there are many more underlying problems with using a
high level language for this purpose. The essence of all these
problems can be summed up as follows: compilers produce ASM code
that uses hardcoded offsets. A simple example: Whenever you use
a constant C string, this C string will be stored at a certain
position within the memory of your resulting executable, and any
reference to it will be hardcoded. This means, when your process
needs to pass the address of that string to a function, the
address will be completely hardcoded in the binary code of your
executable.
Consider:
#include <stdio.h>
int main(void) {
    printf("Hello World");
    return 0;
}
Assume that the string "Hello World" is stored at offset 0x28048
inside your executable. Moreover, the executable is known to
load at a base address of 0x00400000. In this case, the binary
code of your compiled and linked executable will somewhere refer
to the address 0x00428048 directly.
A disassembly of such a sample application, compiled with Visual
C++ 6, looks like this:
00401597 ...
00401598 push 0x00428048 ; the hello world string
0040159D call 0x004051e0 ; address of printf
004015A2 ...
What is the problem with such a hardcoded address? If you stay
inside your own address space, there is no problem. However ...
once you move that code to another address space, all those
memory addresses will point to entirely different things. The
hello world string in my example is more than 0x20000 = 131072
bytes away from the actual program code. So, if you inject that
code to another process space, you would have to make sure that
at 0x00428048, there is a valid C string ... and even if there
was something like a C string, it would certainly not be
"Hello World". I guess you get the point.
This is just a simple example and does not even involve all the
problems that can occur. However, also the addresses of all
function calls are hardcoded, like the address of the printf
function in our sample. In another process space, these
functions might be somewhere else or they could even be missing
completely - and this leads to the weirdest errors that you
can imagine. The only way to make sure that all the addresses
are correct and that every single CPU instruction fits is to
write the injected code in ASM.
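The two kinds of hardcoded reference from the disassembly above can
be worked through numerically. The following is only a sketch that
does the arithmetic in plain C. It assumes that the call is encoded
as a 5-byte near relative call (opcode E8), which the listing does
not state, and the helper names are made up for this illustration.

```c
#include <stdint.h>

/* Absolute address of an item = image base + offset inside the image. */
uint32_t absolute_address(uint32_t image_base, uint32_t offset)
{
    return image_base + offset;
}

/* Displacement stored in a 5-byte near relative call (E8 rel32).
   It is measured from the address of the instruction that follows. */
uint32_t call_displacement(uint32_t call_site, uint32_t target)
{
    return target - (call_site + 5u);
}

/* Where the very same 5 bytes would jump after being copied elsewhere. */
uint32_t call_target_after_move(uint32_t new_call_site, uint32_t disp)
{
    return new_call_site + 5u + disp;
}
```

With the values from the example, absolute_address(0x00400000,
0x28048) yields 0x00428048, which is baked into the push instruction
and stays the same wherever the code ends up. Under the stated
assumption, the call stores a displacement instead, so a copy of it
at a different address calls a different target. Either way, the
reference no longer points at what the compiler intended.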
Note: There are several working implementations for an outbound
detection bypass for software firewalls on the net using a
dynamic link library injection. This means, the implementation
itself consists of one executable and a DLL. The executable
forces a trusted process to load the DLL, and once it has been
loaded into the address space of this remote process, the DLL
itself performs any arbitrary networking task. This way to bypass
the detection works very well and it can be implemented in a high
level language easily, but I dislike the dependency on an
external DLL, and therefore I decided to code a solution with one
single stand-alone executable that does the entire injection by
itself. Refer to (2) for an example of a DLL injection bypass.
Also, LSADUMP2 (3) uses exactly the same measure to grab
the LSA secrets from LSASS.EXE and it is written in C.
-[0x05] :: how to implement it -------------------------------------------
Until now, everything is just theory. In practice, you will
always encounter all kinds of problems when writing code like
this. Furthermore, you will have to deal with questions of detail
that have only partially to do with the main problem. Thus,
let us leave the abstract part behind and think about how to
write some working code.
Note: I strongly recommend that you browse the source code in
[A] while reading this part, and it would most definitely be a
good idea to have a look at it before reading [0x0B].
First of all, we want to avoid as many hardcoded elements as
possible. And the first thing we need is the file path to the
user's default browser. Rather than generally referring to
"C:\Program Files\Internet Explorer\iexplore.exe", we will
query the registry key at "HKCR\htmlfile\shell\open\command".
Ok, this will be rather easy, I assume you know how to query
the registry. The next thing to do is calling CreateProcess().
The wShowWindow value of the STARTUPINFO structure passed to
the function should be something like SW_HIDE in order to keep
the browser window hidden.
Note: If you want to make entirely sure that no window is
displayed on the user's screen, you should put more effort
into this. You could, for instance, install a hook to keep all
windows hidden that are created by the process or do similar
things. I have only tested my example with Internet Explorer
and the SW_HIDE trick works well with it. In fact, it should
work with most applications that have a more or less simple
graphical user interface.
To ensure that the process has already loaded the most
essential libraries and has reached a generally stable state,
we use the WaitForInputIdle() call to give the process some
time for initialization.
So far, so good - now we proceed by calling VirtualAllocEx()
to allocate memory within the created process and with
WriteProcessMemory(), we copy our networking code. Finally,
we use CreateRemoteThread() to run that code and then, we only
have to wait until the thread terminates. All in all, the
injection itself is not all that hard to perform.
The function that will be injected can receive a single
argument, one double word. In the example that will be
presented in [0x0B], the injected procedure connects to
www.phrack.org on port 80 and sends a simple HTTP GET request.
After receiving the header, it displays it in a message box.
Since this is just a very basic example of a working firewall
bypass code, our injected procedure will do everything on its
own and does not need any further information.
However, we will still use the parameter to pass a 32 bit
value to our injected procedure: its own "base address". Thus,
the injected code knows at which memory address it has been
placed, in the context of the remote process. This is very
important as we cannot directly read from the EIP register
and because our injected code will sometimes have to refer to
memory addresses of data structures inside the injected code
itself.
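The idea of resolving references relative to a base address can be
illustrated without touching any other process. The following is a
minimal sketch in portable C, not the method used in the sample
source: a small data blob keeps a string at a fixed offset from its
start, and the string is found again after the blob has been copied
somewhere else. All names and sizes are assumptions chosen for this
illustration.

```c
#include <stddef.h>
#include <string.h>

/* A relocatable blob: the message lives at a fixed offset from the start. */
enum { MSG_OFFSET = 16, BLOB_SIZE = 32 };

/* Fill a blob with a message at MSG_OFFSET; the rest stays zeroed. */
void build_blob(unsigned char *blob)
{
    memset(blob, 0, BLOB_SIZE);
    memcpy(blob + MSG_OFFSET, "hello", 6); /* 5 characters + terminator */
}

/* Resolve a reference at run time from wherever the blob now lives. */
const char *resolve(const unsigned char *base, size_t offset)
{
    return (const char *)(base + offset);
}
```

No matter which buffer the blob is copied to, resolve(copy, MSG_OFFSET)
points at the string, because only the offset is fixed and the base
is supplied at run time.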
Once injected and placed within the remote process, the
injected code basically knows nothing. The first important
task is finding the kernel32.dll base address in the context
of the remote process and from there, get the address of the
GetProcAddress function to load everything else we need. I
will not explain in detail how these values are retrieved,
the entire topic cannot be covered by this paper. If you are
interested in details, I recommend the paper about Win32
assembly components by the Last Stage of Delirium research
group (4). I used large parts of their write-up for the
code that will be described in the following paragraphs.
In simple terms, we retrieve the kernel32 base address from
the Process Environment Block (PEB) structure which itself
can be found inside the Thread Environment Block (TEB). The
TEB is always reachable through the FS segment register,
thus we can easily get the PEB address as well. And since
we know where kernel32.dll has been loaded, we just need to
loop through its exports section to find the address of
GetProcAddress(). If you are not familiar with the PE format,
don't worry.
A dynamic link library contains a so-called exports section.
Within this section, the offsets of all exported functions
are assigned to human-readable names (strings). In fact,
there are two arrays inside this section that interest us.
There are actually more than 2 arrays inside the exports
section, but we will only use these two lists. For the rest
of this paper, I will treat the terms "list" and "array"
equally, the formal difference is of no importance at this
level of programming. One array is a list of standard,
null-terminated C-strings. They contain the function names.
The second list holds the function entry points (the
offsets).
We will do something very similar to what GetProcAddress()
itself does: We will look for "GetProcAddress" in the first
list and find the function's offset within the second array
this way.
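The lookup through the two lists can be sketched in a few lines of
C. This is a simplified model and not a PE parser: the structure and
the offsets used with it are made up for illustration, and a real
export directory additionally maps names to entry points through an
array of ordinals, which this model ignores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an export table: names[i] belongs to offsets[i]. */
struct export_table {
    const char *const *names;   /* list #1: null-terminated function names */
    const uint32_t    *offsets; /* list #2: function entry points          */
    size_t             count;   /* number of entries in both lists         */
};

/* Return the offset recorded for `name`, or 0 if it is not exported. */
uint32_t find_export(const struct export_table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return t->offsets[i];
    }
    return 0;
}
```

Given a table that lists "GetProcAddress" at index 1, find_export()
returns whatever offset is stored at index 1 of the second list.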
Unfortunately, Microsoft came up with an idea for their DLL
exports that makes everything much more complicated. This
idea is named "forwarders" and basically means that one DLL
can forward the export of a function to another DLL. Instead
of pointing to the offset of a function's code inside the DLL,
the offset from the second array may also point to a null-
terminated string. For instance, the function HeapAlloc() from
kernel32.dll is forwarded to the RtlAllocateHeap function in
ntdll.dll. This means that the alleged offset of HeapAlloc()
in kernel32.dll will not be the offset of a function that has
been implemented in kernel32.dll, but it will actually be the
offset of a string that has been placed inside kernel32.dll.
This particular string is "NTDLL.RtlAllocateHeap".
After a while, I figured out that this forwarder-string
is placed immediately after the function's name in array #1.
Thus, you will find this chunk of data somewhere inside
kernel32.dll:
48 65 61 70 41 6C 6C 6F HeapAllo
63 00 4E 54 44 4C 4C 2E c.NTDLL.
52 74 6C 41 6C 6C 6F 63 RtlAlloc
61 74 65 48 65 61 70 00 ateHeap.
= "HeapAlloc\0NTDLL.RtlAllocateHeap\0"
This is, of course, a bit confusing as there are now more null-
terminated strings in the first list than offsets in the second
list - every forwarder seems like a function name itself.
However, bearing this in mind, we can easily take care of the
forwarders in our code.
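As a side note, there is another way to recognize a forwarder that
does not rely on the position of the string. It is not the approach
described above, but the rule given by the PE format itself: an
export whose offset lies inside the export directory is a forwarder
string, anything outside of it is code. A minimal sketch of that
range check follows. The function name and the numbers used with it
are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an export's RVA lies inside the export directory itself,
   which means it refers to a forwarder string rather than to code. */
bool is_forwarder(uint32_t export_rva, uint32_t dir_rva, uint32_t dir_size)
{
    return export_rva >= dir_rva && export_rva - dir_rva < dir_size;
}
```

For a hypothetical export directory at RVA 0x5000 with a size of
0x800 bytes, an entry at 0x5100 would be treated as a forwarder,
while an entry at 0x1200 would be treated as a normal function.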
To identify the "GetProcAddress" string, I also make use of a
hash function for short strings which is presented by LSD group
in their write-up (4). The hash function looks like this in C:
unsigned long hash(const char* strData) {
unsigned long hash = 0;
char* tChar = (char*) strData;
while (*tChar) hash = ((hash<<5)|(hash>>27))+*tChar++;
return hash;
}
The calculated hash for "GetProcAddress" is 0x99C95590, and we
will search the exports section of kernel32.dll for a string
whose hash matches this value. Once we have the address of
GetProcAddress() and the base address of kernel32, we can
easily load all other API calls and libraries we need. From
here, everything left to do is loading ws2_32.dll and using the
socket system calls from that library to do whatever we want.
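The hash can be checked on any machine with a few lines of C. The
following is a sketch of the same rotate-and-add scheme, with two
deliberate deviations from the listing above: it uses uint32_t so
that the result is computed in exactly 32 bits even where unsigned
long is wider, and it reads each character as unsigned char. The
name hash32 is made up for this illustration.

```c
#include <stdint.h>

/* Rotate the hash left by 5 bits and add the next character,
   computed in exactly 32 bits regardless of the size of long. */
uint32_t hash32(const char *s)
{
    uint32_t h = 0;
    while (*s) {
        h = ((h << 5) | (h >> 27)) + (unsigned char)*s++;
    }
    return h;
}
```

Calling hash32("GetProcAddress") gives 0x99C95590. For plain ASCII
names, the unsigned char cast does not change the result compared
to the original function.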
Note: I suggest reading [0x0B] now.
-[0x06] :: limits of this implementation ---------------------------------
The sample code presented in this little paper will give you a
tiny executable that runs in RING3. I am certain that most
software firewalls contain kernel mode drivers with the ability
to perform more powerful tasks than this injector executable.
Therefore, the capabilities of the bypass code are obviously
limited. I have tested the bypass against several software
firewalls and got the following results:
Zone Alarm 4 vulnerable
Zone Alarm Pro 4 vulnerable
Sygate Pro 5.5 vulnerable
BlackIce 3.6 vulnerable
Tiny 5.0 immune
Tiny alerts the user that the injector executable spawns the
browser process, trying to access the network this way. It looks
like Tiny simply acts exactly like all the other software
firewalls do, but it is just more careful. Tiny also hooks API
calls like CreateProcess() and CreateRemoteThread() - thus, it
can protect its users from this kind of bypass.
Anyway, the test results I obtained made me even more certain
that software firewalls act as kernel mode drivers,
hooking API calls to monitor networking activity.
Thus, I have not presented a firewall bypass that works in 100%
of all possible cases. It is just an example, a proof for the
general possibility to perform a bypass.
-[0x07] :: workaround: another infection method --------------------------
Phrack Staff suggested presenting a workaround for the problem
with Tiny by infecting an already running, trusted process.
I was certain that this would not be the only thing to take
care of, since Tiny would most likely be hooking our best friend,
CreateRemoteThread(). Unfortunately, I actually figured out that
I had been right, and merely infecting an already running
process did not work against Tiny.
However, there are other ways to force execution of our own
injected code, and I will briefly explain my workaround for
those of you who are interested. All I am trying to prove here
is that you can outsmart any software firewall if you put some
effort into coding an appropriate bypass.
The essential API calls we will need are GetThreadContext() and
appropriately, SetThreadContext(). These two sparsely documented
functions allow you to modify the CONTEXT of a thread. What is
the CONTEXT of a thread? The CONTEXT structure contains the
current value of all CPU registers in the context of a certain
thread. Hence, with the two API calls mentioned above, you can
retrieve these values and, more importantly, apply new values
to each CPU register in the thread's context as well. Of high
interest to us is the EIP register, the instruction pointer for
a thread.
First of all, we will simply find an already running, trusted
process. Then, as always, we write our code to its memory using
the methods already discussed before. This time, however, we
will not create a new thread that starts at the address of our
injected code, we will rather hijack the primary thread of the
trusted process by changing its instruction pointer to the
address of our own code.
That's the essential theory behind this second bypass, at least.
In practice, we will proceed more cautiously to be as stealthy
as possible. First of all, we will not simply write the injection
function to the running process, but several other pieces of ASM
code as well, in order to return to the original context of the
hijacked thread once our injected code has finished its work. As
you can see from the ASM source code in [0x0C], we want to copy a
chunk of shellcode to the process that looks like this in a
debugger:
Volume 0x0b, Issue 0x3e, Phile #0x0d of 0x10
|=--=[ Using Process Infection to Bypass Windows Software Firewalls ]=--=|
|=-----------------------------------------------------------------------=|
|=---------------------------=[ rattle ]=--------------------------------=|
-[0x00] :: Table Of Contents ---------------------------------------------
[0x01] introduction
[0x02] how software firewalls work
[0x03] process Infection without external .dll
[0x04] problems of implementation
[0x05] how to implement it
[0x06] limits of this implementation
[0x07] workaround: another infection method
[0x08] conclusion
[0x09] last words
[0x0A] references
[0x0B] injector source code
[0x0C] Tiny bypass source code
[0x0D] binaries (base64)
-[0x01] :: introduction --------------------------------------------------
This entire document refers to a feature of software firewalls
available for Windows OS, which is called "outbound detection".
This feature has nothing to do with the original idea of a
firewall, blocking incomming packets from the net: The outbound
detection mechanism is ment to protect the user from malicious
programs that run on his own computer - programs attempting to
communicate with a remote host on the Internet and thereby
leaking sensible information. In general, the outbound detection
controls the communication of local applications with the
Internet.
In a world with an increasing number of trojan horses, worms
and virii spreading in the wild, this is actually a very handy
feature and certainly, it is of good use. However, ever since
I know about software firewalls, I have been wondering whether
they could actually provide a certain level of security at all:
After all, they are just software supposed protect you against
other software, and this sounds like bad idea to me.
To make a long story short, this outbound detection can be
bypassed, and that's what will be discussed in this paper.
I moreover believe that if it is possible to bypass this one
restriction, it is somehow possible to bypass other restrictions
as well. Personal firewalls are software, trying to control
another piece of software. It should in any case be possible
to turn this around by 180 degrees, and create a piece of
software that controls the software firewall.
Also, how to achieve this in practice is part of the discussion
that will follow: I will not just keep on talking about abstract
theory. It will be explained and illustrated with sample source
code how to bypass a software firewall by injecting code to a
trusted process. It might be interesting to you that the method
of runtime process infection that will be presented and explained
does not require an external DLL - the bypass can be performed
by a stand-alone and tiny executable.
Thus, this paper is also about coding, especially Win32 coding.
To understand the sample code, you should be familiar with
Windows, the Win32 API and basic x86 Assembler. It would also be
good to know something about the PE format and related things,
but it is not necessary, as far as I can see. I will try to
explain everything else as precisely as possible.
Note: If you find numbers enclosed in normal brackets within
the document, these numbers are references to further sources.
See [0x0A] for more details.
-[0x02] :: how software firewalls work -----------------------------------
Of course, I can only speak about the software firewalls I have
seen and tested so far, but I am sure that these applications
are among the most widely used ones. Since all of them work in a
very similar way, I assume that the concept is a general concept
of software firewalls.
Almost every modern software firewall provides features that
simulate the behaviour of hardware firewalls by allowing the
user to block certain ports. I have not had a close look on
these features and once more I want to emphasize that breaking
these restrictions is outside the scope of this paper.
Another important feature of most personal firewalls is the
concept of giving privileges and different levels of trust to
different processes that run on the local machine to provide a
measure of outbound detection. Once a certain executable creates
a process attempting to access the network, the executable file
is checksummed by the software firewall and the user is prompted
whether or not he wants to trust the respective process.
To perform this task, the software firewall is most probably
installing kernel mode drivers and hooks to monitor and intercept
calls to low level networking routines provided by the Windows OS
core. Appropriately, the user can trust a process to connect() to
another host on the Internet, to listen() for connections or to
perform any other familiar networking task. The main point is: As
soon as the user gives trust to an executable, he also gives
trust to any process that has been created from that executable.
However, once we change the executable, its checksum would no
longer match and the firewall would be alerted.
So, we know that the firewall trusts a certain process as long as
the executable that created it remains the same. We also know that
in most cases, a user will trust his webbrowser and his email
client.
-[0x03] :: process Infection without external .dll -----------------------
The software firewall will only calculate and analyze the checksum
for an executable upon process creation. After the process has
been loaded into memory, it is assumed to remain the same until it
terminates.
And since I have already spoken about runtime process infection,
you certainly have guessed what will follow. If we cannot alter
the executable, we will directly go for the process and inject
our code to its memory, run it from there and bypass the firewall
restriction.
If this was a bit too fast for you, no problem. A process is
loaded into random access memory (RAM) by the Windows OS as soon
as a binary, executable file is executed. Simplified, a process
is a chunk of binary data that has been placed at a certain
address in memory. In fact, there is more to it. Windows does a
lot more than just writing binary data to some place in memory.
For making the following considerations, none of that should
bother you, though.
For all of you who are already familiar with means of runtime
process infection - I really dislike DLL injection for this
purpose, simply because there is definitely no option that could
be considered less elegant or less stealthy.
In practice, DLL injection means that the executable that
performs the bypass somehow carries the additional DLL it
requires. Not only does this heaviely increase the size of the
entire code, but this DLL also has to be written to HD on the
affected system to perform the bypass. And to be honest - if
you are really going to write some sort of program that needs
a working software firewall bypass, you exactly want to avoid
this sort of flaws. Therefore, the presented method of runtime
process infection will work completely without the need of any
external DLL and is written in pure x86 Assembly.
To sum it all up: All that is important to us now is the ability
to get access to a process' memory, copy our own code into that
memory and execute the code remotely in the context of that
process.
Sounds hard? Not at all. If you have a well-founded knowledge of
the Win32 API, you will also know that Windows gives a programmer
everything he needs to perform such a task. The most important
API call that comes to mind probably is CreateRemoteThread().
Quoting MSDN (1):
The CreateRemoteThread function creates a thread that
runs in the address space of another process.
HANDLE CreateRemoteThread(
HANDLE hProcess,
LPSECURITY_ATTRIBUTES lpThreadAttributes,
DWORD dwStackSize,
LPTHREAD_START_ROUTINE lpStartAddress,
LPVOID lpParameter,
DWORD dwCreationFlags,
LPDWORD lpThreadId
);
Great, we can execute code at a certain memory address inside
another process and we can even pass one DWORD of information as
a parameter to it. Moreover, we will need the following 2 API
calls:
VirtualAllocEx()
WriteProcessMemory()
they give us the power to inject our own arbitrary code to the
address space of another process - and once it is there, we will
create a thread remotely to execute it.
To sum everything up: We will create a binary executable that
carries the injection code as well as the code that has to be
injected in order to bypass the software firewall. Or, speaking
in high-level programming terms: We will create an exe file that
holds two functions, one to inject code to a trusted process
and one function to be injected.
-[0x04] :: problems of this implementation -------------------------------
It all sounds pretty easy now, but it actually is not. For
instance, you will hardly be able to write an application in C
that properly injects another (static) C function into a remote
process. In fact, I can almost guarantee you that the remote
process will crash. Although you can call the relevant API calls
from C, there are many more underlying problems with using a
high level language for this purpose. The essence of all these
problems can be summed up as follows: compilers produce ASM code
that uses hardcoded offsets. A simple example: Whenever you use
a constant C string, this C string will be stored at a certain
position within the memory of your resulting executable, and any
reference to it will be hardcoded. This means that when your
process needs to pass the address of that string to a function,
the address will be completely hardcoded in the binary code of
your executable.
Consider:
    #include <stdio.h>

    int main() {
        printf("Hello World");
        return 0;
    }
Assume that the string "Hello World" is stored at offset 0x28048
inside your executable. Moreover, the executable is known to
load at a base address of 0x00400000. In this case, the binary
code of your compiled and linked executable will somewhere refer
to the address 0x00428048 directly.
A disassembly of such a sample application, compiled with Visual
C++ 6, looks like this:
    00401597 ...
    00401598 push 0x00428048 ; the hello world string
    0040159D call 0x004051e0 ; address of printf
    004015A2 ...
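The address arithmetic of this example can be checked with a
short sketch. It is only an illustration: the constant and
function names (IMAGE_BASE, STRING_OFFSET, PUSH_ADDRESS,
absolute_address, distance) are made up for this purpose, and
the values are simply the ones quoted in the text and in the
listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the example in the text (32-bit process). */
#define IMAGE_BASE    0x00400000u  /* assumed load address of the exe   */
#define STRING_OFFSET 0x00028048u  /* offset of "Hello World" in image  */
#define PUSH_ADDRESS  0x00401598u  /* address of the push instruction   */

/* Absolute address that ends up hardcoded in the push instruction. */
uint32_t absolute_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Distance in bytes between an instruction and the data it refers to. */
uint32_t distance(uint32_t code_addr, uint32_t data_addr)
{
    return data_addr - code_addr;
}
```

With these values, absolute_address(IMAGE_BASE, STRING_OFFSET)
yields 0x00428048, the operand of the push instruction. If only
the code is copied elsewhere, nothing guarantees that the string
exists at that address in the target process.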
What is the problem with such a hardcoded address? If you stay
inside your own address space, there is no problem. However ...
once you move that code to another address space, all those
memory addresses will point to entirely different things. The
hello world string in my example is more than 0x20000 = 131072
bytes away from the actual program code. So, if you inject that
code to another process space, you would have to make sure that
at 0x00428048, there is a valid C string ... and even if there
was something like a C string, it would certainly not be
"Hello World". I guess you get the point.
This is just a simple example and does not even involve all the
problems that can occur. The addresses of all function calls
are hardcoded as well, like the address of the printf
function in our sample. In another process space, these
functions might be somewhere else or they could even be missing
completely - and this leads to the most weird errors that you
can imagine. The only way to make sure that all the addresses
are correct and that every single CPU instruction fits is to
write the injected code in ASM.
Note: There are several working implementations for an outbound
detection bypass for software firewalls on the net using a
dynamic link library injection. This means, the implementation
itself consists of one executable and a DLL. The executable
forces a trusted process to load the DLL, and once it has been
loaded into the address space of this remote process, the DLL
itself performs any arbitrary networking task. This way to bypass
the detection works very well and it can be implemented in a high
level language easily, but I dislike the dependency on an
external DLL, and therefore I decided to code a solution with one
single stand-alone executable that does the entire injection by
itself. Refer to (2) for an example of a DLL injection bypass.
Also, LSADUMP2 (3) uses exactly the same measure to grab
the LSA secrets from LSASS.EXE and it is written in C.
-[0x05] :: how to implement it -------------------------------------------
Until now, everything is just theory. In practice, you will
always encounter all kinds of problems when writing code like
this. Furthermore, you will have to deal with detail questions
that have only partially to do with the main problem. Thus,
let us leave the abstract part behind and think about how to
write some working code.
Note: I strongly recommend that you browse the source code in
[A] while reading this part, and it would most definitely be a
good idea to have a look at it before reading [0x0B].
First of all, we want to avoid as many hardcoded elements as
possible. And the first thing we need is the file path to the
user's default browser. Rather than generally referring to
"C:\Program Files\Internet Explorer\iexplore.exe", we will
query the registry key at "HKCR\htmlfile\shell\open\command".
Ok, this will be rather easy, I assume you know how to query
the registry. The next thing to do is calling CreateProcess().
The wShowWindow value of the STARTUPINFO structure passed to
the function should be something like SW_HIDE in order to keep
the browser window hidden. Note that wShowWindow is only
honored if dwFlags includes STARTF_USESHOWWINDOW.
Note: If you want to make entirely sure that no window is
displayed on the user's screen, you should put more effort
into this. You could, for instance, install a hook to keep all
windows hidden that are created by the process or do similar
things. I have only tested my example with Internet Explorer
and the SW_HIDE trick works well with it. In fact, it should
work with most applications that have a more or less simple
graphical user interface.
To ensure that the process has already loaded the most
essential libraries and has reached a generally stable state,
we use the WaitForInputIdle() call to give the process some
time for initialization.
So far, so good - now we proceed by calling VirtualAllocEx()
to allocate memory within the created process and with
WriteProcessMemory(), we copy our networking code. Finally,
we use CreateRemoteThread() to run that code and then, we only
have to wait until the thread terminates. All in all, the
injection itself is not all that hard to perform.
The function that will be injected can receive a single
argument, one double word. In the example that will be
presented in [0x0B], the injected procedure connects to
www.phrack.org on port 80 and sends a simple HTTP GET request.
After receiving the header, it displays it in a message box.
Since this is just a very basic example of a working firewall
bypass code, our injected procedure will do everything on its
own and does not need any further information.
However, we will still use the parameter to pass a 32 bit
value to our injected procedure: its own "base address". Thus,
the injected code knows at which memory address it has been
placed, in the context of the remote process. This is very
important as we cannot directly read from the EIP register
and because our injected code will sometimes have to refer to
memory addresses of data structures inside the injected code
itself.
Once injected and placed within the remote process, the
injected code basically knows nothing. The first important
task is finding the kernel32.dll base address in the context
of the remote process and from there, get the address of the
GetProcAddress function to load everything else we need. I
will not explain in detail how these values are retrieved,
the entire topic cannot be covered by this paper. If you are
interested in details, I recommend the paper about Win32
assembly components by the Last Stage of Delirium research
group (4). I used large parts of their write-up for the
code that will be described in the following paragraphs.
In simple terms, we retrieve the kernel32 base address from
the Process Environment Block (PEB) structure, a pointer to
which can be found inside the Thread Environment Block (TEB).
The TEB is always reachable through the FS segment register,
thus we can easily get the PEB address as well. And since
we know where kernel32.dll has been loaded, we just need to
loop through its exports section to find the address of
GetProcAddress(). If you are not familiar with the PE format,
don't worry.
A dynamic link library contains a so-called exports section.
Within this section, the offsets of all exported functions
are assigned to human-readable names (strings). In fact,
there are two arrays inside this section that interest us.
There are actually more than 2 arrays inside the exports
section, but we will only use these two lists. For the rest
of this paper, I will treat the terms "list" and "array"
equally, the formal difference is of no importance at this
level of programming. One array is a list of standard,
null-terminated C-strings. They contain the function names.
The second list holds the function entry points (the
offsets).
We will do something very similar to what GetProcAddress()
itself does: We will look for "GetProcAddress" in the first
list and find the function's offset within the second array
this way.
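The lookup described above can be sketched on toy data. This
is a simplified model under stated assumptions, not the code
from this paper: toy_names, toy_offsets and find_export are
hypothetical names, the offsets are invented, and the two plain
C arrays merely stand in for the two lists. A real export
directory also routes the index through an ordinal table, which
this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins for the two export lists; the offsets are made up. */
static const char *const toy_names[] = {
    "ExitProcess", "GetProcAddress", "LoadLibraryA"
};
static const uint32_t toy_offsets[] = { 0x1000u, 0x2340u, 0x3580u };

/* Return the offset stored for `wanted`, or 0 if the name is not listed. */
uint32_t find_export(const char *const *names, const uint32_t *offsets,
                     size_t count, const char *wanted)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], wanted) == 0)
            return offsets[i];   /* same index in the second list */
    }
    return 0;
}
```

Calling find_export(toy_names, toy_offsets, 3, "GetProcAddress")
walks the first list until the name matches and then reads the
entry with the same index from the second list.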
Unfortunately, Microsoft came up with an idea for their DLL
exports that makes everything much more complicated. This
idea is named "forwarders" and basically means that one DLL
can forward the export of a function to another DLL. Instead
of pointing to the offset of a function's code inside the DLL,
the offset from the second array may also point to a null-
terminated string. For instance, the function HeapAlloc() from
kernel32.dll is forwarded to the RtlAllocateHeap function in
ntdll.dll. This means that the alleged offset of HeapAlloc()
in kernel32.dll will not be the offset of a function that has
been implemented in kernel32.dll, but it will actually be the
offset of a string that has been placed inside kernel32.dll.
This particular string is "NTDLL.RtlAllocateHeap".
After a while, I figured out that this forwarder-string
is placed immediately after the function's name in array #1.
Thus, you will find this chunk of data somewhere inside
kernel32.dll:
    48 65 61 70 41 6C 6C 6F   HeapAllo
    63 00 4E 54 44 4C 4C 2E   c.NTDLL.
    52 74 6C 41 6C 6C 6F 63   RtlAlloc
    61 74 65 48 65 61 70 00   ateHeap.
    = "HeapAlloc\0NTDLL.RtlAllocateHeap\0"
This is, of course, a bit confusing as there are now more null-
terminated strings in the first list than offsets in the second
list - every forwarder seems like a function name itself.
However, bearing this in mind, we can easily take care of the
forwarders in our code.
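For comparison, here is a sketch of a different way to
recognize a forwarder. It is not the string-adjacency approach
described above: the PE format documentation states that an
export offset which falls inside the export directory itself is
a forwarder string rather than code. The function names
is_forwarder and split_forwarder are hypothetical, and the
split assumes the simple "MODULE.Function" form shown in the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A forwarder entry points inside the export directory itself. */
int is_forwarder(uint32_t func_rva, uint32_t dir_rva, uint32_t dir_size)
{
    return func_rva >= dir_rva && func_rva < dir_rva + dir_size;
}

/* Split "NTDLL.RtlAllocateHeap" into module and function name.
 * Returns 1 on success, 0 if there is no dot or a buffer is too small. */
int split_forwarder(const char *fwd, char *module, size_t module_len,
                    char *function, size_t function_len)
{
    const char *dot = strchr(fwd, '.');
    if (dot == NULL)
        return 0;
    size_t mlen = (size_t)(dot - fwd);
    size_t flen = strlen(dot + 1);
    if (mlen >= module_len || flen >= function_len)
        return 0;
    memcpy(module, fwd, mlen);
    module[mlen] = '\0';
    memcpy(function, dot + 1, flen + 1);  /* copies the terminator too */
    return 1;
}
```

Once the string is split, the module part names the DLL to look
in and the function part names the export to resolve there.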
To identify the "GetProcAddress" string, I also make use of a
hash function for short strings which is presented by LSD group
in their write-up (4). The hash function looks like this in C:
    unsigned long hash(const char* strData) {
        unsigned long hash = 0;
        const char* tChar = strData;
        /* rotate left by 5 bits (32 bit value), add next character */
        while (*tChar) hash = ((hash<<5)|(hash>>27))+*tChar++;
        return hash;
    }
The calculated hash for "GetProcAddress" is 0x99C95590, and we
will search for a string in the exports section of kernel32.dll
whose hash matches this value. Once we have the address of
GetProcAddress() and the base address of kernel32, we can
easily load all other API calls and libraries we need. From
here, everything left to do is loading ws2_32.dll and using the
socket system calls from that library to do whatever we want.
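The hash value can be reproduced with a small portable variant
of the function above. This is a sketch, not the code used in
the injector: the name rol5_hash is made up, and it uses
uint32_t instead of unsigned long so that the 27 bit shift
still forms a 32 bit rotation on platforms where long is 64
bits wide. For plain ASCII names the result is the same.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate-left-by-5-and-add hash from the text, on a fixed 32-bit type. */
uint32_t rol5_hash(const char *s)
{
    uint32_t h = 0;
    while (*s)
        h = ((h << 5) | (h >> 27)) + (unsigned char)*s++;
    return h;
}
```

Hashing the full name "GetProcAddress" this way gives
0x99C95590. Comparing 32 bit hashes instead of strings keeps
the injected code short, since the name itself does not have
to be carried along.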
Note: I'd suggest reading [0x0B] now.
-[0x06] :: limits of this implementation ---------------------------------
The sample code presented in this little paper will give you a
tiny executable that runs in RING3. I am certain that most
software firewalls contain kernel mode drivers with the ability
to perform more powerful tasks than this injector executable.
Therefore, the capabilities of the bypass code are obviously
limited. I have tested the bypass against several software
firewalls and got the following results:
    Zone Alarm 4          vulnerable
    Zone Alarm Pro 4      vulnerable
    Sygate Pro 5.5        vulnerable
    BlackIce 3.6          vulnerable
    Tiny 5.0              immune
Tiny alerts the user that the injector executable spawns the
browser process, trying to access the network this way. It looks
like Tiny simply acts exactly like all the other software
firewalls do, but it is just more careful. Tiny also hooks API
calls like CreateProcess() and CreateRemoteThread() - thus, it
can protect its users from this kind of bypass.
Anyway, the test results I obtained reinforced my belief
that software firewalls act as kernel mode drivers,
hooking API calls to monitor networking activity.
Thus, I have not presented a firewall bypass that works in 100%
of all possible cases. It is just an example, a proof that
a bypass is possible in general.
-[0x07] :: workaround: another infection method --------------------------
Phrack Staff suggested presenting a workaround for the problem
with Tiny by infecting an already running, trusted process.
I was certain that this would not be the only thing to take
care of, since Tiny would most likely be hooking our best friend,
CreateRemoteThread(). Unfortunately, it turned out that
I had been right, and merely infecting an already running
process did not work against Tiny.
However, there are other ways to force execution of our own
injected code, and I will briefly explain my workaround for
those of you who are interested. All I am trying to prove here
is that you can outsmart any software firewall if you put some
effort into coding an appropriate bypass.
The essential API calls we will need are GetThreadContext() and
appropriately, SetThreadContext(). These two briefly documented
functions allow you to modify the CONTEXT of a thread. What is
the CONTEXT of a thread? The CONTEXT structure contains the
current value of all CPU registers in the context of a certain
thread. Hence, with the two API calls mentioned above, you can
retrieve these values and, more importantly, apply new values
to each CPU register in the thread's context as well. Of high
interest to us is the EIP register, the instruction pointer for
a thread.
First of all, we will simply find an already running, trusted
process. Then, as always, we write our code to its memory using
the methods already discussed before. This time, however, we
will not create a new thread that starts at the address of our
injected code, we will rather hijack the primary thread of the
trusted process by changing its instruction pointer to the
address of our own code.
That's the essential theory behind this second bypass, at least.
In practice, we will proceed more cautiously to be as stealthy
as possible. First of all, we will not simply write the injection
function to the running process, but several other pieces of ASM
code as well, in order to return to the original context of the
hijacked thread once our injected code has finished its work. As
you can see from the ASM source code in [0x0C], we want to copy a
chunk of shellcode to the process that looks like this in a
debugger: