[Computer Security] AEG: Automatic Exploit Generation

Article link: https://blog.youkuaiyun.com/ShadyPi/article/details/147018993

AEG: Automatic Exploit Generation

[Figure: the AEG workflow]
As the title suggests, AEG is a technique that automatically generates exploits for a given program. Fig. 5 of the paper shows the AEG workflow. This post goes through its steps one by one.

Pre-Process

Compile the source code into binary code $B_{gcc}$ and LLVM bytecode $B_{llvm}$. Note that AEG itself is a two-input, single-output system: it does not take the source code as input directly, but works on the binary and on the bytecode, which it needs for source analysis. Bytecode is a platform-independent intermediate representation of code that is executed by a virtual machine (VM) rather than directly by the hardware.

Source-Analysis

This step does no complex analysis. It only finds the largest buffer in the program and outputs $max$, the maximum size of the symbolic data (and hence of the exploit), which is chosen to be at least 10% larger than the largest buffer size.
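As a rough illustration, the heuristic above can be sketched in C. The function name `symbolic_input_max` and the round-up rule are assumptions made for this post; the source only states that $max$ is at least 10% larger than the largest buffer.

```c
#include <stddef.h>

/* Hypothetical helper: given the sizes of the statically allocated
 * buffers found in the program, return a bound that is at least 10%
 * larger than the largest one (rounded up to a whole byte). */
size_t symbolic_input_max(const size_t *buf_sizes, size_t n)
{
    size_t largest = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf_sizes[i] > largest)
            largest = buf_sizes[i];
    }
    /* largest + ceil(largest / 10) equals ceil(1.1 * largest) */
    return largest + (largest + 9) / 10;
}
```

For example, with buffers of 16, 64 and 32 bytes the sketch returns 71, since 64 * 1.1 = 70.4 is rounded up.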

Bug-Find

Preconditioned Symbolic Execution

Traditional symbolic execution for bug finding represents each byte of the input with a symbolic variable instead of a concrete value, so the whole input becomes symbolic data. At every branch in the program, new constraints are created to model the two sides of the branch. At the end of a path, a solver tries to find concrete input data that satisfies all the constraints collected by that interpreter (path).

if (a > 1)
    b = a - 1;
else
    b = a + 1;

Taking the simple code above as an example, we create one interpreter for each path through the if branch, with path constraints a>1 and a<=1 respectively. If we want the result b to be 2, we solve the formula a>1 & a-1=2 for the first path, which gives a=3, and the formula a<=1 & a+1=2 for the second path, which gives a=1.
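The two path formulas can be checked with a small C sketch. A real symbolic executor hands the formulas to a constraint solver; here a brute-force search over a small range of a stands in for the solver, and the names `path_then`, `path_else` and `solve_path` are made up for this illustration.

```c
/* Result of "solving" one path formula. */
typedef struct {
    int found; /* 1 if some a satisfies the formula */
    int a;     /* the satisfying value, valid only if found == 1 */
} Solution;

/* Path through the if-branch: a > 1 and b = a - 1 == target */
int path_then(int a, int target) { return a > 1 && a - 1 == target; }

/* Path through the else-branch: a <= 1 and b = a + 1 == target */
int path_else(int a, int target) { return a <= 1 && a + 1 == target; }

/* Brute-force stand-in for a solver: try every a in [lo, hi].
 * Assumes a small range, so the loop counter cannot overflow. */
Solution solve_path(int (*path)(int, int), int target, int lo, int hi)
{
    Solution s = {0, 0};
    for (int a = lo; a <= hi; a++) {
        if (path(a, target)) {
            s.found = 1;
            s.a = a;
            return s;
        }
    }
    return s;
}
```

Calling `solve_path(path_then, 2, -100, 100)` finds a=3 and `solve_path(path_else, 2, -100, 100)` finds a=1, matching the values derived by hand. A formula can also be unsatisfiable: the else path can never produce b=5, because that would need a=4, which violates a<=1.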

Symbolic execution can help attackers find potential exploits, but the search space can be extremely large, especially with loops, because every iteration can produce new branches. To prune the search space, the authors introduce Preconditioned Symbolic Execution, which adds constraints on the input before exploration starts, so that only paths consistent with these preconditions are explored. The preconditions include:

Known Length: To overflow a buffer, the input obviously has to be longer than the buffer, so inputs that are shorter than the buffer do not need to be considered;
Known Prefix: Sometimes we know the prefix of the input, e.g., an HTTP GET request always starts with “GET”;
Concolic Execution: Use a known concrete input to specify a single program path as the precondition.
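The first two preconditions can be pictured as a filter on candidate inputs. In AEG they are added as constraints to the path formula rather than checked on concrete inputs, so the following C sketch is only an analogy; the function `satisfies_preconditions` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the candidate input is worth exploring:
 *  - known length: it is longer than the target buffer, and
 *  - known prefix: it starts with the expected prefix (e.g. "GET").
 * Returns 0 otherwise, i.e. the candidate is pruned. */
int satisfies_preconditions(const char *input, size_t input_len,
                            size_t buf_len, const char *prefix)
{
    size_t prefix_len = strlen(prefix);

    if (input_len <= buf_len)    /* too short to overflow the buffer */
        return 0;
    if (input_len < prefix_len)  /* too short to contain the prefix */
        return 0;
    return memcmp(input, prefix, prefix_len) == 0;
}
```

With an 8-byte buffer and the prefix "GET", the 13-byte input "GET /aaaaaaaa" passes, while "GET /" (too short) and "POST /aaaaaaaa" (wrong prefix) are pruned.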

Path Prioritization

Each branch leads to two different paths, so the number of paths grows exponentially, especially inside loops. This raises the question of which path to explore first. The authors propose Path Prioritization to decide the order of exploration, which includes two main techniques:

Buggy-Path-First: A bug on a path suggests that subsequent statements on that path are also likely to be buggy (and hopefully exploitable), so AEG gives buggy paths higher priority and continues exploring them.
Loop Exhaustion: This strategy gives higher priority to the interpreter that explores the maximum number of loop iterations, in the hope that computations involving more iterations are more likely to produce bugs such as buffer overflows. In this way, a loop creates only one new interpreter.
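One way to picture the two heuristics is as a scheduler that picks the next interpreter from a queue. The C sketch below is an assumption-laden illustration, not AEG's implementation: the `Interp` struct, the `pick_next` function, and the choice to rank buggy paths first and break ties by loop iterations are all made up for this post.

```c
#include <stddef.h>

/* Hypothetical bookkeeping for one interpreter (one explored path). */
typedef struct {
    int id;
    int saw_bug;          /* nonzero if a bug was already found on this path */
    unsigned loop_iters;  /* loop iterations this interpreter has explored */
} Interp;

/* Pick the interpreter to run next.
 * Buggy-path-first: interpreters that already saw a bug win.
 * Loop exhaustion: among equals, more loop iterations win.
 * Returns the index into q, or -1 if the queue is empty. */
int pick_next(const Interp *q, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0) {
            best = (int)i;
            continue;
        }
        int cur_bug = q[i].saw_bug != 0;
        int best_bug = q[best].saw_bug != 0;
        if (cur_bug > best_bug ||
            (cur_bug == best_bug && q[i].loop_iters > q[best].loop_iters))
            best = (int)i;
    }
    return best;
}
```

In a queue where two interpreters have seen a bug, the one with more loop iterations is chosen; if none has seen a bug, the choice falls back to loop iterations alone.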

Environment Modelling

As a practical tool, AEG models the different environments through which an attacker can feed input to a program, including files, sockets, environment variables, library function calls and system calls.

Exploit Generation

After finding a path that leads to a bug, AEG has to check whether the bug is exploitable. It then generates the exploit and verifies that the exploit actually gives the attacker a shell through this bug.

DBA: Dynamic Binary Analysis

The Bug-Find step yields the path constraints that lead to a bug, together with the names of the vulnerable function and buffer. With this information, AEG can reproduce the bug and observe the behavior of the program, which is what DBA does. During DBA, AEG instruments the given executable binary $B_{gcc}$. When it detects the call to the vulnerable function, it stops execution and examines the stack, recording the stack memory contents.

To attack the program, the exploit has to overwrite contents of the stack. However, overwriting them with inappropriate values may crash the program before the exploit takes effect. AEG therefore restores the stack contents that the attack does not need to change, which keeps the program from crashing during the attack.

Exploit-Gen

With the runtime information gained from DBA, AEG tries to generate an exploit accordingly. There are multiple types of exploits, but the paper presents only one algorithm, which generates stack-overflow return-to-stack exploits, shown below:
[Figure: the stack-overflow return-to-stack exploit generation algorithm]
exp_str stores the contents that the exploit expects to find on the stack: it overwrites the saved return address, which is loaded into EIP (the register pointing to the next instruction) when the function returns, and places the shellcode after it. The new EIP value points to the next stack frame (offset+8), which contains the shellcode. The other stack contents between bufaddr and &retaddr (the part of exp_str before offset) keep their original values, which are restored during the attack.
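The layout of exp_str can be sketched in C. This is an illustration under several assumptions that are not stated in the post: a 32-bit little-endian target, a 4-byte return address slot at `offset`, a jump target computed as the absolute address `bufaddr + offset + 8`, NOP (0x90) filler for any bytes that the recorded stack contents `mu` do not cover, and placeholder bytes instead of real shellcode. The function name `build_exp_str` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill exp_str with: recorded stack contents (restoration), the hijacked
 * return address at `offset`, and the payload at `offset + 8`.
 * Returns the number of bytes written, or 0 if the layout does not fit. */
size_t build_exp_str(uint8_t *exp_str, size_t cap,
                     uint32_t bufaddr, uint32_t retaddr_slot,
                     const uint8_t *mu, size_t mu_len,
                     const uint8_t *shellcode, size_t sc_len)
{
    if (retaddr_slot < bufaddr)
        return 0;
    size_t offset = (size_t)(retaddr_slot - bufaddr);
    size_t sc_start = offset + 8;            /* payload sits 8 bytes past the slot */
    if (sc_len > cap || sc_start > cap - sc_len)
        return 0;
    size_t total = sc_start + sc_len;

    memset(exp_str, 0x90, total);            /* assumed filler */
    size_t n = mu_len < total ? mu_len : total;
    if (n > 0)
        memcpy(exp_str, mu, n);              /* stack restoration */

    uint32_t jmp_target = bufaddr + (uint32_t)sc_start;
    for (int i = 0; i < 4; i++)              /* little-endian return address */
        exp_str[offset + i] = (uint8_t)(jmp_target >> (8 * i));

    memcpy(exp_str + sc_start, shellcode, sc_len);
    return total;
}
```

For a buffer at 0xbffff000 whose return address slot is at 0xbffff010, offset is 16, the slot receives 0xbffff018 in little-endian byte order, and the placeholder payload starts at index 24 of exp_str, while the first 16 bytes keep the recorded stack contents.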