[Computer Security] AEG: Automatic Exploit Generation

[Figure: AEG workflow]
As the title suggests, AEG is a technique that automatically generates exploits for a program, given its source code. Fig. 5 shows the workflow of AEG. In this post we go through the steps one by one.

Pre-Process

Compile the source code into binary code $B_{gcc}$ and LLVM bytecode $B_{llvm}$. Note that the core of AEG is a two-input, single-output system: it does not consume the source code directly, but it needs the bytecode for source analysis. Bytecode is a platform-independent intermediate representation of code that is not executed directly by the hardware, but by a virtual machine (VM).

Source-Analysis

This step does no complex analysis. It simply finds the largest buffer in the program and outputs $max$, the maximum size of the symbolic data (i.e., of the exploit), which is chosen to be at least 10% larger than the largest buffer size.
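As a rough illustration of the 10% heuristic, here is a minimal sketch. The function name `max_symbolic_size` and the list of buffer sizes are hypothetical; how AEG actually extracts buffer sizes from the bytecode is not shown here.

```python
def max_symbolic_size(buffer_sizes, margin_percent=10):
    """Return an input size at least `margin_percent` percent larger than the largest buffer."""
    largest = max(buffer_sizes)
    # Integer ceiling of largest * margin_percent / 100, avoiding float rounding.
    extra = (largest * margin_percent + 99) // 100
    return largest + extra

# Hypothetical sizes (in bytes) of statically allocated buffers in a program.
sizes = [16, 64, 256]
print(max_symbolic_size(sizes))  # 282
```

Rounding the margin up keeps the result "at least" 10% larger even when the largest buffer size is not a multiple of 10.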

Bug-Find

Preconditioned Symbolic Execution

Traditional symbolic execution for bug finding represents each byte of the input with a symbolic variable, turning the input into symbolic data. At every branch in the program, new constraints are created to model the different outcomes of the branch. In the end, we try to solve for a concrete input that satisfies all of the constraints an interpreter has collected along its path.

if (a > 1)
    b = a - 1;
else
    b = a + 1;

Taking the simple code above as an example, we create one interpreter for each path through the if branch: one with the constraint a>1 and one with a<=1. If we want the result b to be 2, we solve (a>1 and a-1=2), which gives a=3, and (a<=1 and a+1=2), which gives a=1.
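The two paths above can be sketched in a few lines of Python. A real symbolic executor hands the path constraints to a constraint solver; this sketch replaces the solver with brute-force enumeration over a small integer domain, which is an assumption made purely for illustration. The names `PATHS` and `solve_for_b` are hypothetical.

```python
# Each path is (path condition as text, path condition, effect on b),
# mirroring the two outcomes of the if branch.
PATHS = [
    ("a > 1", lambda a: a > 1, lambda a: a - 1),
    ("a <= 1", lambda a: a <= 1, lambda a: a + 1),
]

def solve_for_b(target, domain=range(-100, 101)):
    """For each path, list the inputs a that satisfy the path condition and give b == target."""
    solutions = {}
    for name, condition, effect in PATHS:
        solutions[name] = [a for a in domain if condition(a) and effect(a) == target]
    return solutions

print(solve_for_b(2))  # {'a > 1': [3], 'a <= 1': [1]}
```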

Symbolic execution can help attackers find potentially exploitable bugs, but the search space can be extremely large, especially with loops, because every iteration can produce new branches. To prune the search space, the authors introduce Preconditioned Symbolic Execution, which adds constraints (preconditions) on the input before symbolic execution starts. The preconditions include:

  1. Known Length: To overflow a buffer, the input obviously has to exceed the length of the buffer, so inputs shorter than the buffer do not need to be considered;
  2. Known Prefix: Sometimes we know the prefix of the input, e.g., an HTTP GET request always starts with "GET";
  3. Concolic Execution: The precondition is a single program path, specified by a known concrete input that is reused.
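To get a feel for how much a precondition shrinks the search space, the toy sketch below enumerates every candidate input over a tiny alphabet and counts how many survive each precondition. The alphabet, the bounds and the seed input are hypothetical values chosen so that the whole space can be enumerated; AEG itself expresses preconditions as solver constraints rather than by enumeration.

```python
from itertools import product

BUF_SIZE = 4      # hypothetical buffer length in the target program
MAX_LEN = 6       # hypothetical upper bound on the input length
ALPHABET = "AB"   # tiny alphabet so the full space stays enumerable

def all_inputs():
    """Yield every string over ALPHABET with length 0..MAX_LEN."""
    for n in range(MAX_LEN + 1):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)

def known_length(s):
    return len(s) > BUF_SIZE        # only inputs that can overflow the buffer

def known_prefix(s, prefix="AB"):
    return s.startswith(prefix)     # e.g. a fixed protocol keyword

def concolic(s, seed="ABABA"):
    return s == seed                # a single known concrete input

def count(*preconditions):
    """Count the inputs that satisfy all given preconditions."""
    return sum(1 for s in all_inputs() if all(p(s) for p in preconditions))

print(count())                            # 127
print(count(known_length))                # 96
print(count(known_length, known_prefix))  # 24
print(count(concolic))                    # 1
```

Each added precondition removes candidates before any path is explored, which is the point of the technique.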

Path Prioritization

Each branch leads to 2 different paths, so the number of paths grows exponentially, especially in loops. This raises the question of which path to explore first. The authors propose Path Prioritization to decide the order of exploration. It includes 2 main techniques:

  1. Buggy-Path-First: A bug on a path suggests that subsequent statements on that path are also likely to be buggy (and hopefully exploitable), so paths on which a bug has been found get higher priority and exploration continues along them.
  2. Loop Exhaustion: This strategy gives higher priority to the interpreter that explores the maximum number of loop iterations, in the hope that computations involving more iterations are more likely to produce bugs such as buffer overflows. In this way, a loop only creates one new interpreter.
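The two heuristics can be pictured as a priority queue of pending interpreters. The sketch below ranks first by the number of bugs already seen on the path and then by the number of loop iterations explored. Combining the two heuristics lexicographically like this is an assumption of the sketch, not something the description above specifies, and the interpreter records are hypothetical.

```python
import heapq

def priority(bugs_found, loop_iterations):
    # heapq is a min-heap, so negate the scores to pop the most promising path first.
    return (-bugs_found, -loop_iterations)

def exploration_order(interpreters):
    """Return interpreter names in the order they would be explored."""
    heap = [(priority(bugs, loops), name) for name, bugs, loops in interpreters]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

# Hypothetical pending interpreters: (name, bugs found so far, loop iterations explored).
pending = [
    ("clean path, 1 iteration", 0, 1),
    ("clean path, loop exhausted", 0, 64),
    ("path with a detected bug", 1, 3),
]
print(exploration_order(pending))
# ['path with a detected bug', 'clean path, loop exhausted', 'clean path, 1 iteration']
```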

Environment Modelling

As a practical tool, AEG models the different parts of the environment through which an attack can be delivered, including files, sockets, environment variables, library function calls and system calls.

Exploit Generation

After finding a path that leads to a bug, we need to check whether the bug is exploitable. AEG then has to generate the exploit and verify that it actually gives the attacker a shell through this bug.

DBA: Dynamic Binary Analysis

In the Bug-Find step, we obtain the path constraints leading to a bug and the names of the vulnerable functions and buffers. With them, the bug can be reproduced repeatedly while the behavior of the program is observed, which is what DBA does. During DBA, AEG instruments the given executable binary $B_{gcc}$. When it detects the vulnerable function call, it stops execution and examines the stack, recording the stack memory contents.

To attack the program, the attacker needs to overwrite contents of the stack. However, inappropriate values may cause a crash. So AEG restores the stack contents that the exploit does not need to change, making sure the program does not crash during the attack.

Exploit-Gen

With the runtime information gained from DBA, AEG tries to generate exploits accordingly. There are multiple types of exploits, but the paper presents only one algorithm, which generates stack-overflow return-to-stack exploits, shown below:
[Figure: algorithm for generating stack-overflow return-to-stack exploits]
exp_str stores the contents that the exploit places on the stack: it overwrites the saved return address, which is loaded into EIP (the register pointing to the next instruction), and puts the shellcode after it. The new return address points into the next stack frame (offset+8), which contains the shellcode. The other stack contents between bufaddr and &retaddr (the part of exp_str before offset) remain unchanged, i.e., they are restored to their original values during the attack.
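The layout of exp_str can be illustrated with a toy model that only assembles bytes and does not interact with any real program. It assumes a 32-bit little-endian target with 4-byte addresses. It also assumes that the shellcode starts immediately after the return-address slot, so the jump target is &retaddr + 4. This is a simplification of the offset+8 used in the algorithm described above, and the exact layout depends on the stack frame. The function name `build_exp_str`, the addresses and the placeholder payload are all hypothetical.

```python
import struct

def build_exp_str(stack_contents, bufaddr, retaddr_addr, shellcode):
    """Assemble the bytes to be written starting at bufaddr (toy 32-bit model)."""
    offset = retaddr_addr - bufaddr               # distance from the buffer to the return address
    exp_str = bytearray(stack_contents[:offset])  # stack restoration: keep the observed bytes
    jmp_target = retaddr_addr + 4                 # assumption: payload sits right after the slot
    exp_str += struct.pack("<I", jmp_target)      # hijack: new little-endian return address
    exp_str += shellcode                          # payload placed after the return address
    return bytes(exp_str)

# Hypothetical values of the kind DBA would report.
bufaddr = 0xBFFFF000
retaddr_addr = bufaddr + 24
observed = bytes(range(24))   # stack bytes recorded between bufaddr and &retaddr
shellcode = b"\xcc" * 4       # placeholder bytes, not real shellcode

exp = build_exp_str(observed, bufaddr, retaddr_addr, shellcode)
print(exp.hex())
```

The first 24 bytes equal the recorded stack contents, the next 4 bytes hold the new return address, and the payload follows.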
