The String Obfuscation Algorithm in Dotfuscator

moyumoyu

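Code obfuscation tools such as Dotfuscator and Xenocode Postbuild all offer string obfuscation as a key feature. It sounds simple enough, but what exactly is it, and how does it work?
This article uses Dotfuscator 4.x as the example and builds a simple ConsoleApplication as a guinea pig, to get a glimpse of how string obfuscation works. The code of the simple ConsoleApplication is as follows: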

using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("This is the unencrypted string.");
        }
    }
}

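Compile it, then obfuscate it with Dotfuscator. I used Dotfuscator 4.x Pro. On the Option tab, set Disable String Encryption to No. On the Input tab, set the input to the compiled output of the project above. On the String Encryption tab, either check every item or add two rules, one with type * and one with method *. Then build. When the build finishes, the obfuscated ConsoleApplication1.exe is in the output directory. Opening it in Reflector shows the following code: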

private static void a(string[] A_0)
{
    int num = 2;
    Console.WriteLine(a("軙듛럝鏟싡跣闥죧黩蓫语탯蟱髳鏵雷駹軻蟽烿瘁愃戅⠇礉砋簍礏簑猓㠕", num));
}

The string is now gibberish, and a new method named a has been added. So what exactly is this a? Reflector reports the following:

/* private scope */ static string a(string A_0, int A_1)
{
    // This item is obfuscated and can not be translated.
}

Clearly this method has been through Control Flow obfuscation, so the only way in is through the IL:

.method privatescope hidebysig static string a(string A_0, int32 A_1) cil managed
{
    .maxstack 8
    .locals init (
        [0] char[] chArray,
        [1] int32 num,
        [2] int32 num2,
        [3] uint8 num3,
        [4] uint8 num4)
    L_0000: ldarg.0
    L_0001: callvirt instance char[] [mscorlib]System.String::ToCharArray()
    L_0006: stloc.0
    L_0007: ldc.i4 0xe74d6d7
    L_000c: ldarg.1
    L_000d: add
    L_000e: stloc.1
    L_000f: ldc.i4.0
    L_0010: dup
    L_0011: ldc.i4.1
    L_0012: blt.s L_0047
    L_0014: dup
    L_0015: stloc.2
    L_0016: ldloc.0
    L_0017: ldloc.2
    L_0018: ldloc.0
    L_0019: ldloc.2
    L_001a: ldelem.i2
    L_001b: dup
    L_001c: ldc.i4 0xff
    L_0021: and
    L_0022: ldloc.1
    L_0023: dup
    L_0024: ldc.i4.1
    L_0025: add
    L_0026: stloc.1
    L_0027: xor
    L_0028: conv.u1
    L_0029: stloc.3
    L_002a: dup
    L_002b: ldc.i4.8
    L_002c: shr
    L_002d: ldloc.1
    L_002e: dup
    L_002f: ldc.i4.1
    L_0030: add
    L_0031: stloc.1
    L_0032: xor
    L_0033: conv.u1
    L_0034: stloc.s num4
    L_0036: pop
    L_0037: ldloc.s num4
    L_0039: ldloc.3
    L_003a: stloc.s num4
    L_003c: stloc.3
    L_003d: ldloc.s num4
    L_003f: ldc.i4.8
    L_0040: shl
    L_0041: ldloc.3
    L_0042: or
    L_0043: conv.u2
    L_0044: stelem.i2
    L_0045: ldc.i4.1
    L_0046: add
    L_0047: dup
    L_0048: ldloc.0
    L_0049: ldlen
    L_004a: conv.i4
    L_004b: blt.s L_0014
    L_004d: pop
    L_004e: ldloc.0
    L_004f: newobj instance void [mscorlib]System.String::.ctor(char[])
    L_0054: call string [mscorlib]System.String::Intern(string)
    L_0059: ret
}

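I don't want to explain the IL in much detail here, since this is not an introduction to MSIL. If you are interested, consult MSDN, a relevant book, or Google.
Judging from the IL, the obfuscated logic first uses a condition that is always true (equivalent to if (0 < 1)) to make one jump, and only then reaches the real loop. The loop visits every char of the string. For each char it XORs the low byte and then the high byte with a running key (num in the IL, incremented after each use), swaps the two resulting bytes, and ORs them back together into a char. The result is the real string.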
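To make the always-true jump easier to picture, here is a rough C# rendering that keeps the jump structure of the IL instead of tidying it into a loop. It is only a sketch: the class and label names are made up for illustration, and the real IL keeps the index on the evaluation stack, while this version uses an ordinary local.

```csharp
using System;

static class ControlFlowShapeSketch
{
    // Hypothetical name. Mirrors the branch layout of the IL, not its exact stack usage.
    public static string Decode(string source, int salt)
    {
        char[] data = source.ToCharArray();
        int key = 0xe74d6d7 + salt;   // constant taken from the IL listing
        int index = 0;
        byte low, high;

        if (0 < 1) goto Test;         // opaque predicate: always true, so this jump is always taken

    Body:
        low = (byte)((data[index] & 0xFF) ^ key++);   // low byte XOR key, then key advances
        high = (byte)((data[index] >> 8) ^ key++);    // high byte XOR key, then key advances
        data[index] = (char)((low << 8) | high);      // swapped: decoded low byte becomes the high byte
        index++;

    Test:
        if (index < data.Length) goto Body;           // corresponds to blt.s L_0014

        return string.Intern(new string(data));
    }

    static void Main()
    {
        // The first two code units of the obfuscated literal above, decoded with num = 2.
        Console.WriteLine(Decode("\u8ED9\uB4DB", 2));  // prints "Th"
    }
}
```

The initial branch adds nothing to the computation. It only keeps a decompiler from recognizing a plain loop.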
Summarized and tidied up, the algorithm is as follows:

static string GetString(string source, int salt)
{
    int index = 0;
    char[] data = source.ToCharArray();
    salt += 0xe74d6d7; // This const data generated by dotfuscator
    while (index < data.Length)
    {
        char key = data[index];
        // XOR the low byte with the running key, then advance the key
        byte low = (byte)((key & 0xFF) ^ salt++);
        // XOR the high byte with the running key, then advance the key
        byte high = (byte)((key >> 8) ^ salt++);
        // Swap the two bytes and recombine them into one char
        data[index] = (char)((low << 8) | high);
        index++;
    }
    return string.Intern(new string(data));
}
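As a cross-check of the routine above, the transformation can be inverted by hand. The sketch below is not Dotfuscator's actual encoder, which the article never shows. Encode is a hypothetical inverse derived only from the decoding steps, and the class and constant names are made up for illustration.

```csharp
using System;

static class DotfuscatorStringSketch
{
    // Constant observed in the IL above; Dotfuscator generates it, so other builds may differ.
    const int KeyBase = 0xe74d6d7;

    // Same behavior as GetString above, written as a for loop.
    public static string Decode(string source, int salt)
    {
        char[] data = source.ToCharArray();
        int key = salt + KeyBase;
        for (int i = 0; i < data.Length; i++)
        {
            char c = data[i];
            byte low = (byte)((c & 0xFF) ^ key++);
            byte high = (byte)((c >> 8) ^ key++);
            data[i] = (char)((low << 8) | high);
        }
        return string.Intern(new string(data));
    }

    // Hypothetical inverse: undo the byte swap, then XOR with the same key sequence.
    public static string Encode(string plain, int salt)
    {
        char[] data = plain.ToCharArray();
        int key = salt + KeyBase;
        for (int i = 0; i < data.Length; i++)
        {
            char p = data[i];
            byte encLow = (byte)((p >> 8) ^ key++);     // plaintext high byte goes to the encoded low byte
            byte encHigh = (byte)((p & 0xFF) ^ key++);  // plaintext low byte goes to the encoded high byte
            data[i] = (char)((encHigh << 8) | encLow);
        }
        return new string(data);
    }

    static void Main()
    {
        string plain = "This is the unencrypted string.";
        string encoded = Encode(plain, 2);                 // 2 is the num argument seen in the obfuscated Main
        Console.WriteLine(Decode(encoded, 2) == plain);    // prints "True"
        Console.WriteLine(Decode("\u8ED9\uB4DB", 2));      // prints "Th"
    }
}
```

With salt 2, Encode turns "Th" into the code units U+8ED9 U+B4DB, which are the first two characters of the obfuscated literal shown earlier.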

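As this shows, string obfuscation comes at a considerable cost: every use of an obfuscated literal runs the decoding loop, copies the string into a char array, builds a new string, and interns it. Commercial applications should avoid depending on it where possible, which means not keeping sensitive information in hard-coded strings. In addition, this kind of string obfuscation can only hinder static reverse engineering, because in .NET every string is transparent to the CLR Runtime Host. A hacker using a debugger, or a tool such as ProcessExplorer, can easily uncover the secrets held in the strings.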

To be the apostrophe which changed “Impossible” into “I’m possible”
----------------------------------------------------
WinkingZhang's Blog ( http://winkingzhang.cnblogs.com)
GCDN(http://gcdn.grapecity.com/cs)