How to Generate a Program Dependence Graph with Joern and Map It to Source Line Numbers

License: CC 4.0 BY-SA

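This post explains how to generate a program dependence graph (PDG) with Joern and, through a worked example, how to recover the source line number of each node in the generated PDG, as groundwork for further source code analysis.

I have wanted to get to the bottom of this for a long time. Frankly, a lot of people who build developer tools (or open-source projects) are muddle-headed: they advertise their tools as suitable for research yet have no idea what researchers actually need, and the documentation they write is a mess.

With Joern's older documentation I could not work out how to generate a PDG at all. Now it is finally clear: https://docs.joern.io/exporting/

Let's use an example from someone else's paper. Suppose I want to generate the PDG of the following Example.c file: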

int main(int argc, char **argv)
{
    char *items[] = {"boat", "car", "truck", "train"};
    int index = Untrusted();
    printf("You selected %s\n", items[index-1]);
    int upbound = sizeof(items) / sizeof(items[0]);
    printf("Last item %s\n", items[upbound - 1]);
}

This file does not compile as it stands, but thanks to its island grammar Joern can analyze it anyway. With Joern installed, we run:

./joern-parse Example.c
./joern-export --repr pdg --out Example

This produces a directory named Example containing files such as 0-pdg.dot, 1-pdg.dot, and 2-pdg.dot. Usually only the first one matters; here is what it looks like:

digraph main {  
"1000100" [label = "(METHOD,main)" ]
"1000135" [label = "(METHOD_RETURN,int)" ]
"1000101" [label = "(PARAM,int argc)" ]
"1000102" [label = "(PARAM,char **argv)" ]
"1000105" [label = "(<operator>.assignment,*items[] = {\"boat\", \"car\", \"truck\", \"train\"})" ]
"1000108" [label = "(<operator>.assignment,index = Untrusted())" ]
"1000111" [label = "(printf,printf(\"You selected %s\n\", items[index-1]))" ]
"1000115" [label = "(<operator>.subtraction,index-1)" ]
"1000119" [label = "(<operator>.assignment,upbound = sizeof(items) / sizeof(items[0]))" ]
"1000121" [label = "(<operator>.division,sizeof(items) / sizeof(items[0]))" ]
"1000122" [label = "(<operator>.sizeOf,sizeof(items))" ]
"1000124" [label = "(<operator>.sizeOf,sizeof(items[0]))" ]
"1000128" [label = "(printf,printf(\"Last item %s\n\", items[upbound - 1]))" ]
"1000132" [label = "(<operator>.subtraction,upbound - 1)" ]
  "1000119" -> "1000135"  [ label = "DDG: sizeof(items) / sizeof(items[0])"] 
  "1000128" -> "1000135"  [ label = "DDG: items[upbound - 1]"] 
  "1000102" -> "1000135"  [ label = "DDG: argv"] 
  "1000124" -> "1000135"  [ label = "DDG: items[0]"] 
  "1000101" -> "1000135"  [ label = "DDG: argc"] 
  "1000111" -> "1000135"  [ label = "DDG: printf(\"You selected %s\n\", items[index-1])"] 
  "1000122" -> "1000135"  [ label = "DDG: items"] 
  "1000111" -> "1000135"  [ label = "DDG: items[index-1]"] 
  "1000128" -> "1000135"  [ label = "DDG: printf(\"Last item %s\n\", items[upbound - 1])"] 
  "1000108" -> "1000135"  [ label = "DDG: Untrusted()"] 
  "1000132" -> "1000135"  [ label = "DDG: upbound"] 
  "1000115" -> "1000135"  [ label = "DDG: index"] 
  "1000100" -> "1000101"  [ label = "DDG: "] 
  "1000100" -> "1000102"  [ label = "DDG: "] 
  "1000100" -> "1000105"  [ label = "DDG: "] 
  "1000100" -> "1000108"  [ label = "DDG: "] 
  "1000100" -> "1000111"  [ label = "DDG: "] 
  "1000105" -> "1000111"  [ label = "DDG: items"] 
  "1000108" -> "1000115"  [ label = "DDG: index"] 
  "1000100" -> "1000115"  [ label = "DDG: "] 
  "1000100" -> "1000119"  [ label = "DDG: "] 
  "1000100" -> "1000121"  [ label = "DDG: "] 
  "1000100" -> "1000122"  [ label = "DDG: "] 
  "1000100" -> "1000128"  [ label = "DDG: "] 
  "1000119" -> "1000132"  [ label = "DDG: upbound"] 
  "1000100" -> "1000132"  [ label = "DDG: "] 
}
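Since the goal is to tie these node IDs back to source lines, it helps to get the IDs, labels, and edges out of the .dot file first. The following is a minimal Python sketch that does so with regular expressions. It assumes every node and edge sits on one line in exactly the layout shown above; the function name `parse_pdg_dot` is made up for this illustration, and a general DOT file would need a real parser.

```python
import re

# Node lines look like:  "1000108" [label = "(<operator>.assignment,index = Untrusted())" ]
NODE_RE = re.compile(r'^\s*"(\d+)"\s*\[\s*label\s*=\s*"\((.*)\)"\s*\]\s*$')
# Edge lines look like:  "1000108" -> "1000115"  [ label = "DDG: index"]
EDGE_RE = re.compile(r'^\s*"(\d+)"\s*->\s*"(\d+)"\s*\[\s*label\s*=\s*"(.*)"\s*\]\s*$')


def parse_pdg_dot(text):
    """Return (nodes, edges) parsed from the text of a PDG .dot file.

    nodes maps node ID -> (node type, code); edges is a list of
    (source ID, target ID, edge label) tuples.
    """
    nodes, edges = {}, []
    for line in text.splitlines():
        edge = EDGE_RE.match(line)
        if edge:
            edges.append((int(edge.group(1)), int(edge.group(2)), edge.group(3)))
            continue
        node = NODE_RE.match(line)
        if node:
            # The label is "(TYPE,code)"; split at the first comma only.
            kind, _, code = node.group(2).partition(",")
            nodes[int(node.group(1))] = (kind, code)
    return nodes, edges


# Three lines copied from 0-pdg.dot above.
sample = r'''digraph main {
"1000108" [label = "(<operator>.assignment,index = Untrusted())" ]
"1000115" [label = "(<operator>.subtraction,index-1)" ]
  "1000108" -> "1000115"  [ label = "DDG: index"]
}'''
nodes, edges = parse_pdg_dot(sample)
print(nodes[1000115])
print(edges)
```

For the real file, passing the contents of Example/0-pdg.dot to `parse_pdg_dot` instead of `sample` should give the full node and edge lists.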

If we run:

dot -Tpng 0-pdg.dot -o 0-pdg.png

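Graphviz renders the graph to 0-pdg.png.

The rendering makes it obvious that the nodes of this graph are not statements but finer-grained nodes, similar to AST nodes. When analyzing source code, however, we often need to relate things to a specific line (a diff, for instance, works line by line). So what do we do? Sorry, Joern's documentation will not give us the answer. Someone raised the same question here, for example: https://gitter.im/joern-code-analyzer/community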
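To make the target concrete: if every node ID could be paired with a line number, the node-level edges could be folded into line-level ones. Below is a minimal Python sketch of that folding step. The `node_line` mapping is filled in by hand from Example.c purely for illustration, and `collapse_to_lines` is a made-up helper name, not part of Joern.

```python
from collections import defaultdict


def collapse_to_lines(edges, node_line):
    """Fold node-level PDG edges into line-level edges.

    edges: iterable of (source node ID, target node ID, label).
    node_line: dict mapping node ID -> source line number.
    Edges whose endpoints have no known line, or that stay within a
    single line, are dropped.
    """
    line_edges = defaultdict(set)
    for src, dst, label in edges:
        src_line, dst_line = node_line.get(src), node_line.get(dst)
        if src_line is None or dst_line is None or src_line == dst_line:
            continue
        line_edges[(src_line, dst_line)].add(label)
    return dict(line_edges)


# Line numbers filled in by hand from Example.c, for illustration only.
node_line = {1000105: 3, 1000108: 4, 1000111: 5, 1000115: 5,
             1000119: 6, 1000128: 7, 1000132: 7}
# A few data-dependence edges copied from 0-pdg.dot.
edges = [
    (1000105, 1000111, "DDG: items"),
    (1000108, 1000115, "DDG: index"),
    (1000119, 1000132, "DDG: upbound"),
]
line_edges = collapse_to_lines(edges, node_line)
for (src, dst), labels in sorted(line_edges.items()):
    print(f"line {src} -> line {dst}: {sorted(labels)}")
```

The missing piece is a way to obtain `node_line` automatically rather than by hand.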

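Heh. Nobody replied to him, though, so we have to work something out ourselves.

According to this introduction, we can query everything in the generated CPG (Code Property Graph): https://docs.joern.io/quickstart#querying-the-code-property-graph

For example, we can use: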

cpg.method.name.l

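to return the names of all functions in the analyzed code. Looking at the 0-pdg.dot file above, every node has a unique ID, so in principle we should be able to look up its line number.

The most convenient way is: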

cpg.all.l

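The output does include the line number of every node, but it is not easy to process. The reference card describes other options: https://docs.joern.io/cpgql/reference-card#execution-directives

Besides .l, we can export JSON, which is much easier to process (a side gripe about .l: nothing before it is abbreviated, so why not spell it .list? As it is, .l is far too easy to confuse with .1):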

cpg.all.toJsonPretty |> "Json.txt"
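Once the nodes are in JSON, pairing IDs with line numbers takes a few lines of Python. The sketch below rests on assumptions: that the export is a JSON array with one object per node, that the ID lives under a key such as `id` or `_id`, and that the line number lives under `lineNumber`. These key names may differ between Joern versions, so check them against the actual Json.txt. The sample records are a hypothetical excerpt, not real Joern output.

```python
import json


def build_line_map(records, id_keys=("id", "_id"), line_key="lineNumber"):
    """Map node ID -> line number from a list of exported node records.

    Records without an ID or without a line number are skipped.
    The key names are assumptions; adjust them to match the real export.
    """
    line_map = {}
    for record in records:
        node_id = next((record[k] for k in id_keys if k in record), None)
        line = record.get(line_key)
        if node_id is None or line is None:
            continue
        line_map[int(node_id)] = int(line)
    return line_map


# Hypothetical excerpt imitating the shape of the export; real records
# carry more fields, and the values here are illustrative.
sample = json.loads('''[
  {"id": 1000108, "_label": "CALL", "code": "index = Untrusted()", "lineNumber": 4},
  {"id": 1000115, "_label": "CALL", "code": "index-1", "lineNumber": 5},
  {"id": 1000200, "_label": "FILE", "name": "Example.c"}
]''')
line_map = build_line_map(sample)
print(line_map)
```

With a real export, the records would come from `json.load` on Json.txt, and the resulting map can be looked up with the node IDs that appear in 0-pdg.dot.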

What if we need to analyze many source files in bulk? The interactive approach clearly will not do, so we turn to: https://docs.joern.io/interpreter

For example, we save a test.sc file:

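@main def exec() = {
   // Load the CPG that joern-parse wrote to cpg.bin
   loadCpg("cpg.bin")
   // Dump every node, line numbers included, as pretty-printed JSON
   cpg.all.toJsonPretty |> "Temp.json"
}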
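The parse and export steps from earlier can be batched in the same spirit. The Python sketch below only builds the command lines for each source file and prints them; actually running them is left to an uncalled helper. It assumes that joern-parse accepts `--output` for the CPG path and that joern-export takes the CPG path as a positional argument, which is my understanding of those tools but should be checked with `--help` on the installed version. The directory layout and function names are invented for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(source, out_dir):
    """Return the parse and export command lines for one source file."""
    source, out_dir = Path(source), Path(out_dir)
    cpg = out_dir / (source.stem + ".bin")
    pdg_dir = out_dir / (source.stem + "-pdg")
    return [
        # Assumed flag: --output selects where the CPG is written.
        ["./joern-parse", str(source), "--output", str(cpg)],
        # Assumed usage: the CPG path is given as a positional argument.
        ["./joern-export", str(cpg), "--repr", "pdg", "--out", str(pdg_dir)],
    ]


def run_batch(sources, out_dir):
    """Run both steps for every source file, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for source in sources:
        for command in build_commands(source, out_dir):
            subprocess.run(command, check=True)


# Dry run: show what would be executed for two hypothetical files.
for name in ["Example.c", "Other.c"]:
    for command in build_commands(name, "out"):
        print(" ".join(command))
```

Calling `run_batch` with the list of real files, from the Joern installation directory, would perform the batch.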
