After reading Mono

ename

Published 2004-12-06 09:34:00


Category: Mono. Tags: token, compiler, file, tokenize, semantic, string

Article link: https://blog.youkuaiyun.com/ename/article/details/205930


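This article uses the token() method in cs-tokenizer.cs to understand how the C# compiler works. It describes the tokenize_file function, how the compiler uses hand-written expressions in place of a generic parser, and how token positions are recorded so that semantic analysis can report errors. It also explains how Locations are encoded and how the tokenizer handles strings, numbers, and similar values.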


By reading the token () method in cs-tokenizer.cs, I can see how the C# compiler works. Following it leads to the most important part of the driver: static void tokenize_file (SourceFile file).

First, mcs uses hand-written expressions in place of the generic parser described in textbooks (for example, " if ( is_identifier || is_identifier_numeric){...} ").

Each time a token is returned, the location for the token is recorded into the `Location' property, which can be accessed by the parser. The parser retrieves the Location properties as it builds its internal representation, so that the semantic analysis phase can produce error messages that pinpoint the location of the problem.

Some tokens have values associated with them. For example, when the tokenizer encounters a string, it returns a LITERAL_STRING token, and the actual string parsed is available in the `Value' property of the tokenizer. The same mechanism is used to return integers and floating point numbers.
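To make the token / `Location' / `Value' interplay concrete, here is a minimal sketch in Python. It is not the mcs source: the `Tokenizer` class, the `tokenize_file` signature, and every token kind other than LITERAL_STRING are hypothetical names chosen for illustration, and the lexical rules are far simpler than real C# (no escapes, keywords, or comments).

```python
from dataclasses import dataclass

# Token kinds. Only LITERAL_STRING is named in the text; the rest are assumed.
EOF, IDENTIFIER, LITERAL_INTEGER, LITERAL_FLOAT, LITERAL_STRING, OTHER = range(6)
DIGITS = "0123456789"


@dataclass(frozen=True)
class Location:
    file: str
    line: int


class Tokenizer:
    """Toy tokenizer: token() returns a kind and updates .location and .value."""

    def __init__(self, text, file_name):
        self.text, self.file_name = text, file_name
        self.pos, self.line = 0, 1
        self.location = None  # where the last returned token started
        self.value = None     # payload of the last returned token, if any

    def token(self):
        text = self.text
        # Skip whitespace, counting newlines so locations stay accurate.
        while self.pos < len(text) and text[self.pos].isspace():
            if text[self.pos] == "\n":
                self.line += 1
            self.pos += 1
        self.location = Location(self.file_name, self.line)
        self.value = None
        if self.pos >= len(text):
            return EOF
        start, ch = self.pos, text[self.pos]
        if ch.isalpha() or ch == "_":
            while self.pos < len(text) and (text[self.pos].isalnum() or text[self.pos] == "_"):
                self.pos += 1
            self.value = text[start:self.pos]
            return IDENTIFIER
        if ch in DIGITS:
            while self.pos < len(text) and text[self.pos] in DIGITS:
                self.pos += 1
            # A '.' followed by a digit turns the number into a float.
            if self.pos + 1 < len(text) and text[self.pos] == "." and text[self.pos + 1] in DIGITS:
                self.pos += 1
                while self.pos < len(text) and text[self.pos] in DIGITS:
                    self.pos += 1
                self.value = float(text[start:self.pos])
                return LITERAL_FLOAT
            self.value = int(text[start:self.pos])
            return LITERAL_INTEGER
        if ch == '"':
            end = text.find('"', start + 1)
            if end == -1:              # unterminated string: take the rest
                end = len(text)
            self.value = text[start + 1:end]
            self.line += self.value.count("\n")
            self.pos = min(end + 1, len(text))
            return LITERAL_STRING
        self.pos += 1                  # any other single character
        self.value = ch
        return OTHER


def tokenize_file(text, file_name):
    """Driver loop: pull tokens until EOF, reading value and location each time."""
    lexer = Tokenizer(text, file_name)
    tokens = []
    while (kind := lexer.token()) != EOF:
        tokens.append((kind, lexer.value, lexer.location.line))
    return tokens
```

For instance, `tokenize_file('x = 42\ns = "hi" 3.5', "a.cs")` yields seven tuples, each pairing a token kind with its value and the line it started on, which is the information a parser would need to attach positions to its tree nodes.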

// I cannot understand why the location was designed this way. //

** Locations

Locations are encoded as a 32-bit number (the Location struct) that maps each input source line to a linear number. As new files are parsed, the Location manager is informed of the new file, to allow it to map back from an int constant to a file + line number.

Prior to parsing/tokenizing any source files, the compiler generates a list of all the source files and then reserves the low N bits of the location to hold the source file, where N is large enough to hold at least twice as many source files as were specified on the command line (to allow for a #line in each file). The upper 32-N bits are the line number in that file.

The token 0 is reserved for ``anonymous'' locations, i.e. if we don't know the location (Location.Null).

The tokenizer also tracks the column number for a token, but this is currently not being used or encoded. It could probably be encoded in the low 9 bits, allowing for columns from 1 to 512 to be encoded.
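The bit layout described above can be sketched in a few lines of Python. This is only an illustration of the packing scheme as the text states it, not the actual Location struct from mcs: the `LocationManager` class, its method names, and `NULL_LOCATION` are assumptions made for the example, and column numbers are left out because the text says they are not encoded.

```python
NULL_LOCATION = 0  # stands in for Location.Null: "we don't know the location"


class LocationManager:
    """Packs (file index, line) into one 32-bit integer.

    Low N bits: file index. Upper 32-N bits: line number.
    """

    def __init__(self, source_files):
        self.files = list(source_files)
        # Leave room for twice as many files as were given, so that each
        # file can introduce one extra name through a #line directive.
        slots = max(2 * len(self.files), 1)
        self.file_bits = max(1, (slots - 1).bit_length())
        self.file_mask = (1 << self.file_bits) - 1
        self.max_line = (1 << (32 - self.file_bits)) - 1

    def encode(self, file_index, line):
        if not 0 <= file_index <= self.file_mask:
            raise ValueError("file index does not fit in the reserved bits")
        if not 1 <= line <= self.max_line:
            raise ValueError("line number does not fit in the remaining bits")
        # Lines start at 1, so a real location is never 0 (NULL_LOCATION).
        return (line << self.file_bits) | file_index

    def decode(self, location):
        if location == NULL_LOCATION:
            return None
        file_index = location & self.file_mask
        line = location >> self.file_bits
        return file_index, line
```

With three source files, six slots are needed, so N is 3 and line numbers get the remaining 29 bits. `LocationManager(["a.cs", "b.cs", "c.cs"]).encode(1, 42)` packs line 42 of the second file as `(42 << 3) | 1`, and `decode` recovers `(1, 42)`. One plausible reason for this design is that every tree node can carry its position as a single int rather than a file reference plus a line field.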