Goal: store the file name and line/column information in each token, so that the lexer and parser can emit more detailed messages. This is a big help when debugging your analyzer.
Approach: I remembered that Boost.Spirit provides a file_iterator class and a position_iterator class. A closer look confirmed that they do satisfy the iterator requirements of lexertl's match_results class. Good, let's write a few lines of code to verify it.
#include "lexertl/generator.hpp"
#include "lexertl/lookup.hpp"
#include "lexertl/rules.hpp"
#include "lexertl/state_machine.hpp"
#include <boost/spirit/home/classic/iterator/file_iterator.hpp>
#include <boost/spirit/home/classic/iterator/position_iterator.hpp>
#include <iostream>
#include <string>

// Use the file_iterator and position_iterator defined in Boost.Spirit directly.
namespace SPIRIT_CLASSIC = boost::spirit::classic;
typedef SPIRIT_CLASSIC::file_iterator<char> file_iterator_type;
typedef SPIRIT_CLASSIC::position_iterator2<file_iterator_type> position_iterator_type;

int main()
{
    try
    {
        lexertl::rules rules_;
        lexertl::state_machine state_machine_;

        rules_.add("[0-9]+", 1);
        rules_.add("[a-zA-Z]+", 2);
        lexertl::generator::build(rules_, state_machine_);

        // Pass the file name to the file_iterator.
        file_iterator_type iterFile("test.txt");

        if (!iterFile)
        {
            std::cout << "Failed to open file test.txt!" << std::endl;
            return -1;
        }

        // lexertl::lookup takes two iterators marking the start and end of the
        // input. The two iterators we construct carry not only the file
        // contents but also line/column information.
        position_iterator_type iterBegin(iterFile, iterFile.make_end()); // start of input
        position_iterator_type iterEnd;                                  // end of input

        // lexertl takes care of the rest.
        lexertl::match_results<position_iterator_type> results_(iterBegin, iterEnd);

        std::cout << "Start parsing file test.txt" << std::endl;

        do
        {
            // Print the token information.
            lexertl::lookup(state_machine_, results_);

            SPIRIT_CLASSIC::file_position posStart = results_.start.get_position();
            SPIRIT_CLASSIC::file_position posEnd = results_.end.get_position();

            std::cout << "Token Id      : " << results_.id << std::endl
                      << "Token String  : " << std::string(results_.start, results_.end) << std::endl
                      << "Token Position: (" << posStart.line << "." << posStart.column
                      << " -> " << posEnd.line << "." << posEnd.column << ")\n"
                      << std::endl;
        } while (results_.id != rules_.eoi());
    }
    catch (const std::exception& e)
    {
        std::cout << "<Error> Exception: " << e.what() << std::endl;
    }

    return 0;
}
The content of the file test.txt is: abcd1234TTTT
The output of the run is as follows:
As you can see, the three tokens are parsed correctly, and the start and end line/column of each token are printed.
It seems that lexertl's author, Ben Hanson, is preparing to define a file_iterator for lexertl itself, to replace the one in Boost.Spirit. I have copied Ben Hanson's blog post below. If a separate file_iterator really is developed, we hope it will outperform the Boost.Spirit file_iterator in both compile time and runtime performance...
The lexertl Blog
29.09.2009
As I have recently started a revamp of lexertl I have decided to start a blog to keep everybody up to date. As this version is not feature complete yet, I have added a separate zip file which you can find here.
So far I have implemented the following improvements:
- Auto compression of wchar_t based state machines (overridable).
- A generic lookup mechanism based around iterators.
- Added the lexertl::skip token constant.
- Removed regex macro length limitation.
- Made the BOL (^) link a singleton (as it can only occur at the beginning of a token).
- debug::dump() now compresses ranges.
This dramatically reduces the list of (easier) features I wanted to add and just leaves the following for the immediate future:
- file_iterator (this will also replace the one in Boost.Spirit)
- Turn size_t into a templated type for state machine creation.
- Re-write the code generator.
- Redo serialisation.