决战 ANTLR3

最新推荐文章于 2025-11-17 01:04:39 发布

最新推荐文章于 2025-11-17 01:04:39 发布 · 817 阅读

文章标签：

#正则表达式 #SQL #软件测试 #QQ #框架

JAVA 专栏收录该内容

0 篇文章

订阅专栏

　　前不久写完了一个工具软件Nova Studio用来开发自己编写的Java Web 框架Nova，其中有SQL编辑框，虽然通过Java正则表达式实现了现在流行的代码自动提示功能，但感觉过于硬编码，于是想通过语法分析的方法重新实现一下。
　　网上搜了搜，找到了语法生成器antlr，去官方网站下载了antlr3，找了相应的例子和文章熟悉了了一下，开始实践Calculator小例子，中间出过不少错误，主要是antlr3和antlr2有些地方不太一样，总算最后都一一解决了。
　　实践结果有点让人泄气，antlr3只能分析输入的文本是否语法正确，却不能告诉我后续应该跟随操作数还是运算符，另外即使语法正确时它也不能像antlr2那样给我一棵嵌套的语法树，它返回的是所有的叶节点数组，是一维的，这样我就没法通过遍历嵌套树实现不同语义不同的显示风格，如：SQL保留字黑体，表名红色等。
　　难道几天研究实践的努力白费了，实在是心有不甘，研究了一下antlr3生成的语法分析类CalculatorParser，发现它在向后匹配时会调用这个方法：

match(input,OR,FOLLOW_OR_in_logic_expr111);

　　FOLLOW_OR_in_logic_expr111这个follow参数正是'or' 逻辑或操作符后面可以跟随的各种标记(Token)，灵机一动，如果重载这个方法把这个follow保存起来，那么捕获到语法错后就能知道正确的后续标记了。行动！
　　CalculatorParser 是继承org.antlr.runtime.Parser类的，于是首先写一个自己的扩展类 MyAntlrParser：

    public class NovaAntlrParser extends MyAntlrParser
    {
        public BitSet follow = null;
        public int ttype; // 当前处理的token的数组下标

        public MyAntlrParser(TokenStream input)
        {
            super(input);
        }

        public void match(IntStream input, int ttype, BitSet follow)
        throws RecognitionException
        {
            this.ttype = ttype;
            this.follow = follow;

            super.match(input, ttype, follow);
        }		
    }

　　CalculatorParser类改成继承 MyAntlrParser类，在测试类Main中捕获异常进行处理，代码如下：

    import java.io.IOException;
    import java.io.StringReader;

    import org.antlr.runtime.ANTLRReaderStream;
    import org.antlr.runtime.CommonTokenStream;
    import org.antlr.runtime.MismatchedSetException;
    import org.antlr.runtime.MismatchedTokenException;
    import org.antlr.runtime.RecognitionException;

    public class Main 
    {
        public static void main(String[] args) throws IOException
        {
            String content = "isprogrammer or ";
            ANTLRReaderStream input = new ANTLRReaderStream(new StringReader(content)); 
            CalculatorLexer lexer = new CalculatorLexer(input); 
            CommonTokenStream tokens = new CommonTokenStream(lexer); 
            CalculatorParser parser = new CalculatorParser(tokens); 

            // 可能的后续匹配标记名
            String[] expecting = null;

            try 
            { 
                CommonTree tree = parser.expr().getTree();
                // 处理tree
                ......
            }
            catch (RecognitionException e) 
            {
                expecting = expecting(parser, e);
            }         

        }

        public static ExpectInfo expecting(MyAntlrParser parser, RecognitionException e)
        {				
            if (e instanceof MismatchedTokenException) 
            { 
                String expecting =  parser.getTokenNames()[parser.ttype];
                return new String[]{expecting);
            } else if (e instanceof MismatchedSetException ||
                        e instanceof NoViableAltException) 
            { 
                String[] expecting = CollectionUtil.getElements(
                        parser.getTokenNames(), 
                        parser.follow.toArray());
                return toExpectInfo(e, expecting);
            } else
            {
                return getRule(e), null;
            }
        }

        private static String getRule(Exception e) 
        {
            return e.getStackTrace()[0].getMethodName();
        }

    }

　　语法文件Calculator.g如下：

　　　　grammar Calculator;
　　　　options
　　　　{
　　　　　　backtrack=true;
　　　　　　memoize=true;
　　　　　　k = 10;
　　　　　　output=AST;
　　　　　　ASTLabelType=CommonTree;
　　　　}

　　　　expr : basic_expr | logic_expr ;
　　　　basic_expr : operator WS? PLUS WS? operator ;
　　　　logic_expr : operator ( WS OR WS operator ) + ;
　　　　operator : (VAR | INT) ;

　　　　PLUS : '+' ;
　　　　OR : 'or' ;
　　　　INT : ('0'..'9')+ ;
　　　　VAR : 'a'..'z' ('a'..'z'|'0'..'9'|'_'|'$'|'#')* ;
　　　　WS : ( ' ' | '\t' | '\r' | '\n' )+ ;

　　进行测试，发现expecting返回的基本上是我所希望的，第一步算成功了。但还有两个
问题。第一个，抛出异常时得不到tree。第二个，即使语法正确不抛异常得到了tree，但
这个tree是扁平的，只能得到PLUS, OR, INT, VAR, WS 这些标记名，不能知道是basic_expr
还是logic_expr。
　　那么能不能再次祭起重载利器解决这个难题呢？再次分析CalculatorParser.java代码，发现它都是通过一个CommonTreeAdaptor对象来生成树、添加子树等操作，如果我把自己的Adaptor强制赋给它不就有希望在自己的Adaptor类中生成自己定义的语法树了吗？
　　另外通过其它的例子发现可以在定义语法规则时定义对应的代码片断，Antlr3会把它插入到该规则对应的方法中（方法名和规则名相同），这样就有可能让规则方法调用我的代码向自己生成的语法树插入规则名。再次动手！
　　首先写一个树结构类 NovaAntlrTree 保存我所希望的标记信息：

    import java.util.ArrayList;
    import java.util.List;

    import org.antlr.runtime.tree.Tree;

    public class NovaAntlrTree 
    {
        private NovaAntlrTree parent = null;

        private String rule = null;
        private String text = null;
        private int type;
        private int line;
        private int positionInLine;

        private List children = null;

        public NovaAntlrTree(Tree node, String rule)
        {
            this.rule = rule;

            text = node.getText();
            type = node.getType();
            line = node.getLine();
            positionInLine = node.getCharPositionInLine();
        }

        public NovaAntlrTree getChild(int i) 
        {
            if ( children==null || i>=children.size() ) 
            {
                return null;
            }
            return (NovaAntlrTree)children.get(i);
        }

        public int getChildCount()
        {
            if ( children==null ) 
            {
                return 0;
            }
            return children.size();
        }

        public void addChild(NovaAntlrTree t)
        {
            if ( t==null ) 
            {
                return; 
            }

            t.setParent(this);

            if ( children==null ) 
            {
                children = new ArrayList();
            }
            children.add(t);
        }

        public String getFinalRule(NovaAntlrParser parser) 
        {
            String realRule = rule;
            if (realRule == null && type > 0)
                realRule = parser.getTokenNames()[type];

            return realRule;
        }

        public NovaAntlrTree getStartTree()
        {
            if (line > 0)
                return this;

            return getChild(0).getStartTree();
        }

        public NovaAntlrTree getEndTree()
        {
            if (line > 0)
                return this;

            return getChild(getChildCount()-1).getEndTree();
        }

        public int getLine() {
            return line;
        }

        public int getPositionInLine() {
            return positionInLine;
        }

        public String getRule() {
            return rule;
        }

        public void setRule(String rule)
        {
            this.rule = rule;
        }

        public String getText() {
            return text;
        }

        public int getType() {
            return type;
        }

        public NovaAntlrTree getParent() {
            return parent;
        }

        public void setParent(NovaAntlrTree parent) {
            this.parent = parent;
        }

    }

　　语法文件 Calculator.g略作修改：

　　......
　　basic_expr : ( operator WS? PLUS WS? operator ) { setRule("basic_expr"); } ;
　　logic_expr : ( operator ( WS OR WS operator ) + ) { setRule("logic_expr"); } ;
　　operator : (VAR | INT) { setRule("operator"); };
　　......

　　现在来写自己的Adator类，它继承 CommonTreeAdaptor

    import java.util.HashSet;

    import org.antlr.runtime.Token;
    import org.antlr.runtime.tree.CommonTree;
    import org.antlr.runtime.tree.CommonTreeAdaptor;
    import org.antlr.runtime.tree.Tree;

    public class NovaTreeAdaptor extends CommonTreeAdaptor
    {
        private NovaAntlrTree root = null;
        private NovaAntlrTree current = null;

        private HashSet addedTokenIndices = new HashSet();

        public NovaTreeAdaptor()
        {
            super();
        }

        public void addChild(Object t, Object child)
        {
            super.addChild(t, child);

            CommonTree ct = (CommonTree) child;
            Token token = ct.getToken();
            if (token != null)
            {
                Integer index = new Integer(token.getTokenIndex());
                if (!addedTokenIndices.contains(index))
                {
                    current.addChild(createTree(child));
                    addedTokenIndices.add(index);
                }
            }
        }

        public Object nil() 
        {
            Object o = super.nil();

            NovaAntlrTree t = createTree(o);
            if (current != null)
                current.addChild(t);
            current = t;
            if (root == null)
                root = current;
            //System.out.println("rule:"+parser.getRule()+", nil object "+o);

            return o;
        }

        private NovaAntlrTree createTree(Object o) {
            NovaAntlrTree t = new NovaAntlrTree((Tree) o, null);
            return t;
        }

        public NovaAntlrTree getTree() {
            return root;
        }

        public void setRule(String rule)
        {
            if (current != null)
            {
                current.setRule(rule);
                current = current.getParent();
            }

        }

        public Object rulePostProcessing(Object root) 
        {
            return super.rulePostProcessing(root);
        }
    }

　　下面我们来增强MyAntlrParser类，先重载三个异常处理方法，目的一是减少修改CalculatorParser.java
的工作量，毕竟语法文件可能经常会作修改，每次重新生成CalculatorParser.java文件后对它进行比较多的替换
动作总是烦人的，二是希望它能直接抛出异常，三是不想看到它烦人的异常信息输出：

	public void recover(IntStream input, RecognitionException re)
	{
	}

	public void reportError(RecognitionException e) 
	{		
	}

	public void recoverFromMismatchedToken(IntStream input,
			   RecognitionException e,
			   int ttype,
			   BitSet follow)
	throws RecognitionException
	{
		if ( input.LA(2)==ttype ) 
		{
			beginResync();
			input.consume(); 
			endResync();
			input.consume(); 
			return;
		}
		if ( !recoverFromMismatchedElement(input,e,follow) ) 
		{
			throw e;
		}
	}

　　然后重载两个和Adaptor相关的方法：

	public TreeAdaptor getTreeAdaptor()
	{
		return null;
	}
	public void setTreeAdaptor(TreeAdaptor adaptor)
	{
    }

	public NovaTreeAdaptor getNovaTreeAdaptor()
	{
		return (NovaTreeAdaptor) getTreeAdaptor();
	}

然后实现在语法文件中增加的代码中的setRule方法：

	public void setRule(String rule)
    {
		getNovaTreeAdaptor().setRule(rule);
	}

　　最后通过命令行 java org.antlr.Tool Calculator.g重新生成Calculator.java文件，并作以下两个替换动作（以后每次修改语法后都要做这个动作即可）：
　　把父类Parser替换成MyAntlrParser
　　把所有的reportError(RecognitionException e); 这一行替换为throw e;

　　好了，来编写新的测试代码测试我们自己生成的语法树有没有达到预期效果：

    public static void main(String[] args) throws IOException
    {
        .....

        try 
        { 
            parser.setTreeAdaptor(new NovaTreeAdaptor(parser));
        	parser.expr();
        }
        catch (RecognitionException e) 
        {  
        } 
        iterate(parser.getNovaTreeAdaptor().getTree(), 0);
    }

	private static void iterate(NovaAntlrTree t, int depth)
	{
		if (t == null)
			return;

		System.out.print(createSpace(depth*4));
		System.out.println("rule:"+t.getRule()+"["+t.getText()+"]");
		depth++;

		for (int i=0; i<t.getChildCount(); i++)
		{
			iterate(t.getChild(i), depth);
		}

	}

    public static String createSpace(int count)
    {
    	StringBuffer sb = new StringBuffer(count);
    	for (int i=0; i<count; i++)
    		sb.append(' ');
    	return sb.toString();
    }

　　运行结果如下：

　　rule:null[null]
　　　　rule:logic_expr[null]
　　　　　　rule:operator[null]
　　　　　　　　rule:null[isprogrammer]
　　　　　　rule:null[ ]
　　　　　　rule:null[or]
　　　　　　rule:null[ ]
　　　　　　rule:operator[null]
　　　　　　　　rule:null[3]
　　　　　　rule:null[ ]
　　　　　　rule:null[or]
　　　　　　rule:null[ ]
　　　　　　rule:operator[null]
　　　　　　　　rule:null[isantlr]

　　测试成功后把类移植到我的SQL编辑器中，增加显示不同Style功能后，大功告成，这是效果图：
[img]http://gb-lu.iteye.com/upload/picture/pic/7648/7b355b40-4697-3068-8fcc-7a30a845a8ec.jpg[/img]