text/template code reading: parse

big_cheng

This is an original article, released under the CC 4.0 BY-SA license. Reposts must credit the source: https://blog.youkuaiyun.com/big_cheng/article/details/126039101.

lex.go

The lexer is responsible for tokenization (splitting the input into tokens).

Flow

func (l *lexer) run() {
	for state := lexText; state != nil; {
		state = state(l)
	}
	......
}

Given a piece of template source, for example:

para1
{{if .a}} para2 {{end}}

the lexer first enters plain-text scanning:

func lexText(l *lexer) stateFn {
	l.width = 0
	if x := strings.Index(l.input[l.pos:], l.leftDelim); x >= 0 {
		......
		if l.pos > l.start {
			......
			l.emit(itemText)
		}
		......
		return lexLeftDelim
	}
	......
}

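Starting from the current pos (initially 0, the beginning of the input), it searches for the left delimiter that opens a template action ("{{"). If one is found, it first emits the text before "{{" as an itemText to the token receiver (normally the parser), then returns lexLeftDelim, the state function that handles the start of an action.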

lexLeftDelim()

func lexLeftDelim(l *lexer) stateFn {
	......
	if strings.HasPrefix(l.input[l.pos+afterMarker:], leftComment) {
		......
		return lexComment
	}
	l.emit(itemLeftDelim)
	......
	return lexInsideAction
}

The action may be a comment ( {{/* xxx */}} ), in which case lexComment is returned to handle it.
Otherwise it is a pipeline or control action ( {{.a}}, {{if .a}} … {{end}}, {{print 1}}, etc. ): the lexer emits "{{" first, then returns lexInsideAction.

lexInsideAction()

func lexInsideAction(l *lexer) stateFn {
	......
	switch r := l.next(); {
	......
	case r == '"':
		return lexQuote
	......
	case r == '\'':
		return lexChar
	......
	}
	return lexInsideAction
}

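Template actions follow a fixed syntax. For example, a double quote (") starts a string constant ( {{print "xxx"}} ). From here the lexer can never run into a double quote that sits inside a character constant ( {{print '"'}} ), because the opening single quote (') has already switched it to lexChar.

As shown above, the lexer decides which kind of item to scan from specific delimiter characters, emitting items as it goes. This continues until a state function returns a nil stateFn.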
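The state-function loop can be tried in isolation. The following is a minimal sketch, not the real lexer: miniLexer, token, lexAction and lexAll are hypothetical names, and the sketch only distinguishes plain text from whole actions, assuming the default "{{" and "}}" delimiters.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a simplified stand-in for the package's unexported item type.
type token struct {
	kind string
	val  string
}

// miniLexer is a hypothetical, reduced lexer: it only knows text and actions.
type miniLexer struct {
	input  string
	start  int // start of the pending token
	pos    int // current scan position
	tokens []token
}

// stateFn follows the idea used in lex.go: each state returns the next state.
type stateFn func(*miniLexer) stateFn

// emit records input[start:pos] as a token and moves start forward.
func (l *miniLexer) emit(kind string) {
	l.tokens = append(l.tokens, token{kind, l.input[l.start:l.pos]})
	l.start = l.pos
}

// lexText scans plain text up to the next "{{" (or to the end of input).
func lexText(l *miniLexer) stateFn {
	if x := strings.Index(l.input[l.pos:], "{{"); x >= 0 {
		l.pos += x
		if l.pos > l.start {
			l.emit("text")
		}
		return lexAction
	}
	l.pos = len(l.input)
	if l.pos > l.start {
		l.emit("text")
	}
	return nil // nil state: the run loop stops
}

// lexAction swallows a whole action, delimiters included.
func lexAction(l *miniLexer) stateFn {
	x := strings.Index(l.input[l.pos:], "}}")
	if x < 0 {
		l.pos = len(l.input)
		l.emit("error") // unclosed action
		return nil
	}
	l.pos += x + len("}}")
	l.emit("action")
	return lexText
}

// lexAll drives the state machine in the same way as run().
func lexAll(input string) []token {
	l := &miniLexer{input: input}
	for state := stateFn(lexText); state != nil; {
		state = state(l)
	}
	return l.tokens
}

func main() {
	for _, t := range lexAll("para1\n{{if .a}} para2 {{end}}") {
		fmt.Printf("%-6s %q\n", t.kind, t.val)
	}
}
```

Each state only has to know which state comes next, so the loop itself needs no switch over token kinds.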

space and trimMarker

As the package documentation describes, templates support trimming whitespace before and after an action. For example:

"{{23 -}} < {{- 45}}"

The " -}}" after "23" trims the whitespace that follows the action. The "{{- " before "45" trims the whitespace that precedes the action. The output is:

"23<45"

When a space is encountered (for the first time) inside an action:

func lexInsideAction(l *lexer) stateFn {
	......
	switch r := l.next(); {
	......
	case isSpace(r):
		l.backup()
		return lexSpace
	......
	}
	......
}

backup() steps back to just before the space (backup() can be called only once per next()), and lexSpace takes over.

func lexSpace(l *lexer) stateFn {
	var r rune
	var numSpaces int
	for {
		r = l.peek()
		if !isSpace(r) {
			break
		}
		l.next()
		numSpaces++
	}
	if hasRightTrimMarker(l.input[l.pos-1:]) && strings.HasPrefix(l.input[l.pos-1+trimMarkerLen:], l.rightDelim) {
		l.backup()
		if numSpaces == 1 {
			return lexRightDelim
		}
	}
	l.emit(itemSpace)
	return lexInsideAction
}

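peek() does not consume the next rune. Because lexSpace() is reached only from lexInsideAction(), which has already seen a space, numSpaces >= 1.
When the for loop exits, pos is at the first non-space character after the spaces. If the input is a single space + "-}}", the lexer backs up over that space and returns lexRightDelim(), which handles both the trim marker and "}}". If it is several spaces + "-}}", the lexer backs up over one space, emits the preceding spaces as a single itemSpace, and returns to lexInsideAction(), where the input is now the single space + "-}}" case. If the spaces are not at a "}}" boundary, they are simply merged and emitted as one itemSpace.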
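The trim behaviour can be observed through the public API. This is a small sketch: render is a hypothetical helper, and the inputs are chosen for illustration (the basic form, a variant with several spaces next to the trim markers, and "{{-3}}", which the package documentation describes as the number -3 rather than a trim marker).

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render parses src as a template and executes it with no data.
func render(src string) (string, error) {
	t, err := template.New("t").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, src := range []string{
		"{{23}} < {{45}}",          // no trim markers
		"{{23 -}} < {{- 45}}",      // one space next to each marker
		"{{23    -}} < {{-    45}}", // several spaces next to each marker
		"{{-3}}",                   // no space after '-': a negative number
	} {
		out, err := render(src)
		fmt.Printf("%q => %q (err=%v)\n", src, out, err)
	}
}
```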

identifier

func lexInsideAction(l *lexer) stateFn {
	......
	switch r := l.next(); {
	......
	case r == '+' || r == '-' || ('0' <= r && r <= '9'):
		l.backup()
		return lexNumber
	case isAlphaNumeric(r):
		l.backup()
		return lexIdentifier

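Inside an action, a token that starts with 0-9 (or a sign) goes to lexNumber, so the next case, isAlphaNumeric, in practice only fires for a leading letter or underscore (isAlphaNumeric also accepts '_'). The lexer backs up over the rune it just read and switches to lexIdentifier.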

func lexIdentifier(l *lexer) stateFn {
Loop:
	for {
		switch r := l.next(); {
		case isAlphaNumeric(r):
			// absorb.
		default:
			l.backup()
			......
			if !l.atTerminator() {
				return l.errorf("bad character %#U", r)
			}

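The first l.next() re-reads the (first) rune that was just backed up; subsequent letters and digits are absorbed until a non-alphanumeric rune is read, which is then backed up.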

func (l *lexer) atTerminator() bool {
	r := l.peek()
	if isSpace(r) {
		return true
	}
	switch r {
	case eof, '.', ',', '|', ':', ')', '(':
		return true
	}
	if rd, _ := utf8.DecodeRuneInString(l.rightDelim); rd == r {
		return true
	}
	return false
}

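An identifier may only be followed by a space, eof, dot, comma, pipe, colon, a left or right parenthesis, or the first character of the action's closing delimiter; any other non-alphanumeric character is rejected.
atTerminator() is also called by lexField and lexVariable, and the check is fairly permissive. For example: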

{{print(1)}}

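This lexes as the identifier "print" followed by a left parenthesis "(", but the parser then reports an error (in command(): the operand "print" may only be followed by a space, }}, a right parenthesis, or a pipe).
The lexer also accepts an identifier followed by the first character of the closing delimiter. For example (shown here with a variable, which lexVariable checks through the same atTerminator()):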

{{$x:=1}}{{$x}2}}

Likewise, this is rejected later by the parser ( unexpected "}" in operand ).
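Both inputs can be checked with the public API. In this sketch parseErr is a hypothetical helper; it only reports whether parsing fails, and the exact error wording may differ between Go versions.

```go
package main

import (
	"fmt"
	"text/template"
)

// parseErr returns the error (if any) produced by parsing src.
func parseErr(src string) error {
	_, err := template.New("t").Parse(src)
	return err
}

func main() {
	for _, src := range []string{
		"{{print 1}}",       // valid: operands separated by a space
		"{{print(1)}}",      // lexes, but the parser rejects "(" after the operand
		"{{$x:=1}}{{$x}2}}", // lexes, but the parser rejects the stray "}"
	} {
		fmt.Printf("%-22q err=%v\n", src, parseErr(src))
	}
}
```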

pipeline, command, operand, term

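Syntactically, whatever a template wraps in "{{ }}" is an action; everything outside actions is plain text (html/css/js, etc.).
Apart from comment actions, there are 2 kinds of action:

control actions, e.g. {{if pipeline}}xx{{end}}, {{range pipeline}}xx{{else}}yy{{end}}
{{pipeline}}

A pipeline is one command, or several commands separated by the pipe character (vertical bar, |). Examples of a single command:

a single constant/variable/field/key: 1, "abc", nil, ., $x, .Field1, .k1
a chain of fields/keys: .Field1.k1.Field2.k2
a method call on the current dot: .Method1. Method calls can be chained, e.g. $x.Method1.Field1.k1.Method2
a single function call (functions live at 2 levels, global and per-template), e.g. func1
a function, or the method at the end of a chain, can take space-separated arguments, e.g. $x.Method1 "arg1" "arg2", print 1 2 3
any of the above can be wrapped in parentheses and treated as a single unit. For example, the argument to print is the result of a function call: print (print "abc"). Or the result of a method call is chained further: (.Method1 "abc").Field1

A command is a single operand, or several operands separated by spaces.
An operand is a term, optionally followed by a chain of one or more .FieldXx.
A term is a single constant/variable/function/field, etc., such as 1, nil, true, print, .Field1, $
A term can also be a parenthesized pipeline.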
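The pipeline/command/operand split is visible in the parsed tree. The sketch below uses the exported text/template/parse node types; firstPipe is a hypothetical helper and the template text is made up for illustration. Parsing does not check argument counts, so printf with an extra argument is fine here.

```go
package main

import (
	"fmt"
	"text/template"
	"text/template/parse"
)

// firstPipe parses src and returns the pipeline of its first top-level action.
func firstPipe(src string) (*parse.PipeNode, error) {
	t, err := template.New("t").Parse(src)
	if err != nil {
		return nil, err
	}
	for _, n := range t.Tree.Root.Nodes {
		if a, ok := n.(*parse.ActionNode); ok {
			return a.Pipe, nil
		}
	}
	return nil, fmt.Errorf("no action in %q", src)
}

func main() {
	pipe, err := firstPipe(`{{.Field1.k1 | printf "%v %v" 1}}`)
	if err != nil {
		panic(err)
	}
	// A pipeline is a list of commands; a command is a list of operands (Args).
	for i, cmd := range pipe.Cmds {
		fmt.Printf("command %d: %s\n", i, cmd)
		for _, arg := range cmd.Args {
			fmt.Printf("  operand %-10s %T\n", arg, arg)
		}
	}
}
```

The chained fields .Field1.k1 end up as one operand, while the function name and each of its arguments are separate operands of the second command.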

parse.go

The parser keeps reading the tokens (items) produced by the lexer and builds the nodes defined in node.go (XxNode). For example:

{{if .a}} {{.b}} {{else}} {{.c}}{{.d}} {{end}}

builds an IfNode:

type IfNode struct {
	BranchNode
}

type BranchNode struct {
	NodeType
	Pos
	tr       *Tree
	Line     int
	Pipe     *PipeNode
	List     *ListNode
	ElseList *ListNode
}

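Pipe corresponds to the pipeline ".a". List holds a single ActionNode for ".b". ElseList holds 2 ActionNodes, for ".c" and ".d". (The whitespace between the actions additionally becomes TextNodes in both lists.)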
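The shape of this IfNode can be inspected with the exported node types. A small sketch, where countActions and parseIf are hypothetical helpers:

```go
package main

import (
	"fmt"
	"text/template"
	"text/template/parse"
)

// countActions reports how many ActionNodes a list holds directly.
func countActions(list *parse.ListNode) int {
	if list == nil {
		return 0
	}
	n := 0
	for _, node := range list.Nodes {
		if _, ok := node.(*parse.ActionNode); ok {
			n++
		}
	}
	return n
}

// parseIf parses src and returns its first top-level IfNode.
func parseIf(src string) (*parse.IfNode, error) {
	t, err := template.New("t").Parse(src)
	if err != nil {
		return nil, err
	}
	for _, node := range t.Tree.Root.Nodes {
		if ifn, ok := node.(*parse.IfNode); ok {
			return ifn, nil
		}
	}
	return nil, fmt.Errorf("no if action in %q", src)
}

func main() {
	ifn, err := parseIf("{{if .a}} {{.b}} {{else}} {{.c}}{{.d}} {{end}}")
	if err != nil {
		panic(err)
	}
	fmt.Println("Pipe:", ifn.Pipe)
	fmt.Println("List:     actions =", countActions(ifn.List), "nodes =", len(ifn.List.Nodes))
	fmt.Println("ElseList: actions =", countActions(ifn.ElseList), "nodes =", len(ifn.ElseList.Nodes))
}
```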

t.parse()

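parse() is the entry point for parsing template content. Besides the plain text and actions in the content, it also parses (associated) template definitions such as {{define "tpl-1"}}xxx{{end}} (associated templates can only be defined at the top level of the content):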

func (t *Tree) parse() {
	t.Root = t.newList(t.peek().pos)
	for t.peek().typ != itemEOF {
		if t.peek().typ == itemLeftDelim {
			delim := t.next()
			if t.nextNonSpace().typ == itemDefine {
				newT := New("definition")
				newT.text = t.text
				newT.Mode = t.Mode
				newT.ParseName = t.ParseName
				newT.startParse(t.funcs, t.lex, t.treeSet)
				newT.parseDefinition()
				continue
			}
			t.backup2(delim)
		}
		switch n := t.textOrAction(); n.Type() {
		case nodeEnd, nodeElse:
			t.errorf("unexpected %s", n)
		default:
			t.Root.append(n)
		}
	}
}

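Parsing proceeds in a loop. If the next token is "{{" and the next non-space token after it is "define" (really an identifier, but the lexer gives it the dedicated type itemDefine), a new parser, New("definition"), is created to parse it.
Because newT.startParse() receives "t.lex", the new parser continues from the current token position, i.e. right after "define".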

func (t *Tree) parseDefinition() {
	const context = "define clause"
	name := t.expectOneOf(itemString, itemRawString, context)
	var err error
	t.Name, err = strconv.Unquote(name.val)
	if err != nil {
		t.error(err)
	}
	t.expect(itemRightDelim, context)
	
	var end Node
	t.Root, end = t.itemList()
	if end.Type() != nodeEnd {
		t.errorf("unexpected %s in %s", end, context)
	}
	t.add()
	t.stopParse()
}

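The current lexical position (pos) is just after "define". t.expectOneOf() requires the next token to be an itemString or itemRawString, i.e. the associated template's name, as in {{define "tpl-1".
strconv.Unquote() yields the associated template's name, t.Name; this real name replaces the temporary name given by New("definition") in parse() above.
Note that the parser generally ignores the itemSpace tokens that act as separators inside an action. For example, the (one or more) spaces between "define" and "tpl-1" serve the lexer only as token separators; the parser does not care about them.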
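These rules for the define clause can be checked from outside the package. In this sketch hasTemplate is a hypothetical helper and the template names are made up; it tries a quoted name, a raw-string name, extra spaces around the name, and a bare identifier, which is expected to fail.

```go
package main

import (
	"fmt"
	"text/template"
)

// hasTemplate parses src and reports whether a template called name now exists.
func hasTemplate(src, name string) (bool, error) {
	t, err := template.New("main").Parse(src)
	if err != nil {
		return false, err
	}
	return t.Lookup(name) != nil, nil
}

func main() {
	cases := []struct{ src, name string }{
		{`{{define "tpl-1"}}x{{end}}`, "tpl-1"},       // interpreted string
		{"{{define `tpl-2`}}x{{end}}", "tpl-2"},       // raw string
		{`{{define    "tpl-3"   }}x{{end}}`, "tpl-3"}, // extra spaces are skipped
		{`{{define tpl4}}x{{end}}`, "tpl4"},           // bare identifier: rejected
	}
	for _, c := range cases {
		ok, err := hasTemplate(c.src, c.name)
		fmt.Printf("%-36q defined=%v err=%v\n", c.src, ok, err)
	}
}
```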

t.itemList() parses the body of the associated template (including the closing "{{end}}").

func (t *Tree) add() {
	tree := t.treeSet[t.Name]
	if tree == nil || IsEmptyTree(tree.Root) {
		t.treeSet[t.Name] = t
		return
	}
	if !IsEmptyTree(t.Root) {
		t.errorf("template: multiple definition of template %q", t.Name)
	}
}

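t.treeSet holds all of t's associated templates (name => *Tree), including t itself. For example, for a template object named "foo" whose content is xxx {{define "bar"}}yy{{end}}, the treeSet contains 2 entries, "foo" and "bar".
t.add() checks for duplicate names. However, an existing template of the same name that has no real content can be overwritten.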
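The effect of add() shows up through the public API. A sketch under the assumption that all definitions arrive in a single Parse call (names is a hypothetical helper):

```go
package main

import (
	"fmt"
	"sort"
	"text/template"
)

// names parses src into a template called "foo" and returns the sorted
// names of all templates associated with it.
func names(src string) ([]string, error) {
	t, err := template.New("foo").Parse(src)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, tmpl := range t.Templates() {
		out = append(out, tmpl.Name())
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	// "foo" itself plus the associated template "bar".
	fmt.Println(names(`xxx {{define "bar"}}yy{{end}}`))

	// Two non-empty definitions of the same name in one parse: rejected.
	fmt.Println(names(`{{define "a"}}x{{end}}{{define "a"}}y{{end}}`))

	// An empty definition can be replaced by a later non-empty one.
	fmt.Println(names(`x{{define "a"}}{{end}}{{define "a"}}y{{end}}`))
}
```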

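When t.parseDefinition() finishes, control returns to t.parse(): the continue moves on to the token after {{define "xx"}}…{{end}} (which may be whitespace, an itemText).
If "{{" has been consumed but what follows is not "define", slot 0 of the parser's internal token buffer (Tree.token [3]item) holds that non-"define" token. After t.backup2(delim) the buffer holds 2 entries: the non-"define" token and "{{". The buffer is last-in-first-out, so the next t.next() returns "{{" and the one after that returns the non-"define" token. In other words, 2 tokens were consumed, turned out not to be needed, and were handed back.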
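The hand-back can be modelled with a few lines. This is a simplified sketch loosely modelled on the buffer described above, not the parser's code: tokenBuf, its string tokens and replay are hypothetical, and the pretend lexer is just a slice.

```go
package main

import "fmt"

// tokenBuf is a hypothetical model of a 3-slot look-ahead buffer.
type tokenBuf struct {
	src       []string  // tokens still to be produced by a pretend lexer
	token     [3]string // look-ahead slots
	peekCount int       // number of buffered tokens waiting to be re-read
}

// next returns a buffered token if there is one, else pulls a fresh one.
func (b *tokenBuf) next() string {
	if b.peekCount > 0 {
		b.peekCount--
	} else {
		b.token[0] = b.src[0]
		b.src = b.src[1:]
	}
	return b.token[b.peekCount]
}

// backup2 hands back two tokens: t1 (the earlier one) goes into slot 1,
// while the most recently read token is still sitting in slot 0.
func (b *tokenBuf) backup2(t1 string) {
	b.token[1] = t1
	b.peekCount = 2
}

// replay consumes "{{" and the token after it, hands both back,
// and then reads three tokens again.
func replay() []string {
	b := &tokenBuf{src: []string{"{{", "if", ".a"}}
	delim := b.next() // "{{"
	b.next()          // "if": not "define", so it is not needed here
	b.backup2(delim)
	return []string{b.next(), b.next(), b.next()}
}

func main() {
	fmt.Println(replay())
}
```

The two tokens come back in their original order, so the caller that runs next sees the input as if nothing had been consumed.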

t.parse() continues: if the next token is not "{{", or the next 2 tokens are not "{{" followed by "define", it calls t.textOrAction() to handle what may be plain text or an action.

t.textOrAction()

func (t *Tree) textOrAction() Node {
	switch token := t.nextNonSpace(); token.typ {

Note that the lexer's itemSpace refers only to (separator) spaces inside an action. All text outside actions (which may contain spaces) is plain text (itemText).
t.textOrAction() handles plain text, comment actions, and other actions.

elseif

An if action can have any number of else if branches:

{{if .a}} 1 {{else if .b}} 2 {{else if .c}} 3 {{else}} 4 {{end}}

As before, IfNode:

type IfNode struct {
	BranchNode
}

type BranchNode struct {
	......
	List     *ListNode
	ElseList *ListNode
}

Parsing:

func (t *Tree) parseControl(allowElseIf bool, context string) (pos Pos, line int, pipe *PipeNode, list, elseList *ListNode) {
	......
	pipe = t.pipeline(context, itemRightDelim)
	......
	list, next = t.itemList()
	......
	switch next.Type() {
	case nodeEnd:
	case nodeElse:
		if allowElseIf {
			if t.peek().typ == itemIf {
				t.next()
				elseList = t.newList(next.Position())
				elseList.append(t.ifControl())
				break
			}
		}
		elseList, next = t.itemList()
		if next.Type() != nodeEnd {
			t.errorf("expected end; found %s", next)
		}
	}
	return pipe.Position(), pipe.Line, pipe, list, elseList
}

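On entry to t.parseControl(), "if" has already been consumed. It first handles the pipeline, then the itemList after {{if}}, consuming up to {{else}} or {{end}}.
If it stops at {{else}}, allowElseIf is set, and the next token is "if", it consumes the if and calls t.ifControl(), i.e. t.parseControl(), again. Each else if therefore recurses one level deeper, until the final {{else if}} {{else}} {{end}} is handled and returns an IfNode: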

{{if .a}} 1 {{else if .b}} 2 {{else if }} + IfNode

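That IfNode becomes the elseList of the IfNode for {{if .b}}, and the recursion unwinds one level at a time. So in the final result, the ElseList of every IfNode except the innermost one contains exactly one IfNode: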

IfNode(.a)
    list(1)
    elseList
        IfNode(.b)
            list(2)
            elseList
                IfNode(.c)
                    list(3)
                    elseList(4)
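The nesting can be confirmed by walking the parsed tree. In this sketch elseIfDepth and parseIf are hypothetical helpers. Note that a hand-written {{else}}{{if ...}}...{{end}}{{end}} produces the same shape, so the walk cannot tell the two spellings apart.

```go
package main

import (
	"fmt"
	"text/template"
	"text/template/parse"
)

// elseIfDepth counts the IfNodes chained through single-node ElseLists.
func elseIfDepth(ifn *parse.IfNode) int {
	depth := 1
	for ifn.ElseList != nil && len(ifn.ElseList.Nodes) == 1 {
		next, ok := ifn.ElseList.Nodes[0].(*parse.IfNode)
		if !ok {
			break // a plain {{else}} body, e.g. a single TextNode
		}
		depth++
		ifn = next
	}
	return depth
}

// parseIf parses src and returns its first top-level IfNode.
func parseIf(src string) (*parse.IfNode, error) {
	t, err := template.New("t").Parse(src)
	if err != nil {
		return nil, err
	}
	for _, node := range t.Tree.Root.Nodes {
		if ifn, ok := node.(*parse.IfNode); ok {
			return ifn, nil
		}
	}
	return nil, fmt.Errorf("no if action in %q", src)
}

func main() {
	src := "{{if .a}} 1 {{else if .b}} 2 {{else if .c}} 3 {{else}} 4 {{end}}"
	ifn, err := parseIf(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("nested IfNodes:", elseIfDepth(ifn))
}
```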

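Spaces between else and if

A closer look at the code above raises a question: parseControl() takes the else-if path when itemList ends on "else", allowElseIf is set, and the next token is "if". But isn't the token right after else actually a space rather than if?
Stepping through in a debugger shows that the key is how else is handled. For: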

{{if .a}} 1 {{else if

t.itemList() inside t.parseControl() consumes " 1 {{else":

func (t *Tree) itemList() (list *ListNode, next Node) {
	list = t.newList(t.peekNonSpace().pos)
	for t.peekNonSpace().typ != itemEOF {
		n := t.textOrAction()

" 1 " as a whole is handled by t.textOrAction() as an itemText:

func (t *Tree) textOrAction() Node {
	switch token := t.nextNonSpace(); token.typ {
	case itemText: // lexer itemText: " 1 "
		return t.newText(token.pos, token.val)
	case itemLeftDelim:
		t.actionLine = token.line
		defer t.clearActionLine()
		return t.action()

t.action() handles "{{else " ("{{" already consumed):

func (t *Tree) action() (n Node) {
	switch token := t.nextNonSpace(); token.typ {
	......
	case itemElse:
		return t.elseControl()

"else" has been consumed:

func (t *Tree) elseControl() Node { // {{if .a}} 1 {{else if
	peek := t.peekNonSpace() // "if"
	if peek.typ == itemIf {
		return t.newElse(peek.pos, peek.line)
	}
	token := t.expect(itemRightDelim, "else") // if it is not "{{else if", it must be {{else}}
	return t.newElse(token.pos, token.line)
}

func (t *Tree) peekNonSpace() item {
	token := t.nextNonSpace()
	t.backup()
	return token
}

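t.peekNonSpace() keeps consuming itemSpace tokens until it reaches the itemIf, then t.backup() hands the itemIf back. So the spaces before "if" have in fact been consumed by t.elseControl(), which is why the peek after "else " sees itemIf.
Also note that an ElseNode's pos is actually the pos of the "if" or "}}" that follows it.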
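One observable consequence is that the amount of whitespace between else and if does not matter. A small sketch (render is a hypothetical helper, and the data map is made up for illustration):

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render parses src and executes it against data.
func render(src string, data interface{}) (string, error) {
	t, err := template.New("t").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := map[string]bool{"a": false, "b": true}
	for _, src := range []string{
		"{{if .a}}1{{else if .b}}2{{else}}3{{end}}",
		"{{if .a}}1{{else      if .b}}2{{else}}3{{end}}", // several spaces
	} {
		out, err := render(src, data)
		fmt.Printf("%q => %q (err=%v)\n", src, out, err)
	}
}
```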

backup2/3 in t.pipeline()

func (t *Tree) pipeline(context string, end itemType) (pipe *PipeNode) {
	token := t.peekNonSpace()
	pipe = t.newPipeline(token.pos, token.line, nil)
decls:
	if v := t.peekNonSpace(); v.typ == itemVariable {
		t.next()
		// 3-token look-ahead
		tokenAfterVariable := t.peek()
		next := t.peekNonSpace()
		switch {
		case next.typ == itemAssign, next.typ == itemDeclare:
			......
		case next.typ == itemChar && next.val == ",":
			......
		case tokenAfterVariable.typ == itemSpace:
			t.backup3(v, tokenAfterVariable)
		default:
			t.backup2(v)
		}
	}