Lookahead From Wikipedia, the free encyclopedia

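This article discusses the concept of lookahead in artificial intelligence and its use in combinatorial search and in parsers. In combinatorial search, lookahead specifies how deeply the problem graph is explored, and setting a specific limit keeps the algorithm's running time under control. In parsers, lookahead determines the maximum number of input tokens the parser may examine to decide which rule to apply. The article also explains how lookahead helps a parser choose correctly when conflicts arise, eliminates duplicate states, and eases the burden of an extra stack.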


Lookahead in search problems

In artificial intelligence, lookahead is an important component of combinatorial search which specifies, roughly, how deeply the graph representing the problem is explored. The need for a specific limit on lookahead comes from the large problem graphs in many applications, such as computer chess and computer Go. A naive breadth-first search of these graphs would quickly consume all the memory of any modern computer. Setting a specific lookahead limit keeps the algorithm's running time under control; that time increases exponentially as the lookahead limit increases.
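The exponential growth can be seen in a minimal sketch of an exhaustive search bounded by a lookahead limit. The names `explore` and `toy_children` are illustrative assumptions, and the move generator is a made-up one in which every position has exactly three successors.

```python
from typing import Callable, Iterable


def explore(state: int,
            children: Callable[[int], Iterable[int]],
            lookahead: int) -> int:
    """Visit every state reachable within `lookahead` moves.

    Returns the number of nodes visited, as a stand-in for running time.
    """
    if lookahead == 0:
        return 1  # the limit is reached: count this node and stop
    return 1 + sum(explore(child, children, lookahead - 1)
                   for child in children(state))


def toy_children(state: int) -> list[int]:
    # Hypothetical move generator: three successors per position.
    return [3 * state + i for i in (1, 2, 3)]


if __name__ == "__main__":
    for limit in range(1, 6):
        print(limit, explore(0, toy_children, limit))
```

With a branching factor of three, each extra level of lookahead roughly triples the number of nodes visited.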

More sophisticated search techniques such as alpha-beta pruning are able to eliminate entire subtrees of the search tree from consideration. When these techniques are used, lookahead is not a precisely defined quantity, but instead either the maximum depth searched or some type of average.
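As a rough illustration of pruning, the sketch below runs plain minimax and alpha-beta to the same depth limit on a made-up game tree and counts the nodes each one visits. The `children` and `evaluate` functions are hypothetical placeholders, not part of any real game engine.

```python
import math


def children(state: int) -> list[int]:
    # Hypothetical move generator: three successors per position.
    return [3 * state + i for i in (1, 2, 3)]


def evaluate(state: int) -> int:
    # Hypothetical static evaluation: an arbitrary deterministic score.
    return (state * 7919) % 101 - 50


def minimax(state, depth, maximizing, counter):
    counter[0] += 1
    if depth == 0:
        return evaluate(state)
    values = [minimax(c, depth - 1, not maximizing, counter)
              for c in children(state)]
    return max(values) if maximizing else min(values)


def alphabeta(state, depth, alpha, beta, maximizing, counter):
    counter[0] += 1
    if depth == 0:
        return evaluate(state)
    best = -math.inf if maximizing else math.inf
    for c in children(state):
        value = alphabeta(c, depth - 1, alpha, beta, not maximizing, counter)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the remaining siblings cannot change the result
    return best


plain_count, pruned_count = [0], [0]
plain_value = minimax(0, 6, True, plain_count)
pruned_value = alphabeta(0, 6, -math.inf, math.inf, True, pruned_count)

if __name__ == "__main__":
    print(plain_value, pruned_value, plain_count[0], pruned_count[0])
```

Both searches return the same value at the root, but the pruned search skips subtrees, so the depth it reaches varies from branch to branch.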

Lookahead in parsing

Lookahead is also an important concept in parsers in compilers, where it establishes the maximum number of incoming input tokens the parser can look at to decide which rule it should use.

Lookahead is especially relevant to LL, LR, and LALR parsers, where it is often explicitly indicated by affixing the lookahead to the algorithm name in parentheses, such as LALR(1).
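To make the number in parentheses concrete, here is a minimal sketch of a token stream with a `peek(k)` operation and a rule choice that needs two tokens of lookahead. The class `TokenStream`, the function `classify_statement`, and the tiny statement grammar are all assumptions made up for this illustration.

```python
class TokenStream:
    """A token list with non-consuming lookahead."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def peek(self, k=1):
        """Return the k-th upcoming token without consuming it (None at end)."""
        index = self._pos + k - 1
        return self._tokens[index] if index < len(self._tokens) else None

    def next(self):
        """Consume and return the next token."""
        token = self.peek()
        self._pos += 1
        return token


def classify_statement(stream):
    """Pick a rule using at most two tokens of lookahead."""
    if stream.peek(1) == "if":
        return "if-statement"   # one token is enough here
    # An identifier alone cannot separate `x = ...` from `x ( ... )`;
    # the second upcoming token decides.
    if stream.peek(2) == "=":
        return "assignment"
    if stream.peek(2) == "(":
        return "call"
    return "expression"


if __name__ == "__main__":
    print(classify_statement(TokenStream(["x", "=", "1"])))
    print(classify_statement(TokenStream(["f", "(", ")"])))
```

A parser restricted to one token of lookahead would need the grammar refactored so that assignments and calls share a common prefix rule.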

Most programming languages, the primary target of parsers, are carefully defined in such a way that a parser with limited lookahead, typically one, can parse them, because parsers with limited lookahead are often more efficient. One important change[citation needed] to this trend came in 1990 when Terence Parr created ANTLR for his Ph.D. thesis, a parser generator for efficient LL(k) parsers, where k is any fixed value.

Parsers typically have only a few actions after seeing each token. They are shift (add this token to the stack for later reduction), reduce (pop tokens from the stack and form a syntactic construct), end, error (no known rule applies) or conflict (does not know whether to shift or reduce).

Lookahead has two advantages.

  • It helps the parser take the correct action in case of conflicts. For example, parsing the if statement in the case of an else clause.
  • It eliminates many duplicate states and eases the burden of an extra stack. A C language non-lookahead parser will have around 10,000 states. A lookahead parser will have around 300 states.

Example: Parsing the Expression 1 + 2 * 3

The set of expression parsing rules (called a grammar) is as follows:

Rule1: E → E + E     Expression is the sum of two expressions.
Rule2: E → E * E     Expression is the product of two expressions.
Rule3: E → number    Expression is a simple number.
Rule4: + has less precedence than *

Most programming languages (except for a few such as APL and Smalltalk) and algebraic formulas give higher precedence to multiplication than addition, in which case the correct interpretation of the example above is (1 + (2*3)). Note that Rule4 above is a semantic rule. It is possible to rewrite the grammar to incorporate this into the syntax. However, not all such rules can be translated into syntax.
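One common way to fold the precedence rule into the syntax is to split the grammar into two levels, E → T ('+' T)* and T → number ('*' number)*. The sketch below is an assumed recursive-descent reading of that rewritten grammar; the function names and the tuple-based tree format are illustrative only.

```python
def parse_expr(tokens, pos=0):
    """E -> T ('+' T)* ; returns (tree, next position)."""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        tree = ("+", tree, right)
    return tree, pos


def parse_term(tokens, pos):
    """T -> number ('*' number)* ; returns (tree, next position)."""
    tree, pos = tokens[pos], pos + 1
    while pos < len(tokens) and tokens[pos] == "*":
        tree = ("*", tree, tokens[pos + 1])
        pos += 2
    return tree, pos


if __name__ == "__main__":
    tree, _ = parse_expr([1, "+", 2, "*", 3])
    print(tree)  # ('+', 1, ('*', 2, 3))
```

Because multiplication is handled one level below addition, the grouping (1 + (2*3)) falls out of the grammar itself and no separate precedence rule is consulted.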

Simple non-lookahead parser actions

  1. Reduce 1 to expression E on input 1 based on rule3.
  2. Shift + onto stack on input + in anticipation of rule1.
  3. Reduce 2 to expression E on input 2 based on rule3.
  4. Reduce stack items E+ and new input E to E based on rule1.
  5. Shift * onto stack on input * in anticipation of rule2.
  6. Shift 3 onto stack on input 3 in anticipation of rule3.
  7. Reduce 3 to Expression E on input 3 based on rule3.
  8. Reduce stack items E* and new input E to E based on rule2.

The resulting parse tree, and the code generated from it, is not correct according to the language semantics: it groups the expression as ((1 + 2) * 3) instead of (1 + (2 * 3)).
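The eager behaviour above can be imitated with a small sketch of a shift-reduce loop that reduces as soon as any rule matches and never consults the next token. The function names and the tuple tree format are assumptions for illustration, not a real parser generator's output.

```python
def parse_eager(tokens):
    """Shift-reduce without lookahead: reduce E op E the moment it appears."""
    stack = []
    for token in tokens:
        stack.append(token)  # shift; a number already stands for E (rule3)
        # Reduce by rule1 or rule2 immediately, ignoring rule4 (precedence).
        while len(stack) >= 3 and stack[-2] in ("+", "*"):
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))
    return stack[0]


def evaluate_tree(tree):
    """Compute the value a parse tree stands for."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    a, b = evaluate_tree(left), evaluate_tree(right)
    return a + b if op == "+" else a * b


if __name__ == "__main__":
    tree = parse_eager([1, "+", 2, "*", 3])
    print(tree, evaluate_tree(tree))  # ('*', ('+', 1, 2), 3) 9
```

The tree evaluates to 9 rather than 7, which is the incorrect grouping the text describes.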

To correctly parse without lookahead, there are three solutions:

  • The user has to enclose expressions within parentheses. This often is not a viable solution.
  • The parser needs to have more logic to backtrack and retry whenever a rule is violated or not complete. A similar method is followed in LL parsers.
  • Alternatively, the parser or grammar needs to have extra logic to delay reduction and reduce only when it is absolutely sure which rule to reduce first. This method is used in LR parsers. This correctly parses the expression but with many more states and increased stack depth.

Lookahead parser actions

  1. Shift 1 onto stack on input 1 in anticipation of rule3. It does not reduce immediately.
  2. Reduce stack item 1 to simple Expression on input + based on rule3. The lookahead is +, so we are on path to E +, so we can reduce the stack to E.
  3. Shift + onto stack on input + in anticipation of rule1.
  4. Shift 2 onto stack on input 2 in anticipation of rule3.
  5. Reduce stack item 2 to Expression on input * based on rule3. The lookahead * expects only E before it.
  6. Now the stack has E + E and the input is still *. The parser has two choices: shift based on rule2, or reduce based on rule1. Since * has higher precedence than + according to rule4, it shifts * onto the stack in anticipation of rule2.
  7. Shift 3 onto stack on input 3 in anticipation of rule3.
  8. Reduce stack item 3 to Expression after seeing end of input based on rule3.
  9. Reduce stack items E * E to E based on rule2.
  10. Reduce stack items E + E to E based on rule1.

The parse tree generated is correct, and the parse is more efficient than with non-lookahead parsers. This is the strategy followed in LALR parsers.
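The lookahead steps above can be sketched as a shift-reduce loop that inspects the next token before reducing. This is a simplified precedence-driven sketch, not a table-driven LALR parser; `PRECEDENCE`, `parse_with_lookahead`, and the tuple tree format are assumptions made for the illustration.

```python
PRECEDENCE = {"+": 1, "*": 2}  # rule4: + binds less tightly than *


def parse_with_lookahead(tokens):
    """Shift-reduce with one token of lookahead."""
    stack = []
    stream = list(tokens) + [None]  # None marks end of input
    for index, token in enumerate(stream[:-1]):
        stack.append(token)          # shift
        lookahead = stream[index + 1]
        while len(stack) >= 3 and stack[-2] in PRECEDENCE:
            op = stack[-2]
            # Delay the reduction if the upcoming operator binds tighter.
            if lookahead in PRECEDENCE and PRECEDENCE[lookahead] > PRECEDENCE[op]:
                break
            right = stack.pop()
            stack.pop()              # discard the operator already read
            left = stack.pop()
            stack.append((op, left, right))
    return stack[0]


if __name__ == "__main__":
    print(parse_with_lookahead([1, "+", 2, "*", 3]))  # ('+', 1, ('*', 2, 3))
```

When the stack holds E + E and the lookahead is *, the loop breaks instead of reducing, which corresponds to step 6 in the list.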

Lookahead vs. Lazy evaluation

Lookahead contrasts with another technique, lazy evaluation, which delays a computation until it is really needed. Both techniques are used to economize on space or time. Lookahead makes the right decision early and so avoids backtracking out of undesirable states at later stages of the algorithm, which saves space at the cost of a slight increase in time due to the overhead of extra lookups. Lazy evaluation normally skips the unexplored algorithmic paths and thus generally saves both time and space. Some applications of lazy evaluation are demand paging in operating systems and lazy parse tables in compilers.
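Lazy evaluation can be illustrated with a Python generator, which computes each value only when a consumer asks for it. The names `squares` and `computed` are made up for this sketch; the list `computed` exists only to record which inputs were actually processed.

```python
import itertools

computed = []  # records which inputs were actually evaluated


def squares():
    """Yield 1, 4, 9, ... lazily; nothing runs until a value is requested."""
    for n in itertools.count(1):
        computed.append(n)
        yield n * n


# Only three values are requested, so only three are ever computed,
# even though the sequence itself is unbounded.
first_three = list(itertools.islice(squares(), 3))

if __name__ == "__main__":
    print(first_three)  # [1, 4, 9]
    print(computed)     # [1, 2, 3]
```

The unexplored remainder of the sequence costs neither time nor space, which is the saving the paragraph attributes to lazy evaluation.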

In search space exploration, both techniques are used. When there are already promising paths to evaluate, lazy evaluation is used, and the paths still to be explored are saved in a queue or stack. When there are no promising paths to evaluate, lookahead is used to check whether a new path is more promising in leading to a solution.

Compilers also use both techniques. They are lazy in generating parse tables from the given rules, but they look ahead when parsing the given input.


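### The concept of lookahead

Lookahead is an important feature of regular expressions: it asserts a condition on the text that follows without consuming any characters. It lets you define a pattern in which matching continues only if a given subexpression matches at the current position.

#### Positive lookahead

A positive lookahead has the form `(?=...)` and requires the text after the current position to match the pattern inside the parentheses[^1]. For example:

```regex
\b\d+(?= dollars)\b
```

This matches any run of digits that begins at a word boundary, provided the digits are immediately followed by the string `" dollars"`. Note that `" dollars"` is not included in the match result.

#### Negative lookahead

A negative lookahead uses `(?!...)` for the opposite effect: the match succeeds only if the text after the current position does not match the given pattern[^2]. The following example finds whole words that are not followed by `" abc"`:

```regex
\w+\b(?! abc)
```

Scanning from left to right, the pattern matches a run of word characters up to a word boundary and accepts it as long as the text right after that boundary is not `" abc"`.

#### A practical example

Consider extracting dates while excluding those with a particular prefix or suffix. Suppose the data contains dates in several styles, such as dd.mm.yy and yyyy-mm-dd, and we want to ignore any date that starts with 'invalid-date-' or ends with '-not-valid'. A somewhat more complex regex handles this:

```regex
(?<!invalid-date-)\b(\d{2}\.\d{2}\.\d{2}|\d{4}-\d{2}-\d{2})\b(?!-not-valid)
```

This pattern uses lookaround in both directions: a negative lookbehind for the leading constraint and a negative lookahead for the trailing one[^3].

#### Use in replacement operations

Beyond plain matching, lookaround also gives finer control in substitutions. For example, a standard US phone number can be converted to its international form:

```python
import re

# (?<!\d) and (?!\d) keep the pattern from matching inside a longer digit run.
pattern = r"(?<!\d)\(?(\d{3})\)?[-. ]*(\d{3})[-.]?(\d{4})(?!\d)"
text = "Call me at (800) 555-1212."
# Raw string so that \g<1> reaches re.sub unchanged.
result = re.sub(pattern, r"+1 (\g<1>) \g<2>-\g<3>", text)
print(result)  # Call me at +1 (800) 555-1212.
```

This script uses Python's built-in `re` module to convert the original US notation to the international display format[^4].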
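The positive and negative lookahead assertions can be checked with a short Python sketch. The sample sentence and variable names are made up for illustration; only `re.findall` and `re.search` from the standard library are used.

```python
import re

sample = "It costs 100 dollars or 85 euros."

# Positive lookahead: digits that are followed by " dollars".
followed = re.findall(r"\d+(?= dollars)", sample)

# Negative lookahead: whole numbers that are NOT followed by " dollars".
# The \b stops the engine from backtracking to a shorter digit run.
not_followed = re.findall(r"\d+\b(?! dollars)", sample)

# The lookahead text is checked but never becomes part of the match.
first = re.search(r"\d+(?= dollars)", sample)

if __name__ == "__main__":
    print(followed)       # ['100']
    print(not_followed)   # ['85']
    print(first.group())  # 100
```

Without the `\b` in the negative case, the engine could give back the last digit and report `10` as a match, since `10` is followed by `0` rather than by `" dollars"`.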