Regular expressions: how replace works

License: CC 4.0 BY-SA

Consider the following results:

console.log("more".replace(/.*/g, "p")); // pp
console.log("more".replace(/.*?/g, "p")); // pmpoprpep

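The two calls above try a replacement in greedy mode and in lazy mode, and both give a surprising result:

Greedy mode: there is one p too many.
Lazy mode: no character is replaced; a p is only inserted into every gap.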

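Prerequisites:

. matches any single character (except line terminators) and never matches an empty string by itself. It is the * quantifier, which allows zero repetitions, that lets .* and .*? match an empty string.
During matching, think of the string as having an empty string at its start, at its end, and in the gap between every two adjacent characters, and each of them can be matched.
A regex object has an important property, lastIndex, which marks where the next match starts.
1. It is normally used by the regex object's own methods (when the g or y flag is set), and the string replace method relies on it as well.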
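
The points above can be checked directly. This is only a small sketch: the sample string "foo boo" and the variable names are illustrative and do not come from the original example.

```javascript
// "." needs exactly one character; the "*" quantifier is what allows an empty match.
const dotOnEmpty = /./.test("");    // false
const starOnEmpty = /.*/.test("");  // true

// With the g flag, exec() starts searching at lastIndex and moves it past the match.
const re = /o/g;
const first = re.exec("foo boo");
const afterFirst = re.lastIndex;    // 2
const second = re.exec("foo boo");
const afterSecond = re.lastIndex;   // 3

console.log(dotOnEmpty, starOnEmpty);    // false true
console.log(first.index, afterFirst);    // 1 2
console.log(second.index, afterSecond);  // 2 3
```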

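How replace executes:

When the regex has the global flag, replace searches the string from the beginning.
1. It stops when a match attempt returns null (a regex returns null when nothing matches).
Before it starts, an empty results array is created to hold the matches.
Each time a match is found, it is appended to results.
At the same time lastIndex is updated to the position right after the match, and matching continues from there.
1. If the match is an empty string (a gap between characters), lastIndex is advanced by one more position.
2. This is the surprising part: after a gap is matched, the character right after it is stepped over and never becomes part of a match.
Once matching has finished, the collected matches are replaced.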
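
One way to see which matches a global regex collects is String.prototype.matchAll, which walks the string in the same manner, including the extra step after an empty match. The helper name listMatches is hypothetical and only serves this illustration.

```javascript
// Hypothetical helper: list every match as [index, matchedText].
function listMatches(str, re) {
  // matchAll requires the g flag and works on a copy of the regex.
  return [...str.matchAll(re)].map(m => [m.index, m[0]]);
}

console.log(listMatches("more", /.*/g));
// [ [ 0, 'more' ], [ 4, '' ] ]
console.log(listMatches("more", /.*?/g));
// [ [ 0, '' ], [ 1, '' ], [ 2, '' ], [ 3, '' ], [ 4, '' ] ]
```

The greedy pattern produces two matches (the whole word plus the gap at the end), while the lazy pattern produces five empty matches, one per gap.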

The whole process is roughly equivalent to:

String.prototype.myReplace = function (reg, rpStr) {
  let str = this.valueOf()
  let global = reg.global
  let results = []
  let done = false
  let history = {}, step = 0 // log of every step
  if (global) reg.lastIndex = 0 // a global replace always starts from the beginning
  while (!done) {
    // Start from where the previous match ended.
    // Record the part of the string this attempt scans; it is only used for printing
    let newStr = str.slice(reg.lastIndex)
    // exec updates lastIndex (when the regex has the g or y flag)
    let result = reg.exec(str)
    if (result === null) {
      done = true // no more matches
    } else {
      results.push(result)
      if (!global) {
        done = true // non-global: stop after the first match
      } else {
        if (!result[0]) {
          // The match is '' (a gap between characters): advance lastIndex by one more
          reg.lastIndex = reg.lastIndex + 1
        }
      }
    }
    history[`step${++step}:`] = {
      str: newStr,
      result: result ? result[0] : result,
      results: results.map(r => r[0]),
      lastIndex: reg.lastIndex,
      done: done
    }
  }
  
  console.table(history)

  // Replace the matches in results and return the new string
  // ...
}
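
The replacement step is left out above. A minimal sketch of how it could look is shown below, assuming rpStr is a plain string (no "$1"-style patterns) and ignoring unicode-mode details. The function name replaceSketch is hypothetical, and this is not how the engine is actually implemented.

```javascript
// Hypothetical helper: collect the matches as myReplace does, then rebuild the string.
function replaceSketch(str, reg, rpStr) {
  // Work on a copy so the caller's lastIndex is left untouched.
  const re = new RegExp(reg.source, reg.flags);
  const results = [];
  let result;
  while ((result = re.exec(str)) !== null) {
    results.push(result);
    if (!re.global) break;                    // non-global: only the first match
    if (result[0] === "") re.lastIndex += 1;  // empty match: step forward, otherwise we loop forever
  }

  let out = "";
  let pos = 0; // end of the previously replaced match
  for (const r of results) {
    out += str.slice(pos, r.index) + rpStr;   // untouched text, then the replacement
    pos = r.index + r[0].length;
  }
  return out + str.slice(pos);                // whatever follows the last match
}

console.log(replaceSketch("more", /.*/g, "p"));  // pp
console.log(replaceSketch("more", /.*?/g, "p")); // pmpoprpep
```

Rebuilding the string from the slices between matches is what makes the empty matches visible: each one contributes a p while the neighbouring characters are copied over unchanged.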

Match log for ('more').replace(/.*/g, 'p'):

(index)	str	result	results	lastIndex	done
step1:	'more'	'more'	[ 'more' ]	4	false
step2:	''	''	[ 'more', '' ]	5	false
step3:	''	null	[ 'more', '' ]	0	true

So after [ 'more', '' ] is replaced, the result is pp.

Match log for ('more').replace(/.*?/g, 'p'):

(index)	str	result	results	lastIndex	done
step1:	'more'	''	[ '' ]	1	false
step2:	'ore'	''	[ '', '' ]	2	false
step3:	're'	''	[ '', '', '' ]	3	false
step4:	'e'	''	[ '', '', '', '' ]	4	false
step5:	''	''	[ '', '', '', '', '' ]	5	false
step6:	''	null	[ '', '', '', '', '' ]	0	true
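
Since both surprises come from empty matches, one possible way to avoid them is a pattern that cannot match an empty string, or one anchored to the whole string. The variants below are only a sketch of that idea; the variable names are illustrative.

```javascript
// "+" requires at least one character, so the gap at the end is never matched.
const greedyPlus = "more".replace(/.+/g, "p");
// Anchoring to the start means no second match can begin at index 4.
const anchored = "more".replace(/^.*$/g, "p");
// Lazy "+" matches one character at a time, so every character is replaced.
const lazyPlus = "more".replace(/.+?/g, "p");

console.log(greedyPlus); // p
console.log(anchored);   // p
console.log(lazyPlus);   // pppp
```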