Golang Regular Expressions

Latest recommended article published on 2025-03-17 23:35:19

MeteionY

Article tags: golang, regular expressions, c#

Article link: https://blog.youkuaiyun.com/qq_42004448/article/details/143442349

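Syntax

.: matches any single character except a newline.
*: matches the preceding element zero or more times.
+: matches the preceding element one or more times.
?: matches the preceding element zero times or once.
^: matches the beginning of the string.
$: matches the end of the string.
[]: character class; matches any one of the characters inside the brackets.
[^]: negated character class; matches any one character that is not inside the brackets.
|: alternation (logical OR); matches either of two patterns.
(): capturing group; used for grouping and for extracting matched substrings.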
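The syntax elements above can be tried out with a short program. This is a minimal sketch: the helper name matches and the sample patterns and inputs are assumptions made for illustration and are not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
// It is a hypothetical helper for this sketch, not a regexp API.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(matches(`^go`, "golang"))      // ^ anchors the match at the start
	fmt.Println(matches(`go$`, "golang"))      // $ anchors the match at the end
	fmt.Println(matches(`colou?r`, "color"))   // ? makes the u optional
	fmt.Println(matches(`^ab*c$`, "ac"))       // * allows zero b characters
	fmt.Println(matches(`^ab+c$`, "ac"))       // + requires at least one b
	fmt.Println(matches(`^[abc]$`, "b"))       // character class
	fmt.Println(matches(`^[^abc]$`, "b"))      // negated character class
	fmt.Println(matches(`^(cat|dog)$`, "dog")) // alternation inside a group
	fmt.Println(matches(`^a.c$`, "a\nc"))      // . does not match a newline by default
}
```

The patterns are written as raw string literals (backticks) so that backslashes and other special characters reach the regexp package unchanged.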

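MatchString function

The MatchString() function reports whether a string contains a match for a regular expression. The program below checks whether each word contains an arbitrary character followed by the letters even; the pattern is .even.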

package main

import (
    "fmt"
    "log"
    "regexp"
)

func main() {
    words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
    for _, word := range words {
        found, err := regexp.MatchString(".even", word)
        if err != nil {
            log.Fatal(err)
        }

        if found {
            fmt.Printf("%s matches\n", word)
        } else {
            fmt.Printf("%s does not match\n", word)
        }
    }
}

Running the code:

Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches

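At the same time, the editor shows a hint on this code: calling regexp.MatchString directly inside the loop hurts performance, because the pattern is compiled again on every call.

So we use Compile() or MustCompile() to create a compiled regular expression object once and then use that object for matching.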

Compile function

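The Compile function parses a regular expression and, if successful, returns a Regexp object that can be used to match text. Compiling the pattern once and reusing the object is faster than passing the pattern string on every call. If the regular expression is invalid, Compile() returns an error, whereas MustCompile() does not return an error and panics instead.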
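The difference in error handling can be seen with a deliberately invalid pattern. The following is a minimal sketch; the helper names compileErr and mustCompilePanics and the pattern a(b are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileErr returns the error from Compile, or nil if the pattern is valid.
func compileErr(pattern string) error {
	_, err := regexp.Compile(pattern)
	return err
}

// mustCompilePanics reports whether MustCompile panics for the pattern.
func mustCompilePanics(pattern string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	regexp.MustCompile(pattern)
	return false
}

func main() {
	// "a(b" has an unclosed parenthesis, so it is not a valid pattern.
	fmt.Println(compileErr("a(b") != nil)
	fmt.Println(mustCompilePanics("a(b"))
	fmt.Println(mustCompilePanics(".even"))
}
```

As a rule of thumb, Compile suits patterns that come from user input or configuration, while MustCompile suits fixed patterns known at build time, where an invalid pattern is a programming error.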

Let's look at the Compile function first:

package main

import (
    "fmt"
    "log"
    "regexp"
)

func main() {
    words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}

    re, err := regexp.Compile(".even")
    if err != nil {
        log.Fatal(err)
    }

    for _, word := range words {
        found := re.MatchString(word)
        if found {
            fmt.Printf("%s matches\n", word)
        } else {
            fmt.Printf("%s does not match\n", word)
        }
    }
}

In the code example, we use a compiled regular expression.

re, err := regexp.Compile(".even")

That is, the regular expression is compiled with Compile. We then call the MatchString method on the returned Regexp object:

found := re.MatchString(word)

Running the program produces the same output:

Seven matches
even does not match
Maven does not match
Amen does not match
eleven matches

MustCompile function

Now let's look at an example that uses MustCompile, which compiles a regular expression and panics if the expression cannot be parsed.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
    re := regexp.MustCompile(".even")
    for _, word := range words {
        found := re.MatchString(word)
        if found {
            fmt.Printf("%s matches\n", word)
        } else {
            fmt.Printf("%s does not match\n", word)
        }
    }
}

Running the code:

$ go run main.go 
Seven matches
even does not match 
Maven does not match
Amen does not match 
eleven matches

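Finding a string with FindString

FindString() returns the text of the first match. If there is no match, it returns an empty string; it also returns an empty string when the regular expression successfully matches an empty string. Use FindStringIndex or FindStringSubmatch to tell these two cases apart. Here is an example of FindString():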

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "Today is Tuesday!"
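    // Use a variable name other than regexp so the package is not shadowed;
    // MustCompile avoids silently ignoring a compile error.
    re := regexp.MustCompile("^T([a-z]+)y")
    fmt.Println(re.FindString(str))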
}

Running the code gives the following result:

$ go run main.go 
Today

FindAllString function

The FindAllString function returns a slice of all successive matches of the regular expression.

package main

import (
    "fmt"
    "os"
    "regexp"
)

func main() {
    var content = `Foxes are omnivorous mammals belonging to several genera 
of the family Canidae. Foxes have a flattened skull, upright triangular ears, 
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every 
continent except Antarctica. By far the most common and widespread species of 
fox is the red fox.`

    re := regexp.MustCompile("(?i)fox(es)?")
    found := re.FindAllString(content, -1)
    fmt.Printf("%q\n", found)

    if found == nil {
        fmt.Printf("no match found\n")
        os.Exit(1)
    }

    for _, word := range found {
        fmt.Printf("%s\n", word)
    }

}

In the code example, we find all occurrences of the word fox, including its plural form.

re := regexp.MustCompile("(?i)fox(es)?")

The (?i) flag makes the regular expression case-insensitive. (es)? means that the characters es may appear zero times or once.

found := re.FindAllString(content, -1)

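We use FindAllString to find all occurrences of the defined regular expression. The second argument is the maximum number of matches to return; -1 means return all matches.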

Output:

["Foxes" "Foxes" "Foxes" "fox" "fox"]
Foxes
Foxes
Foxes
fox
fox
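The effect of the second argument can be checked on a shorter text. This is a small sketch; the helper firstMatches and the sample sentence are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var foxRe = regexp.MustCompile(`(?i)fox(es)?`)

// firstMatches returns at most n matches of foxRe in s; a negative n means all.
func firstMatches(s string, n int) []string {
	return foxRe.FindAllString(s, n)
}

func main() {
	text := "Foxes hunt at night. A fox is shy. Red foxes are common."
	fmt.Printf("%q\n", firstMatches(text, -1)) // every match
	fmt.Printf("%q\n", firstMatches(text, 2))  // only the first two matches
	// With no match at all, the returned slice is nil.
	fmt.Println(firstMatches("no such animal here", -1) == nil)
}
```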

FindAllStringIndex function

package main

import (
    "fmt"
    "regexp"
)

func main() {
    var content = `Foxes are omnivorous mammals belonging to several genera 
of the family Canidae. Foxes have a flattened skull, upright triangular ears, 
a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every 
continent except Antarctica. By far the most common and widespread species of 
fox is the red fox.`

    re := regexp.MustCompile("(?i)fox(es)?")
    idx := re.FindAllStringIndex(content, -1)

    for _, j := range idx {
        match := content[j[0]:j[1]]
        fmt.Printf("%s at %d:%d\n", match, j[0], j[1])
    }
}

In the code example, we find all occurrences of the word fox in the text together with their indexes.

Foxes at 0:5
Foxes at 81:86
Foxes at 196:201
fox at 296:299
fox at 311:314

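Split function

The Split function slices a string into substrings separated by the defined regular expression. It returns a slice of the substrings between those expression matches.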

package main

import (
    "fmt"
    "log"
    "regexp"
    "strconv"
)

func main() {
    var data = `22, 1, 3, 4, 5, 17, 4, 3, 21, 4, 5, 1, 48, 9, 42`
    sum := 0
    // A raw string literal is required: "\s" is not a valid escape in a quoted Go string.
    re := regexp.MustCompile(`,\s*`)

    vals := re.Split(data, -1)

    for _, val := range vals {
        n, err := strconv.Atoi(val)
        // Check the error before using the converted value.
        if err != nil {
            log.Fatal(err)
        }
        sum += n
    }

    fmt.Println(sum)
}

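In the code example, we have a comma-separated list of values. We cut the values out of the string and compute their sum.

re := regexp.MustCompile(`,\s*`)

The regular expression consists of a comma followed by any number of whitespace characters.

vals := re.Split(data, -1)

We get a slice of the values.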

for _, val := range vals {
    n, err := strconv.Atoi(val)
    if err != nil {
        log.Fatal(err)
    }
    sum += n
}

We iterate over the slice and compute the sum. The slice contains strings, so we convert each string to an integer with the strconv.Atoi function.

Running the code prints the sum:

189
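The second argument of Split limits the number of substrings in the same way the count argument does for FindAllString. The sketch below uses a hypothetical helper splitN and a shortened input to show the three cases.

```go
package main

import (
	"fmt"
	"regexp"
)

var sep = regexp.MustCompile(`,\s*`)

// splitN splits s on a comma followed by optional whitespace, returning at most n pieces.
func splitN(s string, n int) []string {
	return sep.Split(s, n)
}

func main() {
	data := "22, 1, 3, 4"
	fmt.Printf("%q\n", splitN(data, -1)) // n < 0: all substrings
	fmt.Printf("%q\n", splitN(data, 2))  // n > 0: at most n substrings, the last is the unsplit remainder
	fmt.Printf("%q\n", splitN(data, 0))  // n == 0: nil
}
```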

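Go regular expression capturing groups

Parentheses () create capturing groups. They let us apply a quantifier to an entire group or restrict alternation to a part of the regular expression. To retrieve the capturing groups (Go uses the term subexpressions), we use the FindStringSubmatch function.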

package main

import (
    "fmt"
    "regexp"
)

func main() {
    websites := [...]string{"webcode.me", "zetcode.com", "freebsd.org", "netbsd.org"}
    // A raw string literal is required: "\w" and "\." are not valid escapes in a quoted Go string.
    re := regexp.MustCompile(`(\w+)\.(\w+)`)
    for _, website := range websites {

        parts := re.FindStringSubmatch(website)
        for i := range parts {
            fmt.Println(parts[i])
        }

        fmt.Println("---------------------")
    }
}

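In the code example, we use groups to split a domain name into two parts.

re := regexp.MustCompile(`(\w+)\.(\w+)`)

We define two groups with parentheses.

parts := re.FindStringSubmatch(website)

FindStringSubmatch returns a slice of strings holding the text of the whole match followed by the strings captured by the groups.

Running the code: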

$ go run capturegroups.go 
webcode.me
webcode
me
---------------------
zetcode.com
zetcode
com
---------------------
freebsd.org
freebsd
org
---------------------
netbsd.org
netbsd
org
---------------------
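Groups can also be given names with the (?P<name>...) syntax, so the captured text can be looked up by name rather than by position. The following is a sketch under assumptions: the group names name and tld and the helper splitDomain are made up for illustration, and SubexpIndex requires Go 1.15 or later.

```go
package main

import (
	"fmt"
	"regexp"
)

var domainRe = regexp.MustCompile(`(?P<name>\w+)\.(?P<tld>\w+)`)

// splitDomain returns the text captured by the name and tld groups,
// and false when s does not match.
func splitDomain(s string) (name, tld string, ok bool) {
	m := domainRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[domainRe.SubexpIndex("name")], m[domainRe.SubexpIndex("tld")], true
}

func main() {
	for _, site := range []string{"webcode.me", "zetcode.com", "localhost"} {
		if name, tld, ok := splitDomain(site); ok {
			fmt.Printf("%s -> name=%s tld=%s\n", site, name, tld)
		} else {
			fmt.Printf("%s -> no match\n", site)
		}
	}
}
```

Checking the result of FindStringSubmatch against nil before indexing avoids a panic on input that does not match.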

Replacing strings with regular expressions

Strings can be replaced with ReplaceAllString. The method returns the modified string.

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "regexp"
    "strings"
)

func main() {
    resp, err := http.Get("http://webcode.me")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    content := string(body)

    re := regexp.MustCompile("<[^>]*>")
    replaced := re.ReplaceAllString(content, "")

    fmt.Println(strings.TrimSpace(replaced))
}

The example reads the HTML data of a web page and strips its HTML tags with a regular expression.

resp, err := http.Get("http://webcode.me")

We create a GET request with the Get function from the http package.

body, err := ioutil.ReadAll(resp.Body)

We read the body of the response object.

re := regexp.MustCompile("<[^>]*>")

This pattern defines a regular expression that matches HTML tags.

replaced := re.ReplaceAllString(content, "")

We use the ReplaceAllString method to remove all the tags.
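The same replacement can be tried without network access by applying the pattern to a string literal. This is a minimal sketch; the helper stripTags and the sample HTML are assumptions for illustration, and a regular expression like this is only a rough approximation of real HTML parsing.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var tagRe = regexp.MustCompile(`<[^>]*>`)

// stripTags removes everything that looks like an HTML tag from s
// and trims surrounding whitespace.
func stripTags(s string) string {
	return strings.TrimSpace(tagRe.ReplaceAllString(s, ""))
}

func main() {
	html := "<p>Hello, <b>world</b>!</p>\n"
	fmt.Println(stripTags(html))
}
```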

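ReplaceAllStringFunc function

ReplaceAllStringFunc returns a copy of the string in which all matches of the regular expression have been replaced by the return value of the given function.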

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    content := "an old eagle"

    re := regexp.MustCompile(`[^aeiou]`)

    fmt.Println(re.ReplaceAllStringFunc(content, strings.ToUpper))
}

In the code example, we apply the strings.ToUpper function to every match, that is, to every character that is not a lowercase vowel.

$ go run replaceallfunc.go 
aN oLD eaGLe
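The replacement function does not have to come from the standard library; any func(string) string works. The sketch below assumes a hypothetical helper doubleNumbers that replaces each run of digits with twice its value.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var numRe = regexp.MustCompile(`[0-9]+`)

// doubleNumbers replaces every run of digits in s with twice its value.
func doubleNumbers(s string) string {
	return numRe.ReplaceAllStringFunc(s, func(m string) string {
		n, err := strconv.Atoi(m)
		if err != nil {
			return m // leave the text unchanged if it does not fit in an int
		}
		return strconv.Itoa(n * 2)
	})
}

func main() {
	fmt.Println(doubleNumbers("3 apples and 12 pears"))
}
```

The function receives only the matched text, so it has no access to capturing groups; when groups are needed, FindAllStringSubmatchIndex or ReplaceAllString with group references are the usual alternatives.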