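This project tests and compares how different regular expression libraries perform when matching common data types such as mobile phone numbers, bank card numbers, and ID card numbers. We chose four commonly used libraries:
- Go standard library: the `regexp` package that ships with Go.
- RE2: a fast, safe regular expression library that guarantees linear-time matching. Go's `regexp` package accepts the same syntax but is a separate pure-Go implementation; the tests here use RE2 through the `github.com/wasilibs/go-re2` binding.
- Hyperscan: a high-performance multi-pattern matching engine, suited to scanning large volumes of data.
- Grok: a pattern library designed for log parsing, with a rich set of predefined named patterns.
For each library we run the test samples (phone numbers, bank card numbers, ID card numbers) and compare:
- Time: how long the matching operations take.
- Memory use: how much memory is allocated during the matching operations.
The results help us assess how the libraries differ across scenarios and serve as a reference when choosing a library for real applications.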
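The memory figures in this write-up are differences in `runtime.MemStats.TotalAlloc`, which counts every byte ever allocated on the Go heap and never decreases. They therefore describe allocation volume, not peak or resident memory. The following is a minimal sketch of that distinction using only the standard library; `churn` and `measureAlloc` are hypothetical names introduced here for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink forces the buffers allocated below onto the heap.
var sink []byte

// churn allocates n buffers of size bytes each and keeps none of them.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
	sink = nil
}

// measureAlloc runs f and returns the growth of TotalAlloc (cumulative bytes
// allocated) and of HeapAlloc (bytes still live after a forced GC).
func measureAlloc(f func()) (allocated uint64, live int64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	f()
	runtime.GC()
	runtime.ReadMemStats(&after)
	allocated = after.TotalAlloc - before.TotalAlloc
	live = int64(after.HeapAlloc) - int64(before.HeapAlloc)
	return allocated, live
}

func main() {
	allocated, live := measureAlloc(func() { churn(100, 64<<10) })
	fmt.Printf("allocated during run: %d bytes\n", allocated)
	fmt.Printf("live heap change:     %d bytes\n", live)
}
```

The cumulative figure covers all 100 buffers even though none of them is still reachable afterwards, which is why a TotalAlloc difference grows with the number of calls.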
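Regular expressions
This project focuses on how the four libraries perform when matching common data types (phone numbers, bank card numbers, ID card numbers). The three regular expressions used are:
Phone number (`phoneRegex`):
^1[3-9]\d{9}$
- Matches mainland China mobile numbers: an 11-digit string that starts with 1, has a second digit from 3 to 9, and is followed by 9 more digits.
Bank card number (`bankCardRegex`):
^\d{16,19}$
- Matches a string of 16 to 19 digits, the usual length range of bank card numbers.
ID card number (`idCardRegex`):
^\d{17}[\dX]$
- Matches the shape of a mainland China ID number: 17 digits followed by a check character that is a digit or an uppercase X. The check character itself is not verified.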
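To make the three patterns concrete, here is a small sketch that runs them with the standard `regexp` package against a few inputs. The helper `classify` and the labels it returns are hypothetical, introduced only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// The three patterns from the text, compiled once at start-up.
var (
	phoneRe    = regexp.MustCompile(`^1[3-9]\d{9}$`)
	bankCardRe = regexp.MustCompile(`^\d{16,19}$`)
	idCardRe   = regexp.MustCompile(`^\d{17}[\dX]$`)
)

// classify returns a label for every pattern that accepts s.
func classify(s string) []string {
	var kinds []string
	if phoneRe.MatchString(s) {
		kinds = append(kinds, "phone")
	}
	if bankCardRe.MatchString(s) {
		kinds = append(kinds, "bank card")
	}
	if idCardRe.MatchString(s) {
		kinds = append(kinds, "id card")
	}
	return kinds
}

func main() {
	samples := []string{
		"13812345678",         // 11 digits, second digit in 3-9
		"12812345678",         // second digit outside 3-9
		"6217003810042196610", // 19 digits
		"11010519880605371X",  // 17 digits plus uppercase X
		"11010519880605371x",  // lowercase x is not accepted
		"110105198806053719",  // 18 digits: fits two patterns
	}
	for _, s := range samples {
		fmt.Printf("%-20s %v\n", s, classify(s))
	}
}
```

One thing the sketch makes visible: an 18-character ID number made only of digits also satisfies the bank card pattern, so these patterns check shape only and cannot tell the two kinds of number apart.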
Code
package main

import (
	"fmt"
	"regexp"
	"runtime"
	"time"

	"github.com/flier/gohs/hyperscan"
	"github.com/trivago/grok"
	re2 "github.com/wasilibs/go-re2"
)

// Test samples: phone numbers, bank card numbers, ID card numbers.
var phoneNumbers = []string{
	"13812345678", "15898765432", "18912344321", "13987654321",
}
var bankCardNumbers = []string{
	"6217003810042196610", "6222023810001009123", "6217003810042196611",
}
var idCardNumbers = []string{
	"11010519880605371X", "32038219880509581X", "110105198806053719",
}

// Regular expressions under test.
var phoneRegex = `^1[3-9]\d{9}$`
var bankCardRegex = `^\d{16,19}$`
var idCardRegex = `^\d{17}[\dX]$`

// matchFunc reports whether s matches the pattern regex.
type matchFunc func(s string, regex string) bool

// matchWithGoRegexp uses the standard library regexp package.
// Like every matcher below, it compiles the pattern on each call,
// so the measurements include compilation cost.
func matchWithGoRegexp(s string, regex string) bool {
	re := regexp.MustCompile(regex)
	return re.MatchString(s)
}

// matchWithre2 uses the go-re2 binding.
func matchWithre2(s string, regex string) bool {
	r, err := re2.Compile(regex)
	if err != nil {
		panic(err)
	}
	return r.MatchString(s)
}

// matchWithhyperscan uses the gohs Hyperscan binding.
func matchWithhyperscan(s string, regex string) bool {
	matched, err := hyperscan.Match(regex, []byte(s))
	if err != nil {
		panic(err)
	}
	return matched
}

// matchWithgrok uses grok; a new grok instance with the default
// pattern set is built on each call.
func matchWithgrok(s string, regex string) bool {
	g, err := grok.New(grok.Config{Patterns: grok.DefaultPatterns})
	if err != nil {
		panic(err)
	}
	compiledGrok, err := g.Compile(regex)
	if err != nil {
		panic(err)
	}
	return compiledGrok.MatchString(s)
}

// formatBytes renders a byte count using the largest fitting unit.
func formatBytes(n int64) string {
	const kb, mb, gb = 1024, 1024 * 1024, 1024 * 1024 * 1024
	switch {
	case n < kb:
		return fmt.Sprintf("%.2f bytes", float64(n))
	case n < mb:
		return fmt.Sprintf("%.2f KB", float64(n)/kb)
	case n < gb:
		return fmt.Sprintf("%.2f MB", float64(n)/mb)
	default:
		return fmt.Sprintf("%.2f GB", float64(n)/gb)
	}
}

// testRegexPerformance runs fn over the test cases and prints the
// elapsed time and the bytes allocated on the Go heap.
func testRegexPerformance(name string, fn matchFunc, testCases []string, regex string, iterations int) {
	fmt.Printf("Testing %s...\n", name)

	// Record the start time and the starting allocation counter.
	start := time.Now()
	var memStart runtime.MemStats
	runtime.ReadMemStats(&memStart)

	// Build the input list by cycling through the test cases.
	// Note that this slice is built inside the timed region.
	totalCases := make([]string, 0, iterations)
	for i := 0; i < iterations; i++ {
		totalCases = append(totalCases, testCases[i%len(testCases)])
	}
	for _, testCase := range totalCases {
		fn(testCase, regex)
	}

	// Record the end time and the final allocation counter.
	elapsed := time.Since(start)
	var memEnd runtime.MemStats
	runtime.ReadMemStats(&memEnd)

	// TotalAlloc is the cumulative number of bytes allocated on the Go heap.
	memUsed := int64(memEnd.TotalAlloc) - int64(memStart.TotalAlloc)
	fmt.Printf("%s took %s, Memory used: %s\n", name, elapsed, formatBytes(memUsed))

	// Disabled: report memory again after a forced GC. TotalAlloc is
	// cumulative and does not shrink after a collection; HeapAlloc would
	// be needed to observe the live heap.
	// runtime.GC()
	// runtime.ReadMemStats(&memEnd)
	// memAfterGC := int64(memEnd.TotalAlloc) - int64(memStart.TotalAlloc)
	// fmt.Printf("%s Memory used after GC: %s\n\n", name, formatBytes(memAfterGC))
}

func main() {
	iterationsList := []int{1, 10, 100, 1000, 10000, 100000}
	for _, iterations := range iterationsList {
		fmt.Printf("\nTesting with %d iterations...\n", iterations)

		// Phone number matching.
		testRegexPerformance("Go Regex (Phone)", matchWithGoRegexp, phoneNumbers, phoneRegex, iterations)
		testRegexPerformance("grok (Phone)", matchWithgrok, phoneNumbers, phoneRegex, iterations)
		testRegexPerformance("RE2 (Phone)", matchWithre2, phoneNumbers, phoneRegex, iterations)
		testRegexPerformance("Hyperscan (Phone)", matchWithhyperscan, phoneNumbers, phoneRegex, iterations)

		// Bank card number matching.
		testRegexPerformance("Go Regex (Bank Card)", matchWithGoRegexp, bankCardNumbers, bankCardRegex, iterations)
		testRegexPerformance("grok (Bank Card)", matchWithgrok, bankCardNumbers, bankCardRegex, iterations)
		testRegexPerformance("RE2 (Bank Card)", matchWithre2, bankCardNumbers, bankCardRegex, iterations)
		testRegexPerformance("Hyperscan (Bank Card)", matchWithhyperscan, bankCardNumbers, bankCardRegex, iterations)

		// ID card number matching.
		testRegexPerformance("Go Regex (ID Card)", matchWithGoRegexp, idCardNumbers, idCardRegex, iterations)
		testRegexPerformance("grok (ID Card)", matchWithgrok, idCardNumbers, idCardRegex, iterations)
		testRegexPerformance("RE2 (ID Card)", matchWithre2, idCardNumbers, idCardRegex, iterations)
		testRegexPerformance("Hyperscan (ID Card)", matchWithhyperscan, idCardNumbers, idCardRegex, iterations)
	}
}
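The match functions above compile their pattern on every call, so each measurement includes compilation as well as matching. As a point of comparison, here is a minimal sketch of the compile-once alternative using only the standard `regexp` package. `compilePerCall`, `newMatcher`, and `allocsPerCall` are hypothetical helpers introduced for illustration; the other three libraries have their own compile-once APIs, which are not shown here.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// compilePerCall mirrors matchWithGoRegexp: the pattern is compiled again
// on every call.
func compilePerCall(s, pattern string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// newMatcher compiles the pattern once and returns a function that only
// performs the match.
func newMatcher(pattern string) func(string) bool {
	return regexp.MustCompile(pattern).MatchString
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

func main() {
	const phoneRegex = `^1[3-9]\d{9}$`
	const sample = "13812345678"

	matchPhone := newMatcher(phoneRegex)

	perCall := allocsPerCall(func() { compilePerCall(sample, phoneRegex) })
	reused := allocsPerCall(func() { matchPhone(sample) })

	fmt.Printf("compile on every call: %.0f allocations per match\n", perCall)
	fmt.Printf("compile once, reuse:   %.0f allocations per match\n", reused)
}
```

Both variants return the same match result; only the allocation count per call differs, which is worth keeping in mind when reading the memory figures in this write-up.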
Output
Testing with 1 iterations...
Testing Go Regex (Phone)...
Go Regex (Phone) took 147µs, Memory used: 5.67 KB
Testing grok (Phone)...
grok (Phone) took 434µs, Memory used: 234.77 KB
Testing RE2 (Phone)...
RE2 (Phone) took 338.4µs, Memory used: 169.16 KB
Testing Hyperscan (Phone)...
Hyperscan (Phone) took 961.1µs, Memory used: 800.00 bytes
Testing Go Regex (Bank Card)...
Go Regex (Bank Card) took 47.6µs, Memory used: 8.27 KB
Testing grok (Bank Card)...
grok (Bank Card) took 388.6µs, Memory used: 194.93 KB
Testing RE2 (Bank Card)...
RE2 (Bank Card) took 323µs, Memory used: 1.07 KB
Testing Hyperscan (Bank Card)...
Hyperscan (Bank Card) took 757.7µs, Memory used: 760.00 bytes
Testing Go Regex (ID Card)...
Go Regex (ID Card) took 174.6µs, Memory used: 7.36 KB
Testing grok (ID Card)...
grok (ID Card) took 554.9µs, Memory used: 193.61 KB
Testing RE2 (ID Card)...
RE2 (ID Card) took 215.6µs, Memory used: 1.07 KB
Testing Hyperscan (ID Card)...
Hyperscan (ID Card) took 598.5µs, Memory used: 760.00 bytes
Testing with 10 iterations...
Testing Go Regex (Phone)...
Go Regex (Phone) took 169.1µs, Memory used: 50.55 KB
Testing grok (Phone)...
grok (Phone) took 3.647201ms, Memory used: 1.87 MB
Testing RE2 (Phone)...
RE2 (Phone) took 616µs, Memory used: 10.70 KB
Testing Hyperscan (Phone)...
Hyperscan (Phone) took 4.3829ms, Memory used: 7.31 KB
Testing Go Regex (Bank Card)...
Go Regex (Bank Card) took 235.3µs, Memory used: 82.73 KB
Testing grok (Bank Card)...
grok (Bank Card) took 4.0071ms, Memory used: 1.90 MB
Testing RE2 (Bank Card)...
RE2 (Bank Card) took 837.1µs, Memory used: 10.70 KB
Testing Hyperscan (Bank Card)...
Hyperscan (Bank Card) took 5.6729ms, Memory used: 7.34 KB
Testing Go Regex (ID Card)...
Go Regex (ID Card) took 199.4µs, Memory used: 73.59 KB
Testing grok (ID Card)...
grok (ID Card) took 3.9797ms, Memory used: 1.89 MB
Testing RE2 (ID Card)...
RE2 (ID Card) took 782.1µs, Memory used: 10.70 KB
Testing Hyperscan (ID Card)...
Hyperscan (ID Card) took 5.2922ms, Memory used: 7.34 KB
Testing with 100 iterations...
Testing Go Regex (Phone)...
Go Regex (Phone) took 1.3595ms, Memory used: 505.66 KB
Testing grok (Phone)...
grok (Phone) took 35.803201ms, Memory used: 18.85 MB
Testing RE2 (Phone)...
RE2 (Phone) took 2.9211ms, Memory used: 107.22 KB
Testing Hyperscan (Phone)...
Hyperscan (Phone) took 37.658501ms, Memory used: 72.84 KB
Testing Go Regex (Bank Card)...
Go Regex (Bank Card) took 735.6µs, Memory used: 827.62 KB
Testing grok (Bank Card)...
grok (Bank Card) took 33.308501ms, Memory used: 19.32 MB
Testing RE2 (Bank Card)...
RE2 (Bank Card) took 5.6718ms, Memory used: 107.22 KB
Testing Hyperscan (Bank Card)...
Hyperscan (Bank Card) took 56.628102ms, Memory used: 73.62 KB
Testing Go Regex (ID Card)...
Go Regex (ID Card) took 706.4µs, Memory used: 736.12 KB
Testing grok (ID Card)...
grok (ID Card) took 33.489901ms, Memory used: 19.07 MB
Testing RE2 (ID Card)...
RE2 (ID Card) took 4.8854ms, Memory used: 107.22 KB
Testing Hyperscan (ID Card)...
Hyperscan (ID Card) took 48.555101ms, Memory used: 74.23 KB
Testing with 1000 iterations...
Testing Go Regex (Phone)...
Go Regex (Phone) took 6.3989ms, Memory used: 4.94 MB
Testing grok (Phone)...
grok (Phone) took 296.184308ms, Memory used: 188.85 MB
Testing RE2 (Phone)...
RE2 (Phone) took 35.4893ms, Memory used: 1.26 MB
Testing Hyperscan (Phone)...
Hyperscan (Phone) took 353.388309ms, Memory used: 734.77 KB
Testing Go Regex (Bank Card)...
Go Regex (Bank Card) took 8.666801ms, Memory used: 8.09 MB
Testing grok (Bank Card)...
grok (Bank Card) took 325.838308ms, Memory used: 192.00 MB
Testing RE2 (Bank Card)...
RE2 (Bank Card) took 56.139501ms, Memory used: 1.05 MB
Testing Hyperscan (Bank Card)...
Hyperscan (Bank Card) took 537.913414ms, Memory used: 742.56 KB
Testing Go Regex (ID Card)...
Go Regex (ID Card) took 9.1288ms, Memory used: 7.24 MB
Testing grok (ID Card)...
grok (ID Card) took 369.78511ms, Memory used: 191.27 MB
Testing RE2 (ID Card)...
RE2 (ID Card) took 51.905201ms, Memory used: 1.05 MB
Testing Hyperscan (ID Card)...
Hyperscan (ID Card) took 609.0739ms, Memory used: 742.58 KB
Testing with 10000 iterations...
Testing Go Regex (Phone)...
Go Regex (Phone) took 97.1449ms, Memory used: 49.43 MB
Testing grok (Phone)...
grok (Phone) took 3.047004091s, Memory used: 1.84 GB
Testing RE2 (Phone)...
RE2 (Phone) took 286.305799ms, Memory used: 10.83 MB
Testing Hyperscan (Phone)...
Hyperscan (Phone) took 3.42459599s, Memory used: 7.34 MB
Testing Go Regex (Bank Card)...
Go Regex (Bank Card) took 75.740699ms, Memory used: 80.81 MB
Testing grok (Bank Card)...
grok (Bank Card) took 2.978225733s, Memory used: 1.87 GB
Testing RE2 (Bank Card)...
RE2 (Bank Card) took 596.521708ms, Memory used: 10.96 MB
Testing Hyperscan (Bank Card)...
Hyperscan (Bank Card) took 5.02266107s, Memory used: 7.29 MB
Testing Go Regex (ID Card)...
Go Regex (ID Card) took 71.491502ms, Memory used: 71.88 MB
Testing grok (ID Card)...
grok (ID Card) took 2.973578556s, Memory used: 1.86 GB
Testing RE2 (ID Card)...
RE2 (ID Card) took 542.933711ms, Memory used: 10.93 MB
Testing Hyperscan (ID Card)...
Hyperscan (ID Card) took 4.130735778s, Memory used: 7.31 MB
Testing with 100000 iterations...
Testing Go Regex (Phone)...
Go Regex (Phone) took 560.546612ms, Memory used: 493.68 MB
Testing grok (Phone)...
grok (Phone) took 31.42383057s, Memory used: 18.38 GB
Testing RE2 (Phone)...
RE2 (Phone) took 3.28167285s, Memory used: 109.65 MB
Testing Hyperscan (Phone)...
Hyperscan (Phone) took 36.851407785s, Memory used: 71.94 MB
Testing Go Regex (Bank Card)...
Go Regex (Bank Card) took 752.315092ms, Memory used: 808.03 MB
Testing grok (Bank Card)...
grok (Bank Card) took 31.07751491s, Memory used: 18.69 GB
Testing RE2 (Bank Card)...
RE2 (Bank Card) took 5.978553846s, Memory used: 109.83 MB
Testing Hyperscan (Bank Card)...
Hyperscan (Bank Card) took 53.647871594s, Memory used: 72.52 MB
Testing Go Regex (ID Card)...
Go Regex (ID Card) took 737.583502ms, Memory used: 718.77 MB
Testing grok (ID Card)...
grok (ID Card) took 32.139839778s, Memory used: 18.61 GB
Testing RE2 (ID Card)...
RE2 (ID Card) took 5.738499301s, Memory used: 109.50 MB
Testing Hyperscan (ID Card)...
Hyperscan (ID Card) took 44.914807112s, Memory used: 72.86 MB
Timing table - Phone
Iterations | Go Regex (Phone) | grok (Phone) | RE2 (Phone) | Hyperscan (Phone) | Fastest engine | Slowest engine |
1 | 147µs | 434µs | 338.4µs | 961.1µs | Go Regex (Phone) | Hyperscan (Phone) |
10 | 169.1µs | 3.647201ms | 616µs | 4.3829ms | Go Regex (Phone) | Hyperscan (Phone) |
100 | 1.3595ms | 35.803201ms | 2.9211ms | 37.658501ms | Go Regex (Phone) | Hyperscan (Phone) |
1000 | 6.3989ms | 296.184308ms | 35.4893ms | 353.388309ms | Go Regex (Phone) | Hyperscan (Phone) |
10000 | 97.1449ms | 3.047004091s | 286.305799ms | 3.42459599s | Go Regex (Phone) | Hyperscan (Phone) |
100000 | 560.546612ms | 31.42383057s | 3.28167285s | 36.851407785s | Go Regex (Phone) | Hyperscan (Phone) |
Memory table - Phone
Iterations | Go Regex (Phone) | grok (Phone) | RE2 (Phone) | Hyperscan (Phone) | Least memory | Most memory |
1 | 5.67 KB | 234.77 KB | 169.16 KB | 800.00 bytes | Hyperscan (Phone) | grok (Phone) |
10 | 50.55 KB | 1.87 MB | 10.70 KB | 7.31 KB | Hyperscan (Phone) | grok (Phone) |
100 | 505.66 KB | 18.85 MB | 107.22 KB | 72.84 KB | Hyperscan (Phone) | grok (Phone) |
1000 | 4.94 MB | 188.85 MB | 1.26 MB | 734.77 KB | Hyperscan (Phone) | grok (Phone) |
10000 | 49.43 MB | 1.84 GB | 10.83 MB | 7.34 MB | Hyperscan (Phone) | grok (Phone) |
100000 | 493.68 MB | 18.38 GB | 109.65 MB | 71.94 MB | Hyperscan (Phone) | grok (Phone) |
Timing table - Bank card
Iterations | Go Regex (Bank Card) | grok (Bank Card) | RE2 (Bank Card) | Hyperscan (Bank Card) | Fastest engine | Slowest engine |
1 | 47.6µs | 388.6µs | 323µs | 757.7µs | Go Regex (Bank Card) | Hyperscan (Bank Card) |
10 | 235.3µs | 4.0071ms | 837.1µs | 5.6729ms | Go Regex (Bank Card) | Hyperscan (Bank Card) |
100 | 735.6µs | 33.308501ms | 5.6718ms | 56.628102ms | Go Regex (Bank Card) | Hyperscan (Bank Card) |
1000 | 8.666801ms | 325.838308ms | 56.139501ms | 537.913414ms | Go Regex (Bank Card) | Hyperscan (Bank Card) |
10000 | 75.740699ms | 2.978225733s | 596.521708ms | 5.02266107s | Go Regex (Bank Card) | Hyperscan (Bank Card) |
100000 | 752.315092ms | 31.07751491s | 5.978553846s | 53.647871594s | Go Regex (Bank Card) | Hyperscan (Bank Card) |
Memory table - Bank card
Iterations | Go Regex (Bank Card) | grok (Bank Card) | RE2 (Bank Card) | Hyperscan (Bank Card) | Least memory | Most memory |
1 | 8.27 KB | 194.93 KB | 1.07 KB | 760.00 bytes | Hyperscan (Bank Card) | grok (Bank Card) |
10 | 82.73 KB | 1.90 MB | 10.70 KB | 7.34 KB | Hyperscan (Bank Card) | grok (Bank Card) |
100 | 827.62 KB | 19.32 MB | 107.22 KB | 73.62 KB | Hyperscan (Bank Card) | grok (Bank Card) |
1000 | 8.09 MB | 192.00 MB | 1.05 MB | 742.56 KB | Hyperscan (Bank Card) | grok (Bank Card) |
10000 | 80.81 MB | 1.87 GB | 10.96 MB | 7.29 MB | Hyperscan (Bank Card) | grok (Bank Card) |
100000 | 808.03 MB | 18.69 GB | 109.83 MB | 72.52 MB | Hyperscan (Bank Card) | grok (Bank Card) |
Timing table - ID card
Iterations | Go Regex (ID Card) | grok (ID Card) | RE2 (ID Card) | Hyperscan (ID Card) | Fastest engine | Slowest engine |
1 | 174.6µs | 554.9µs | 215.6µs | 598.5µs | Go Regex (ID Card) | Hyperscan (ID Card) |
10 | 199.4µs | 3.9797ms | 782.1µs | 5.2922ms | Go Regex (ID Card) | Hyperscan (ID Card) |
100 | 706.4µs | 33.489901ms | 4.8854ms | 48.555101ms | Go Regex (ID Card) | Hyperscan (ID Card) |
1000 | 9.1288ms | 369.78511ms | 51.905201ms | 609.0739ms | Go Regex (ID Card) | Hyperscan (ID Card) |
10000 | 71.491502ms | 2.973578556s | 542.933711ms | 4.130735778s | Go Regex (ID Card) | Hyperscan (ID Card) |
100000 | 737.583502ms | 32.139839778s | 5.738499301s | 44.914807112s | Go Regex (ID Card) | Hyperscan (ID Card) |
Memory table - ID card
Iterations | Go Regex (ID Card) | grok (ID Card) | RE2 (ID Card) | Hyperscan (ID Card) | Least memory | Most memory |
1 | 7.36 KB | 193.61 KB | 1.07 KB | 760.00 bytes | Hyperscan (ID Card) | grok (ID Card) |
10 | 73.59 KB | 1.89 MB | 10.70 KB | 7.34 KB | Hyperscan (ID Card) | grok (ID Card) |
100 | 736.12 KB | 19.07 MB | 107.22 KB | 74.23 KB | Hyperscan (ID Card) | grok (ID Card) |
1000 | 7.24 MB | 191.27 MB | 1.05 MB | 742.58 KB | Hyperscan (ID Card) | grok (ID Card) |
10000 | 71.88 MB | 1.86 GB | 10.93 MB | 7.31 MB | Hyperscan (ID Card) | grok (ID Card) |
100000 | 718.77 MB | 18.61 GB | 109.50 MB | 72.86 MB | Hyperscan (ID Card) | grok (ID Card) |
Timing table
Iterations | Go Regex (Phone) | grok (Phone) | RE2 (Phone) | Hyperscan (Phone) | Go Regex (Bank Card) | grok (Bank Card) | RE2 (Bank Card) | Hyperscan (Bank Card) | Go Regex (ID Card) | grok (ID Card) | RE2 (ID Card) | Hyperscan (ID Card) |
1 | 147µs | 434µs | 338.4µs | 961.1µs | 47.6µs | 388.6µs | 323µs | 757.7µs | 174.6µs | 554.9µs | 215.6µs | 598.5µs |
10 | 169.1µs | 3.647201ms | 616µs | 4.3829ms | 235.3µs | 4.0071ms | 837.1µs | 5.6729ms | 199.4µs | 3.9797ms | 782.1µs | 5.2922ms |
100 | 1.3595ms | 35.803201ms | 2.9211ms | 37.658501ms | 735.6µs | 33.308501ms | 5.6718ms | 56.628102ms | 706.4µs | 33.489901ms | 4.8854ms | 48.555101ms |
1000 | 6.3989ms | 296.184308ms | 35.4893ms | 353.388309ms | 8.666801ms | 325.838308ms | 56.139501ms | 537.913414ms | 9.1288ms | 369.78511ms | 51.905201ms | 609.0739ms |
10000 | 97.1449ms | 3.047004091s | 286.305799ms | 3.42459599s | 75.740699ms | 2.978225733s | 596.521708ms | 5.02266107s | 71.491502ms | 2.973578556s | 542.933711ms | 4.130735778s |
100000 | 560.546612ms | 31.42383057s | 3.28167285s | 36.851407785s | 752.315092ms | 31.07751491s | 5.978553846s | 53.647871594s | 737.583502ms | 32.139839778s | 5.738499301s | 44.914807112s |
Memory table
Iterations | Go Regex (Phone) | grok (Phone) | RE2 (Phone) | Hyperscan (Phone) | Go Regex (Bank Card) | grok (Bank Card) | RE2 (Bank Card) | Hyperscan (Bank Card) | Go Regex (ID Card) | grok (ID Card) | RE2 (ID Card) | Hyperscan (ID Card) |
1 | 5.67 KB | 234.77 KB | 169.16 KB | 800.00 bytes | 8.27 KB | 194.93 KB | 1.07 KB | 760.00 bytes | 7.36 KB | 193.61 KB | 1.07 KB | 760.00 bytes |
10 | 50.55 KB | 1.87 MB | 10.70 KB | 7.31 KB | 82.73 KB | 1.90 MB | 10.70 KB | 7.34 KB | 73.59 KB | 1.89 MB | 10.70 KB | 7.34 KB |
100 | 505.66 KB | 18.85 MB | 107.22 KB | 72.84 KB | 827.62 KB | 19.32 MB | 107.22 KB | 73.62 KB | 736.12 KB | 19.07 MB | 107.22 KB | 74.23 KB |
1000 | 4.94 MB | 188.85 MB | 1.26 MB | 734.77 KB | 8.09 MB | 192.00 MB | 1.05 MB | 742.56 KB | 7.24 MB | 191.27 MB | 1.05 MB | 742.58 KB |
10000 | 49.43 MB | 1.84 GB | 10.83 MB | 7.34 MB | 80.81 MB | 1.87 GB | 10.96 MB | 7.29 MB | 71.88 MB | 1.86 GB | 10.93 MB | 7.31 MB |
100000 | 493.68 MB | 18.38 GB | 109.65 MB | 71.94 MB | 808.03 MB | 18.69 GB | 109.83 MB | 72.52 MB | 718.77 MB | 18.61 GB | 109.50 MB | 72.86 MB |
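Dividing a total by its iteration count gives the average cost of a single call, which is easier to compare than the raw totals. The sketch below does this for the 100000-iteration phone-number row; the durations are copied from the timing table, and `averagePerCall` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// averagePerCall returns the mean duration of one call.
func averagePerCall(total time.Duration, calls int) time.Duration {
	return total / time.Duration(calls)
}

func main() {
	const calls = 100000

	// Totals from the 100000-iteration row of the phone-number timing table.
	rows := []struct {
		name  string
		total time.Duration
	}{
		{"Go Regex", 560546612 * time.Nanosecond},
		{"grok", 31423830570 * time.Nanosecond},
		{"RE2", 3281672850 * time.Nanosecond},
		{"Hyperscan", 36851407785 * time.Nanosecond},
	}

	for _, r := range rows {
		fmt.Printf("%-10s %v per call\n", r.name, averagePerCall(r.total, calls))
	}
}
```

On these figures a single call costs roughly 5.6 µs with Go regexp, 33 µs with RE2, 314 µs with grok, and 369 µs with Hyperscan. Every call in this benchmark also compiles the pattern, so these averages include compilation.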
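Analysis of the results
- Go `regexp`: the fastest engine in every row of the timing tables. Its allocated memory grows linearly with the iteration count (about 5 KB to 8 KB per call), because the benchmark recompiles the pattern on every call.
- grok: allocates by far the most memory (about 190 KB per call, more than 18 GB in total at 100000 iterations) and is the second slowest engine. grok is built for more complex log parsing, and constructing a grok instance with the default pattern set on every call makes it costly in this simple matching scenario.
- RE2: allocations are stable at about 1.1 KB per call, several times less than Go `regexp`, but it is slower than Go `regexp` at every iteration count (3.28 s versus 0.56 s for 100000 phone-number matches).
- Hyperscan: reports the lowest memory for phone, bank card, and ID card matching, but it is the slowest engine in every row of the timing tables (36.9 s for 100000 phone-number matches). Two caveats apply: each call hands `hyperscan.Match` a raw pattern string, so the pattern is compiled again every time, and the memory figure counts only Go heap allocations, so memory allocated inside the Hyperscan C library is not included.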
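The results suggest the following:
- Go `regexp`: a sound default for this kind of matching. It was the fastest engine in these tests and needs no extra dependencies. Compiling patterns once and reusing them avoids the per-call allocations seen here.
- RE2: worth considering when Go heap allocation matters more than speed. In these tests it allocated less than Go `regexp` but ran several times slower.
- Hyperscan: showed the lowest Go heap allocations and is designed for multi-pattern scanning of very large inputs. Used as it is here, with the pattern compiled on every call, it was the slowest option, so it should be re-measured with a precompiled database before being chosen.
- grok: not recommended for plain regular expression matching, because it is a tool designed for complex log parsing.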
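Setup notes (Debian/Ubuntu: the Hyperscan binding needs pkg-config and the Hyperscan development package)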
sudo apt-get update
sudo apt-get install pkg-config
sudo apt-get install libhyperscan-dev
References
hyperscan package - github.com/flier/gohs/hyperscan - Go Packages
grok package - github.com/trivago/grok - Go Packages
Comparison of regular expression types
Performance comparison of regular expression engines