C regex: Fregex, a simplified regular expression library for C, modified from tiny-regex-c

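Fregex is a small regular expression library for C, derived from tiny-regex-c and intended for resource-constrained environments. It supports basic regex operators such as the dot, anchors, asterisk, and plus. It has known issues, including a bug in inverted character classes, but the compiled code is small across platforms: about 1.5 KB for ARM/Thumb and about 2 KB for 8-bit AVR. The library provides `re_compile` to compile a regular expression, and `re_matchp` and `re_match` to match text. Planned work includes fixing inverted character classes, adding support for branches and groups, and adding more examples and performance tests.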


> gcc -Os -c re.c
> size re.o
   text    data     bss     dec     hex filename
   2319       0     544    2863     b2f re.o

For ARM/Thumb using GCC 4.8.1, it's around 1.5 KB of code and less RAM:

> arm-none-eabi-gcc -Os -mthumb -c re.c
> size re.o
   text    data     bss     dec     hex filename
   1418       0     280    1698     6a2 re.o

For 8-bit AVR using AVR-GCC 4.8.1, it's around 2 KB of code and less RAM:

> avr-gcc -Os -c re.c
> size re.o
   text    data     bss     dec     hex filename
   2128       0     130    2258     8d2 re.o

API

This is the public / exported API:

/* Typedef'd pointer to hide implementation details. */

typedef struct regex_t* re_t;

/* Compiles regex string pattern to a regex_t-array. */

re_t re_compile(const char* pattern);

/* Finds matches of the compiled pattern inside text. */

int re_matchp(re_t pattern, const char* text);

/* Finds matches of pattern inside text (compiles first automatically). */

int re_match(const char* pattern, const char* text);

Supported regex-operators

The following features / regex-operators are supported by this library.

NOTE: inverted character classes are buggy - see the test harness for concrete examples.

. Dot, matches any character

^ Start anchor, matches beginning of string

$ End anchor, matches end of string

* Asterisk, match zero or more (greedy)

+ Plus, match one or more (greedy)

? Question, match zero or one (non-greedy). NOTE: the question-mark operator has a known bug.

[abc] Character class, match if one of {'a', 'b', 'c'}

[^abc] Inverted class, match if NOT one of {'a', 'b', 'c'}

NOTE: This feature is currently broken for some uses of character ranges!

[a-zA-Z] Character ranges, the character set of the ranges { a-z | A-Z }

\s Whitespace, \t \f \r \n \v and spaces

\S Non-whitespace

\w Alphanumeric, [a-zA-Z0-9_]

\W Non-alphanumeric

\d Digits, [0-9]

\D Non-digits
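Since inverted classes and ranges are the part flagged as buggy, it may help to spell out the intended semantics. The following is a minimal sketch, not Fregex's implementation: `sketch_class_matches` is a hypothetical helper that takes the text between `[` and `]` and decides whether a single character matches, assuming plain ASCII and no escapes inside the class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the Fregex API.
 * body is the text between '[' and ']', e.g. "abc", "^abc" or "a-zA-Z".
 * Returns 1 if c is matched by the class, 0 otherwise.
 * The caller is expected not to pass the string terminator as c. */
int sketch_class_matches(const char *body, char c)
{
    int inverted = 0;
    int found = 0;
    size_t i = 0;

    assert(body != NULL);

    /* A leading '^' inverts the result of the whole class. */
    if (body[0] == '^') {
        inverted = 1;
        i = 1;
    }

    while (body[i] != '\0') {
        if (body[i + 1] == '-' && body[i + 2] != '\0') {
            /* Range "lo-hi": matches if lo <= c <= hi. */
            if (body[i] <= c && c <= body[i + 2])
                found = 1;
            i += 3;
        } else {
            /* Single literal character (a trailing '-' lands here too). */
            if (body[i] == c)
                found = 1;
            i += 1;
        }
    }

    /* Invert only once, after all members and ranges were checked. */
    return inverted ? !found : found;
}
```

The design point is that inversion is applied once, after every member and range has been examined. Inverting per member or per range would make `[^a-zA-Z]` accept almost everything, which is one plausible way such a feature can go wrong.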

Usage

Compile a regex from ASCII-string (char-array) to a custom pattern structure using re_compile().

Search a text-string for a regex and get an index into the string, using re_match() or re_matchp().

The returned index points to the first place in the string, where the regex pattern matches.

If the regular expression doesn't match, the matching function returns an index of -1 to indicate failure.
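The return convention described above (index of the first match, or -1) can be illustrated with a small self-contained sketch. This is not the library's code: it is a reduced backtracking matcher that handles only literals, `.`, `^`, `$`, `*` and `+`, and every `sketch_*` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is NOT the Fregex implementation. */
static int sketch_match_here(const char *pat, const char *text);

/* Matches at least min_count occurrences of c (0 for '*', 1 for '+'),
 * greedily, then backs off until the rest of the pattern matches. */
static int sketch_match_repeat(int c, int min_count,
                               const char *pat, const char *text)
{
    const char *t = text;

    /* Greedy step: consume as many matching characters as possible. */
    while (*t != '\0' && (c == '.' || *t == c))
        t++;

    /* Backtracking step: give characters back one at a time. */
    while (t - text >= min_count) {
        if (sketch_match_here(pat, t))
            return 1;
        if (t == text)
            break;
        t--;
    }
    return 0;
}

/* Returns 1 if pat matches at the very start of text. */
static int sketch_match_here(const char *pat, const char *text)
{
    if (pat[0] == '\0')
        return 1;
    if (pat[0] == '$' && pat[1] == '\0')
        return *text == '\0';
    if (pat[1] == '*')
        return sketch_match_repeat(pat[0], 0, pat + 2, text);
    if (pat[1] == '+')
        return sketch_match_repeat(pat[0], 1, pat + 2, text);
    if (*text != '\0' && (pat[0] == '.' || pat[0] == *text))
        return sketch_match_here(pat + 1, text + 1);
    return 0;
}

/* Returns the index of the first match of pat in text, or -1 if none. */
int sketch_match(const char *pat, const char *text)
{
    const char *t = text;

    assert(pat != NULL && text != NULL);

    /* A leading '^' allows a match at index 0 only. */
    if (pat[0] == '^')
        return sketch_match_here(pat + 1, text) ? 0 : -1;

    /* Otherwise try every start position, including the empty tail. */
    do {
        if (sketch_match_here(pat, t))
            return (int)(t - text);
    } while (*t++ != '\0');

    return -1;
}
```

With this sketch, `sketch_match("wor", "hello world")` returns 6 and `sketch_match("xyz", "hello")` returns -1, mirroring the contract of re_match() and re_matchp().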

Examples

Example of usage:

#include <stdio.h>
#include "re.h" /* assumed header name, matching re.c */

int main(void)
{
    /* Standard null-terminated C-string to search: */
    const char* string_to_search = "ahem.. 'hello world !' ..";

    /* Compile a simple regular expression using character classes, meta-char and greedy + non-greedy quantifiers: */
    re_t pattern = re_compile("[Hh]ello [Ww]orld\\s*[!]?");

    /* Check if the regex matches the text: */
    int match_idx = re_matchp(pattern, string_to_search);
    if (match_idx != -1)
    {
        printf("match at idx %d.\n", match_idx);
    }
    return 0;
}

For more usage examples I encourage you to look at the code in the tests-folder.

TODO

Fix the implementation of inverted character classes.

Fix implementation of branches (|), and see if that can lead us closer to groups as well, e.g. (a|b)+.

Add example.c that demonstrates usage.

Add tests/test_perf.c for performance and time measurements.

Testing: Improve pattern rejection testing.

FAQ

Q: What differentiates this library from other C regex implementations?

A: Well, the small size for one: fewer than 500 lines of C code compiling to 2-3 KB of ROM, using very little RAM.

License

All material in this repository is in the public domain.
