> gcc -Os -c re.c
> size re.o
   text    data     bss     dec     hex filename
   2319       0     544    2863     b2f re.o
For ARM/Thumb using GCC 4.8.1, it's around 1.5 kB of code and less RAM:
> arm-none-eabi-gcc -Os -mthumb -c re.c
> size re.o
   text    data     bss     dec     hex filename
   1418       0     280    1698     6a2 re.o
For 8-bit AVR using AVR-GCC 4.8.1, it's around 2 kB of code and less RAM:
> avr-gcc -Os -c re.c
> size re.o
   text    data     bss     dec     hex filename
   2128       0     130    2258     8d2 re.o
API
This is the public / exported API:
/* Typedef'd pointer to hide implementation details. */
typedef struct regex_t* re_t;
/* Compiles regex string pattern to a regex_t-array. */
re_t re_compile(const char* pattern);
/* Finds matches of the compiled pattern inside text. */
int re_matchp(re_t pattern, const char* text);
/* Finds matches of pattern inside text (compiles first automatically). */
int re_match(const char* pattern, const char* text);
Supported regex-operators
The following features / regex-operators are supported by this library.
NOTE: inverted character classes are buggy - see the test harness for concrete examples.
. Dot, matches any character
^ Start anchor, matches beginning of string
$ End anchor, matches end of string
* Asterisk, match zero or more (greedy)
+ Plus, match one or more (greedy)
? Question, match zero or one (non-greedy)
NOTE: The question-mark operator has a known bug.
[abc] Character class, match if one of {'a', 'b', 'c'}
[^abc] Inverted class, match if NOT one of {'a', 'b', 'c'}
NOTE: This feature is currently broken for some uses of character ranges!
[a-zA-Z] Character ranges, the character set of the ranges { a-z | A-Z }
\s Whitespace, \t \f \r \n \v and spaces
\S Non-whitespace
\w Alphanumeric, [a-zA-Z0-9_]
\W Non-alphanumeric
\d Digits, [0-9]
\D Non-digits
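As a rough illustration of the meta-character classes listed above, the sketch below expresses \s, \w and \d (and their inverses) with the standard <ctype.h> classifiers. The function name matches_meta is hypothetical and not part of this library's API, and the library's own implementation may differ. The equivalence with [a-zA-Z0-9_] and [0-9] assumes the default "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper, NOT part of the library's API.
   `meta` is the letter that follows the backslash (s, S, w, W, d, D);
   `c` is the character to test. Unknown letters never match. */
bool matches_meta(char meta, char c)
{
    /* <ctype.h> functions require a value representable as unsigned char. */
    unsigned char uc = (unsigned char)c;

    switch (meta) {
    case 's': return isspace(uc) != 0;             /* \t \f \r \n \v and space */
    case 'S': return isspace(uc) == 0;
    case 'w': return c == '_' || isalnum(uc) != 0; /* [a-zA-Z0-9_] */
    case 'W': return !(c == '_' || isalnum(uc) != 0);
    case 'd': return isdigit(uc) != 0;             /* [0-9] */
    case 'D': return isdigit(uc) == 0;
    default:  return false;
    }
}
```

For instance, matches_meta('w', '_') is true while matches_meta('d', 'x') is false.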
Usage
Compile a regex from an ASCII string (char array) to a custom pattern structure using re_compile().
Search a text string for a regex and get an index into the string using re_match() or re_matchp().
The returned index points to the first position in the string where the regex pattern matches.
If the regular expression doesn't match, the matching function returns -1 to indicate failure.
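To make the return convention concrete (index of the first match, or -1 on failure), here is a minimal self-contained sketch of a backtracking matcher in the style of Rob Pike's well-known matcher. It is not this library's implementation: it handles only ., ^, $ and *, and the names sketch_match, match_here and match_star are hypothetical.

```c
#include <stddef.h>

static int match_here(const char* regex, const char* text);

/* Matches c* followed by the rest of the regex (greedy, with backtracking). */
static int match_star(int c, const char* regex, const char* text)
{
    const char* t = text;

    /* Consume as many occurrences of c as possible ('.' matches anything). */
    while (*t != '\0' && (*t == c || c == '.'))
        t++;

    /* Back off one character at a time until the rest of the regex matches. */
    for (;;) {
        if (match_here(regex, t))
            return 1;
        if (t == text)
            return 0;
        t--;
    }
}

/* Returns 1 if regex matches at the very beginning of text, else 0. */
static int match_here(const char* regex, const char* text)
{
    if (regex[0] == '\0')
        return 1;
    if (regex[1] == '*')
        return match_star(regex[0], regex + 2, text);
    if (regex[0] == '$' && regex[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regex[0] == '.' || regex[0] == *text))
        return match_here(regex + 1, text + 1);
    return 0;
}

/* Returns the index of the first position where regex matches, or -1. */
int sketch_match(const char* regex, const char* text)
{
    const char* start = text;

    if (regex[0] == '^')
        return match_here(regex + 1, text) ? 0 : -1;

    do {
        if (match_here(regex, text))
            return (int)(text - start);
    } while (*text++ != '\0');

    return -1;
}
```

With this sketch, sketch_match("wor.d", "hello world") returns 6, and sketch_match("^world", "hello world") returns -1 because the start anchor only allows a match at index 0.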
Examples
Example of usage:
#include <stdio.h>
#include "re.h" /* assumed name of the library header */

int main(void)
{
  /* Standard null-terminated C-string to search: */
  const char* string_to_search = "ahem.. 'hello world !' ..";
  /* Compile a simple regular expression using character classes, meta-char and greedy + non-greedy quantifiers: */
  re_t pattern = re_compile("[Hh]ello [Ww]orld\\s*[!]?");
  /* Check if the regex matches the text: */
  int match_idx = re_matchp(pattern, string_to_search);
  if (match_idx != -1)
  {
    printf("match at idx %d.\n", match_idx);
  }
  return 0;
}
For more usage examples, I encourage you to look at the code in the tests folder.
TODO
Fix the implementation of inverted character classes.
Fix implementation of branches (|), and see if that can lead us closer to groups as well, e.g. (a|b)+.
Add example.c that demonstrates usage.
Add tests/test_perf.c for performance and time measurements.
Testing: Improve pattern rejection testing.
FAQ
Q: What differentiates this library from other C regex implementations?
A: Well, the small size for one. Fewer than 500 lines of C code compiling to 2-3 kB of ROM, using very little RAM.
License
All material in this repository is in the public domain.