Regular Expression Matching Can Be Simple and Fast (2007)

In “Regular Expression Matching Can Be Simple And Fast,” Russ Cox discusses two contrasting approaches to regular expression matching. One common approach, used in languages like Perl, is slow and can take up to over sixty seconds to match a 29-character string. In contrast, the Thompson NFA implementation is much faster, taking only twenty microseconds to match the same string. Cox explores the historical development of regular expressions and introduces efficient algorithms based on finite automata. By converting regular expressions into NFAs and simulating perfect guessing, these algorithms can process large input strings in linear time. The article includes a simple implementation of Thompson’s algorithm in under 400 lines of C code.

https://swtch.com/%7Ersc/regexp/regexp1.html