These posts are responsible for my love of regular expressions :)
Each field in the file could contain a string, an int, or a float. In particular to figure out the floats, I remember sitting down for a few hours and coming up with a regular expression, testing it in Perl on a bunch of test cases, and then went about the hard work of slowly converting the regex into an NFA to a DFA and then finally to an adjacency graph in a 2d matrix. A little bit of code to read in things a byte at a time and some simple arithmetic and I had a stupid fast
bool isFloat(char *)
Nobody at my work understood the code, they just sort of knew that if they submitted a character array with a field value into it, something happened with a big matrix, and it you let them know if it was a float or not and then they could call the appropriate conversion function -- I think stof() or whatever was in fashion back then.I had to make one revision to it for an edge case I missed and then it found its way into pretty much all code for the company after that.
The tuition for the class that taught me how to do that payed for itself with that one function.
These articles were my introduction to finite automata (and almost the only place I've learned about them aside from practical use) and my intuition for them seem to often be better than my peers'.
The Dragon book contains an NFA to DFA algorithm (powerset) which explodes states with multiple exiting token transitions.
PS: All your DAGs are belong to us.
It's an indepth look at where regex in dotnet was prior to and after version 7.
It really needs to be noted how NP-complete and an algorithm efficient in practice have nothing to do with each other. If an algorithm takes C * 1.000...eighty zeroes...1^n time it is exponential but for any n you will encounter in real life the time it takes will be indistinguishable from C.
On the other hand, it's remarkable that so many algorithms _do_ have reasonable constants/exponents. So the concept works pretty well.
And the reality is that most NP-complete problems are well-solved by heuristics except in strange combinatorial corners. It's why we can't just create encryption systems based on knowing the correct answer to an NP-complete problem; most of the problem instances will be too easy to solve.
PCRE and theoretically pure regular expressions are different things.
Look-around is a different story, but I don't believe you can still guarantee linear time.
That doesn't help you if you're stuck using a library that doesn't perform those optimizations, but it means you need to be careful about importing your assumptions about regex performance from one language to another.
See also: https://github.com/burntSushi/rebar
> a linear find should be equivalent to a well implemented regex search for a specific string of characters
This is only true theoretically. If your regex engine is just a DFA scan that does char-at-a-time matching, then any "well implemented" substring search algorithm is going to blow it out of the water because it likely uses SIMD. Of course, many regex engines (including RE2), detect literals in the regex and use SIMD to scan for it. So while "DFA scan that does char-a-time matching" is a fine semantic model to have (to a first approximation) for a finite automata based regex engine, the model breaks down pretty quickly for reasoning about performance in general.
But KMP and similar algorithms can do better; they can get performance approaching O(n/k) for many cases by not even looking at every character of the input string.
In most cases the parser will not be shorter (code) than the regex, but it will most certainly be simpler (cognitively; it's hard to understand all that regexes do, they're close to magic) and more testable (you can test the parts of the parser, like the number parser or the string parser, individually), and often faster too.
But yes, you're right. Sometimes simple finds are sufficient.
Regular Expression Matching Can Be Simple and Fast (2007) - https://news.ycombinator.com/item?id=20310669 - June 2019 (33 comments)
Regular Expression Matching Can Be Simple and Fast (2007) - https://news.ycombinator.com/item?id=16341519 - Feb 2018 (20 comments)
Regular Expression Matching Can Be Simple and Fast (2007) - https://news.ycombinator.com/item?id=9374858 - April 2015 (16 comments)
Regular Expression Matching Can Be Simple And Fast [2007] - https://news.ycombinator.com/item?id=6593331 - Oct 2013 (1 comment)
Regular Expression Matching Can Be Simple And Fast - https://news.ycombinator.com/item?id=820201 - Sept 2009 (15 comments)
Regular Expression Matching Can Be Simple And Fast - https://news.ycombinator.com/item?id=466845 - Feb 2009 (17 comments)