That's extremely interesting, as I still favor range and binary search for most cases, with normalization lookup handled via two-step tables.
(not meant as a complaint that this might have been submitted before already, I'm just curious about what might have already been said about it)
If so, regular languages are one of those 'surprising things that can be parallelized'. It doesn't seem possible at first glance, though.
[1] https://www.microsoft.com/en-us/research/wp-content/uploads/...
That said, there are known techniques for evaluating context-free grammars (and thus also regular grammars) in parallel. One way to do it is as follows:
Break up the input string into n separate substrings. Processing of the first substring starts at a single state, the starting state. Each other substring starts from the set of all possible states, with the output being a mapping from the state before its first character to the state after its last.
Then at the end the output is stitched together.
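A minimal sketch of the chunk-and-stitch idea, using a toy two-state DFA (the DFA and names here are my own illustration, not from the comment above): each chunk independently computes the state-to-state mapping it induces, and the mappings are then threaded together from the start state.

```python
# Toy DFA over {'a','b'}: accepts strings with an even number of 'b's.
DELTA = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 0}
STATES = (0, 1)
START, ACCEPT = 0, {0}

def chunk_map(chunk):
    """Compute the state -> state mapping this chunk induces, for every state."""
    mapping = {}
    for s in STATES:
        cur = s
        for ch in chunk:
            cur = DELTA[(cur, ch)]
        mapping[s] = cur
    return mapping

def run_chunked(text, n_chunks=4):
    size = max(1, len(text) // n_chunks)
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # The chunk_map calls are independent of each other, so on a real
    # parallel machine each one could run on its own core.
    maps = [chunk_map(c) for c in chunks]
    # Stitch: thread the start state through the per-chunk mappings in order.
    state = START
    for m in maps:
        state = m[state]
    return state in ACCEPT

print(run_chunked("abbaabab"))  # four 'b's -> True
```

The stitching step is sequential, but it costs only O(n_chunks) mapping lookups rather than one transition per character.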
Yes, it is a surprising result. But that is what makes them worthy of study.
Hillis / Steele even did it using the standard technique of prefix sums over an associative operator. So it's the sort of thing that a parallel programmer should be able to invent on their own (if they are expert enough with the prefix-sum pattern).
----
EDIT: For regular languages, here's a "really simple" method that achieves 2x parallelism.
1. Thread#1 parses from the start.
2. Reverse the language / FSM, and Thread#2 parses from the end, working toward the beginning using the reversed FSM.
3. When Thread#1 and Thread#2 meet in the middle, ensure that there is a valid state-transition between the two states. If so, accept the string. Otherwise, reject it.
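A sketch of the meet-in-the-middle idea on the same toy two-state DFA (my own illustrative example, not from the comment): the forward half runs the DFA normally, while the backward half processes the suffix right-to-left, computing the set of states from which the remaining input would reach acceptance. The string is accepted iff the two halves agree in the middle.

```python
# Toy DFA over {'a','b'}: accepts strings with an even number of 'b's.
DELTA = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 0}
STATES = (0, 1)
START, ACCEPT = 0, {0}

def forward_half(prefix):
    """Thread #1: run the DFA forward over the first half of the input."""
    s = START
    for ch in prefix:
        s = DELTA[(s, ch)]
    return s

def backward_half(suffix):
    """Thread #2: walk the suffix right-to-left, tracking which states
    could still lead to an accepting state (the 'reversed FSM' view)."""
    good = set(ACCEPT)
    for ch in reversed(suffix):
        good = {s for s in STATES if DELTA[(s, ch)] in good}
    return good

def accepts(text):
    mid = len(text) // 2
    # These two calls are independent and could run on two threads.
    s = forward_half(text[:mid])
    good = backward_half(text[mid:])
    # Meet in the middle: valid iff the forward state can still reach accept.
    return s in good

print(accepts("abba"))  # two 'b's -> True
```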
--------
This concept can be generalized to "starting in the middle" of any regular language, but that's the tricky part. The first and last characters are the "obvious" starting points. Once you can "start at any location", and learn to match up states, you've got general parallelism across the entire set of characters in the whole string.
> Each other string starts at the set of possible states
A good start.
The Hillis / Steele prefix sum methodology is a bit more nuanced.
Compute the following in parallel:
* a[0] + a[1]
* a[2] + a[3]
* a[4] + a[5]
...
* a[n-2] + a[n-1]
Where + denotes composing the state-transition functions. This results in n/2 composed state-to-state mappings. Once you've done this, then compute:
* (a[0] + a[1]) + (a[2] + a[3])
* (a[4] + a[5]) + (a[6] + a[7])
...
* (a[n-4] + a[n-3]) + (a[n-2] + a[n-1])
Which represents "taking 4 steps of the finite-state-machine, starting at a[0], a[4], a[8], etc. etc.".
Each time you do this, you double the length of the substrings computed. After O(log2(n)) steps, you'll have processed the entire string with O(n) processors.
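The doubling steps above can be sketched as a Hillis/Steele-style inclusive scan, where the associative "+" is composition of state-transition maps. The DFA below is my own toy example (even number of 'b's); in each pass, all the updates at a given stride are independent, which is exactly what gives O(log2(n)) steps on O(n) processors.

```python
# Toy DFA over {'a','b'}: accepts strings with an even number of 'b's.
DELTA = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 0}
STATES = (0, 1)
START, ACCEPT = 0, {0}

def char_map(ch):
    """Each character becomes a state -> state transition map."""
    return {s: DELTA[(s, ch)] for s in STATES}

def compose(f, g):
    """Apply f first, then g (left-to-right over the string); associative."""
    return {s: g[f[s]] for s in STATES}

def scan_accepts(text):
    if not text:
        return START in ACCEPT
    maps = [char_map(ch) for ch in text]
    stride = 1
    while stride < len(maps):
        # All updates in this pass read the old array and are independent,
        # so a parallel machine runs them simultaneously: O(log n) passes.
        new = list(maps)
        for i in range(stride, len(maps)):
            new[i] = compose(maps[i - stride], maps[i])
        maps = new
        stride *= 2
    # maps[-1] is now the transition map of the whole string.
    return maps[-1][START] in ACCEPT

print(scan_accepts("abba"))  # two 'b's -> True
```

This sequential simulation does O(n log n) total work; the point is that each pass is embarrassingly parallel, so the span (critical path) is only O(log n).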
---
I don't think the Hillis / Steele algorithm is practical. But it proves that the problem is in Nick's Class (aka NC complexity), and therefore worthy of parallelism study.