It is possible to implement NFAs directly, in parallel:
https://github.com/mike-french/myrex
Just fan-out messages to all downstream states. Let a fair runtime system evaluate all progress, then tally results along all paths, for example:
https://github.com/mike-french/myrex?tab=readme-ov-file#exec...
The approach can also be adapted to captures and their many, many possible ambiguities. There is an example that generalizes a highly ambiguous match given in Cox: regex (a?){m}(a*){m} against a string of a{m}. The case m=9 gives 864k possible solutions (traversals), you may optionally ask for all of them to be returned:
https://github.com/mike-french/myrex?tab=readme-ov-file#ambi...
The processes share nothing, all state is in the messages, so a single network can process multiple input strings simultaneously. It can also generate Monte-Carlo strings that match the regex, and those may also proceed simultaneously.