Off the top of my head, the only C code in ripgrep is the optional integration with PCRE2, in addition to whatever libc is being used on POSIX platforms. Everything else is pure Rust.
Regular languages are closed under those operations after all.
Paul's talk introducing redgrep is amazing, by the way. Give it a watch if you haven't yet: https://www.youtube.com/watch?v=Ukqb6nMjFyk
ripgrep's regex syntax is the same as Rust's regex crate: https://docs.rs/regex/1.4.4/regex/#syntax (Which is in turn similar to RE2, although it supports a few more niceties.)
Oh, I didn't say anything about easy! I am working, on and off, on a Haskell re-implementation (but with GADTs and in Oleg's tagless final interpreter style, etc., so it's more about exploring the type system).
> In practice, they explode the size of the underlying FSM.
You may be right, but that's still better than the gymnastics you'd have to do by hand to get the same features out of a 'normal' regex.
> Moreover, in a command line tool, it's somewhat easy to work around that through the `-v` switch and shell pipelining.
Alas, that only works if your intersection or complement happens at the top level. You can't do something like
(A & not B) followed by (C & D)
that way.
> Paul's talk introducing redgrep is amazing, by the way. Give it a watch if you haven't yet: https://www.youtube.com/watch?v=Ukqb6nMjFyk
I have, and I agree!
Perhaps I'll try to implement a basic version of redgrep in Rust as an exercise. (I just want something that supports basically all the operations regular languages are closed under, and I don't care too much about speed, as long as the runtime complexity is linear.)
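For what it's worth, here is a rough sketch of what such an exercise could start from, assuming Brzozowski derivatives as the matching technique (which, as I understand it, is the idea redgrep builds on). The `Re` type, the helper names, and the concrete choice of A, B, C, and D in `example_pattern` are all made up for illustration. Note that this version recomputes a derivative for every input character and only does light simplification, so it does not by itself give the linear-time guarantee; memoizing the derivatives into a DFA would be the next step.

```rust
// Regular expressions extended with intersection (And) and complement (Not).
// All names here are hypothetical; this is not redgrep's or ripgrep's API.
#[derive(Debug, Clone, PartialEq)]
enum Re {
    Empty,                 // matches no string at all
    Eps,                   // matches only the empty string
    Chr(char),             // matches exactly one given character
    Cat(Box<Re>, Box<Re>), // concatenation
    Alt(Box<Re>, Box<Re>), // union
    And(Box<Re>, Box<Re>), // intersection
    Not(Box<Re>),          // complement
    Star(Box<Re>),         // Kleene star
}
use Re::*;

// Smart constructors: apply a few sound simplifications to keep terms small.
fn cat(a: Re, b: Re) -> Re {
    match (a, b) {
        (Empty, _) | (_, Empty) => Empty,
        (Eps, r) => r,
        (r, Eps) => r,
        (a, b) => Cat(Box::new(a), Box::new(b)),
    }
}
fn alt(a: Re, b: Re) -> Re {
    match (a, b) {
        (Empty, r) => r,
        (r, Empty) => r,
        (a, b) if a == b => a,
        (a, b) => Alt(Box::new(a), Box::new(b)),
    }
}
fn and(a: Re, b: Re) -> Re {
    match (a, b) {
        (Empty, _) | (_, Empty) => Empty,
        (a, b) if a == b => a,
        (a, b) => And(Box::new(a), Box::new(b)),
    }
}
fn not(a: Re) -> Re {
    match a {
        Not(inner) => *inner, // double complement cancels
        other => Not(Box::new(other)),
    }
}
fn star(a: Re) -> Re {
    Star(Box::new(a))
}

// Does the expression accept the empty string?
fn nullable(r: &Re) -> bool {
    match r {
        Empty | Chr(_) => false,
        Eps | Star(_) => true,
        Cat(a, b) | And(a, b) => nullable(a) && nullable(b),
        Alt(a, b) => nullable(a) || nullable(b),
        Not(a) => !nullable(a),
    }
}

// Brzozowski derivative: the expression matching what may follow `c`.
fn deriv(r: &Re, c: char) -> Re {
    match r {
        Empty | Eps => Empty,
        Chr(d) => if *d == c { Eps } else { Empty },
        Cat(a, b) => {
            let left = cat(deriv(a, c), (**b).clone());
            if nullable(a) { alt(left, deriv(b, c)) } else { left }
        }
        Alt(a, b) => alt(deriv(a, c), deriv(b, c)),
        And(a, b) => and(deriv(a, c), deriv(b, c)),
        Not(a) => not(deriv(a, c)),
        Star(a) => cat(deriv(a, c), r.clone()),
    }
}

// Whole-string match: take one derivative per character, then test nullability.
fn matches(r: &Re, input: &str) -> bool {
    let mut state = r.clone();
    for c in input.chars() {
        state = deriv(&state, c);
    }
    nullable(&state)
}

// Hypothetical instance of "(A & not B) followed by (C & D)":
//   A = (a|b)*, B = "ab", C = c*, D = any non-empty string.
fn example_pattern() -> Re {
    let a = star(alt(Chr('a'), Chr('b')));
    let b = cat(Chr('a'), Chr('b'));
    let c = star(Chr('c'));
    let d = not(Eps);
    cat(and(a, not(b)), and(c, d))
}
```

With that pattern, `matches(&example_pattern(), "bac")` is true, while `"abc"` is rejected because its only viable prefix is exactly the excluded `"ab"`. The intersection and complement sit below a concatenation, which is the case `-v` and shell pipes can't express.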
Great work btw. Ripgrep is the best.
... I will have to restrict my comment to just LLVM being a larger C++ dependency
... Just angling for more downvotes ;) Thanks for the reply
I played with making a regex library in Rust, which, as per the RE2 design, involves constructing graphs and gluing them together as the regex is traversed.
This requires a cycle-catching GC or just a preallocated arena... It was my first foray into Rust, and I felt I would need to reach for unsafe, which I wasn't ready for. Array indexing might serve as an arena, but it is syntactically just a bit messier (imho).
It would be interesting to see how the RE2 approach is done in Rust (I didn't know about that).
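To make the "array indexing as an arena" idea concrete, here is a minimal sketch in safe Rust. It is not taken from RE2 or the regex crate; the `State`, `Nfa`, and `build_a_star_b` names are invented for illustration. States live in a `Vec` and refer to each other by index, so the loop needed for `a*` is just two indices pointing at each other, with no GC, `Rc`, or unsafe involved.

```rust
// States refer to each other by index into the arena instead of by pointer.
type StateId = usize;

#[derive(Debug)]
enum State {
    Char(char, StateId),     // consume one character, then go to the target
    Split(StateId, StateId), // follow either target without consuming input
    Match,                   // accepting state
}

// The arena: every state of the automaton is owned by this one Vec.
struct Nfa {
    states: Vec<State>,
}

impl Nfa {
    fn add(&mut self, state: State) -> StateId {
        self.states.push(state);
        self.states.len() - 1
    }
}

// Hand-built automaton for the regex `a*b`; returns the arena and start state.
fn build_a_star_b() -> (Nfa, StateId) {
    let mut nfa = Nfa { states: Vec::new() };
    let accept = nfa.add(State::Match);
    let b = nfa.add(State::Char('b', accept));
    // Placeholder targets; patched once the state for 'a' exists.
    let split = nfa.add(State::Split(0, 0));
    let a = nfa.add(State::Char('a', split));
    // This closes the cycle split -> a -> split using plain indices.
    nfa.states[split] = State::Split(a, b);
    (nfa, split)
}

// Add `id` and everything reachable from it through Split states.
fn add_closure(nfa: &Nfa, id: StateId, set: &mut Vec<StateId>) {
    if set.contains(&id) {
        return; // already visited; this also stops us looping on cycles
    }
    set.push(id);
    if let State::Split(x, y) = &nfa.states[id] {
        add_closure(nfa, *x, set);
        add_closure(nfa, *y, set);
    }
}

// Simulate all active states in lockstep, one input character at a time.
fn is_match(nfa: &Nfa, start: StateId, input: &str) -> bool {
    let mut current = Vec::new();
    add_closure(nfa, start, &mut current);
    for c in input.chars() {
        let mut next = Vec::new();
        for &id in &current {
            if let State::Char(expected, target) = &nfa.states[id] {
                if *expected == c {
                    add_closure(nfa, *target, &mut next);
                }
            }
        }
        current = next;
    }
    current.iter().any(|&id| matches!(nfa.states[id], State::Match))
}
```

Calling `let (nfa, start) = build_a_star_b();` and then `is_match(&nfa, start, "aaab")` walks the cycle three times before taking the `b` edge. The messiness mentioned above is real (every hop is `nfa.states[id]` rather than a dereference), but the borrow checker has nothing to complain about.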
I like how the article shows both sides of the fence; it makes me realize:
I get a lot of optimizations from pointer stuffing in C. But sometimes we should lay down the good for the better.