Yes. I work on regex engines. Here's the code with all the crazy checks that return errors that can never happen:
fn find_fwd_imp<A: Automaton + ?Sized>(
dfa: &A,
input: &Input<'_, '_>,
pre: Option<&'_ dyn Prefilter>,
earliest: bool,
) -> Result<Option<HalfMatch>, MatchError> {
let mut mat = None;
let mut sid = init_fwd(dfa, input)?;
let mut at = input.start();
while at < input.end() {
let byte = match input.haystack().get(at) {
None => return Err(MatchError::GaveUp { offset: at }),
Some(&byte) => byte,
};
sid = dfa.try_next_state(sid, byte)?;
if dfa.is_special_state(sid) {
if dfa.is_match_state(sid) {
let pattern = dfa.match_pattern(sid, 0);
mat = Some(HalfMatch::new(pattern, at));
} else if dfa.is_dead_state(sid) {
return Ok(mat);
}
}
at = match at.checked_add(1) {
None => return Err(MatchError::GaveUp { offset: at }),
Some(at) => at,
};
}
eoi_fwd(dfa, input, &mut sid, &mut mat)?;
Ok(mat)
}
And here's a search (I ran it a few times):
$ regex-cli count matches dfa dense @$medium '\w{42}'
parse time: 45.331µs
translate time: 28.767µs
compile nfa time: 9.577558ms
nfa memory: 698148
compile dense dfa time: 516.497476ms
dense dfa memory: 6562848
dense alphabet length: 113
dense stride: 128
search time: 531.808124ms
counts: [2]
Here's code that's safe that will panic when there's a bug:
fn find_fwd_imp<A: Automaton + ?Sized>(
dfa: &A,
input: &Input<'_, '_>,
pre: Option<&'_ dyn Prefilter>,
earliest: bool,
) -> Result<Option<HalfMatch>, MatchError> {
let mut mat = None;
let mut sid = init_fwd(dfa, input)?;
let mut at = input.start();
while at < input.end() {
sid = dfa.next_state(sid, input.haystack()[at]);
if dfa.is_special_state(sid) {
if dfa.is_match_state(sid) {
let pattern = dfa.match_pattern(sid, 0);
mat = Some(HalfMatch::new(pattern, at));
} else if dfa.is_dead_state(sid) {
return Ok(mat);
}
}
at += 1;
}
eoi_fwd(dfa, input, &mut sid, &mut mat)?;
Ok(mat)
}
And here's a search with it:
$ regex-cli count matches dfa dense @$medium '\w{42}'
parse time: 33.701µs
translate time: 24.264µs
compile nfa time: 9.55935ms
nfa memory: 698148
compile dense dfa time: 550.959698ms
dense dfa memory: 6562848
dense alphabet length: 113
dense stride: 128
search time: 468.932568ms
counts: [2]
So that's 10% right there. We can keep going though. Let's get rid of the bounds checks. We have to use unsafe though. The stakes have risen. Now if there's a bug, we probably won't get a panic. Instead we get UB. Which might lead to all sorts of bad stuff.
fn find_fwd_imp<A: Automaton + ?Sized>(
dfa: &A,
input: &Input<'_, '_>,
pre: Option<&'_ dyn Prefilter>,
earliest: bool,
) -> Result<Option<HalfMatch>, MatchError> {
let mut mat = None;
let mut sid = init_fwd(dfa, input)?;
let mut at = input.start();
while at < input.end() {
sid = unsafe {
let byte = *input.haystack().get_unchecked(at);
dfa.next_state_unchecked(sid, byte)
};
if dfa.is_special_state(sid) {
if dfa.is_match_state(sid) {
let pattern = dfa.match_pattern(sid, 0);
mat = Some(HalfMatch::new(pattern, at));
} else if dfa.is_dead_state(sid) {
return Ok(mat);
}
}
at += 1;
}
eoi_fwd(dfa, input, &mut sid, &mut mat)?;
Ok(mat)
}
And now the same search:
$ regex-cli count matches dfa dense @$medium '\w{42}'
parse time: 22.511µs
translate time: 26.843µs
compile nfa time: 8.709352ms
nfa memory: 698148
compile dense dfa time: 565.365005ms
dense dfa memory: 6562848
dense alphabet length: 113
dense stride: 128
search time: 456.402963ms
counts: [2]
For a smaller gain.