Struct LazyStateID
struct LazyStateID(_)
A state identifier specifically tailored for lazy DFAs.
A lazy state ID logically represents a pointer to a DFA state. In practice, by limiting the number of DFA states it can address, it reserves some bits of its representation to encode some additional information. That additional information is called a "tag." That tag is used to record whether the state it points to is an unknown, dead, quit, start or match state.
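To make the tagging idea concrete, here is a minimal sketch of packing tag bits into the high bits of a 32-bit integer. The type `TaggedId`, its constants and its bit layout are hypothetical and chosen only for illustration; they are not taken from the crate's implementation, which may arrange its bits differently.

```rust
/// Hypothetical illustration only: a 32-bit ID whose top five bits are tags.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TaggedId(u32);

#[allow(dead_code)]
impl TaggedId {
    const TAG_UNKNOWN: u32 = 1 << 31;
    const TAG_DEAD: u32 = 1 << 30;
    const TAG_QUIT: u32 = 1 << 29;
    const TAG_START: u32 = 1 << 28;
    const TAG_MATCH: u32 = 1 << 27;
    /// Largest index that still leaves room for the five tag bits.
    const MAX_INDEX: u32 = Self::TAG_MATCH - 1;

    /// Builds an untagged ID, or returns None if the index would
    /// collide with the bits reserved for tags.
    fn new(index: u32) -> Option<TaggedId> {
        if index > Self::MAX_INDEX {
            None
        } else {
            Some(TaggedId(index))
        }
    }

    /// Returns a copy of this ID with the match tag set.
    fn with_match_tag(self) -> TaggedId {
        TaggedId(self.0 | Self::TAG_MATCH)
    }

    /// A single comparison answers "is any tag set?" because every
    /// tag bit sits above MAX_INDEX.
    fn is_tagged(self) -> bool {
        self.0 > Self::MAX_INDEX
    }

    fn is_match(self) -> bool {
        self.0 & Self::TAG_MATCH != 0
    }

    /// Strips the tag bits and returns the plain state index.
    fn index(self) -> u32 {
        self.0 & Self::MAX_INDEX
    }
}

fn main() {
    let id = TaggedId::new(42).expect("index fits below the tag bits");
    assert!(!id.is_tagged());
    let tagged = id.with_match_tag();
    assert!(tagged.is_tagged() && tagged.is_match());
    assert_eq!(tagged.index(), 42);
    // An index that needs the reserved bits cannot be represented.
    assert!(TaggedId::new(u32::MAX).is_none());
}
```

The trade-off this sketch shows is the one described above: every bit given to a tag halves the number of states the ID can address.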
When implementing a low level search routine with a lazy DFA, it is necessary to query the type of the current state to know what to do:
- Unknown - The state has not yet been computed. The parameters used to get this state ID must be re-passed to DFA::next_state, which will never return an unknown state ID.
- Dead - A dead state only has transitions to itself. It indicates that the search cannot do anything else and should stop with whatever result it has.
- Quit - A quit state indicates that the automaton could not answer whether a match exists or not. Correct search implementations must return a MatchError::quit when a DFA enters a quit state.
- Start - A start state is a state in which a search can begin. Lazy DFAs usually have more than one start state. Branching on this isn't required for correctness, but a common optimization is to run a prefilter when a search enters a start state. Note that start states are not tagged automatically, and one must enable the Config::specialize_start_states setting for start states to be tagged. The reason for this is that a DFA search loop is usually written to execute a prefilter once it enters a start state. But if there is no prefilter, this handling can be quite disastrous, as the DFA may ping-pong between the special handling code and a possible optimized hot path for handling untagged states. When start states aren't specialized, they are untagged and remain in the hot path.
- Match - A match state indicates that a match has been found. Depending on the semantics of your search implementation, it may either continue until the end of the haystack or a dead state, or it might quit and return the match immediately.
As an optimization, the is_tagged predicate
can be used to determine if a tag exists at all. This is useful to avoid
branching on all of the above types for every byte searched.
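The effect of checking for a tag first can be sketched with a toy automaton that does not depend on the crate at all. Everything below is hypothetical: `ToyId`, its two tag bits and the hand-written `next` function model "one or more ASCII lowercase letters" with the match reported one byte late. Only the shape of the loop, one cheap `is_tagged` test on the hot path followed by the detailed branches, is the point.

```rust
/// Hypothetical toy state ID: the top two bits are tags, the rest is an index.
#[derive(Clone, Copy, PartialEq, Eq)]
struct ToyId(u32);

const TAG_MATCH: u32 = 1 << 31;
const TAG_DEAD: u32 = 1 << 30;
const START: ToyId = ToyId(0);
const IN_WORD: ToyId = ToyId(1);
const MATCH: ToyId = ToyId(2 | TAG_MATCH);
const DEAD: ToyId = ToyId(3 | TAG_DEAD);

impl ToyId {
    fn is_tagged(self) -> bool {
        self.0 & (TAG_MATCH | TAG_DEAD) != 0
    }
    fn is_match(self) -> bool {
        self.0 & TAG_MATCH != 0
    }
    fn is_dead(self) -> bool {
        self.0 & TAG_DEAD != 0
    }
}

/// Hand-written transitions for "one or more ASCII lowercase letters".
/// `None` stands for the end-of-input transition.
fn next(sid: ToyId, byte: Option<u8>) -> ToyId {
    let is_letter = matches!(byte, Some(b'a'..=b'z'));
    if sid == START {
        if is_letter { IN_WORD } else { START }
    } else if sid == IN_WORD {
        // The match is only visible one byte after the last letter.
        if is_letter { IN_WORD } else { MATCH }
    } else {
        DEAD
    }
}

/// Returns the end offset of the leftmost match, if any.
fn find_end(haystack: &[u8]) -> Option<usize> {
    let mut sid = START;
    let mut last_match = None;
    for (i, &b) in haystack.iter().enumerate() {
        sid = next(sid, Some(b));
        // Hot path: untagged states need no further inspection.
        if !sid.is_tagged() {
            continue;
        }
        if sid.is_match() {
            last_match = Some(i);
        } else if sid.is_dead() {
            return last_match;
        }
    }
    // Walk the end-of-input transition to catch a match at the very end.
    if next(sid, None).is_match() {
        last_match = Some(haystack.len());
    }
    last_match
}

fn main() {
    assert_eq!(find_end(b"123 foobar 4567"), Some(10));
    assert_eq!(find_end(b"abc"), Some(3));
    assert_eq!(find_end(b"123"), None);
}
```

For most bytes of a typical haystack the loop body is a transition plus one branch; the match and dead checks run only when a tag is present.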
Example
This example shows how LazyStateID can be used to implement a correct
search routine with minimal branching. In particular, this search routine
implements "leftmost" matching, which means that it doesn't immediately
stop once a match is found. Instead, it continues until it reaches a dead
state.
Notice also how a correct search implementation deals with
CacheErrors returned by some of
the lazy DFA routines. When a CacheError occurs, it returns
MatchError::gave_up.
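use regex_automata::{hybrid::dfa::{Cache, DFA}, HalfMatch, Input, MatchError};

fn find_leftmost_first(
    dfa: &DFA,
    cache: &mut Cache,
    haystack: &[u8],
) -> Result<Option<HalfMatch>, MatchError> {
    // Start states are never match states because matches are delayed by
    // one byte, so the start state need not be checked for a match.
    let mut sid = dfa.start_state_forward(cache, &Input::new(haystack))?;
    let mut last_match = None;
    for (i, &b) in haystack.iter().enumerate() {
        sid = dfa.next_state(cache, sid, b).map_err(|_| MatchError::gave_up(i))?;
        if sid.is_tagged() {
            if sid.is_match() {
                let pid = dfa.match_pattern(cache, sid, 0);
                last_match = Some(HalfMatch::new(pid, i));
            } else if sid.is_dead() {
                return Ok(last_match);
            } else if sid.is_quit() {
                // A match seen before the quit state is still reported.
                if last_match.is_some() {
                    return Ok(last_match);
                }
                return Err(MatchError::quit(b, i));
            }
        }
    }
    // Matches are delayed by one byte, so walk the special EOI transition.
    let eoi = haystack.len();
    sid = dfa.next_eoi_state(cache, sid).map_err(|_| MatchError::gave_up(eoi))?;
    if sid.is_match() {
        let pid = dfa.match_pattern(cache, sid, 0);
        last_match = Some(HalfMatch::new(pid, eoi));
    }
    Ok(last_match)
}

// We use a greedy '+' operator to show how the search doesn't just stop
// once a match is detected. It continues extending the match. Using
// '[a-z]+?' would also work as expected and stop the search early.
// Greediness is built into the automaton.
let dfa = DFA::new(r"[a-z]+")?;
let mut cache = dfa.create_cache();
let haystack = "123 foobar 4567".as_bytes();
let mat = find_leftmost_first(&dfa, &mut cache, haystack)?.unwrap();
assert_eq!(mat.pattern().as_usize(), 0);
assert_eq!(mat.offset(), 10);
// Here's another example that tests our handling of the special
// EOI transition. This will fail to find a match if we don't call
// 'next_eoi_state' at the end of the search since the match isn't found
// until the final byte in the haystack.
let dfa = DFA::new(r"[0-9]{4}")?;
let mut cache = dfa.create_cache();
let haystack = "123 foobar 4567".as_bytes();
let mat = find_leftmost_first(&dfa, &mut cache, haystack)?.unwrap();
assert_eq!(mat.pattern().as_usize(), 0);
assert_eq!(mat.offset(), 15);
// And note that our search implementation above automatically works
// with multi-DFAs. Namely, `dfa.match_pattern(cache, sid, 0)` selects
// the appropriate pattern ID for us.
let dfa = DFA::new_many(&[r"[a-z]+", r"[0-9]+"])?;
let mut cache = dfa.create_cache();
let haystack = "123 foobar 4567".as_bytes();
let mat = find_leftmost_first(&dfa, &mut cache, haystack)?.unwrap();
assert_eq!(mat.pattern().as_usize(), 1);
assert_eq!(mat.offset(), 3);
let mat = find_leftmost_first(&dfa, &mut cache, &haystack[3..])?.unwrap();
assert_eq!(mat.pattern().as_usize(), 0);
assert_eq!(mat.offset(), 7);
let mat = find_leftmost_first(&dfa, &mut cache, &haystack[10..])?.unwrap();
assert_eq!(mat.pattern().as_usize(), 1);
assert_eq!(mat.offset(), 5);
# Ok::<(), Box<dyn std::error::Error>>(())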
Implementations
impl LazyStateID
const fn is_tagged(self: &Self) -> bool
Return true if and only if this lazy state ID is tagged.
When a lazy state ID is tagged, then one can conclude that it is one of a match, start, dead, quit or unknown state.
const fn is_unknown(self: &Self) -> bool
Return true if and only if this represents a lazy state ID that is "unknown." That is, the state has not yet been created. When a caller sees this state ID, it generally means that a state has to be computed in order to proceed.
const fn is_dead(self: &Self) -> bool
Return true if and only if this represents a dead state. A dead state is a state that can never transition to any other state except the dead state. When a dead state is seen, it generally indicates that a search should stop.
const fn is_quit(self: &Self) -> bool
Return true if and only if this represents a quit state. A quit state is a state that is representationally equivalent to a dead state, except it indicates the automaton has reached a point at which it can no longer determine whether a match exists or not. In general, this indicates an error during search and the caller must either pass this error up or use a different search technique.
const fn is_start(self: &Self) -> bool
Return true if and only if this lazy state ID has been tagged as a start state.
Note that if Config::specialize_start_states is disabled (which is the default), then this will always return false since start states won't be tagged.
const fn is_match(self: &Self) -> bool
Return true if and only if this lazy state ID has been tagged as a match state.
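The is_quit description says the caller must either pass the error up or use a different search technique. A minimal sketch of the second option follows. All names here are hypothetical (`Quit`, `fast_find_digit`, `slow_find_digit`, `find_digit`) and none of them come from the crate: the "fast" routine stands in for a search that gives up on input it cannot handle, and the "slow" routine stands in for a fallback that always produces an answer.

```rust
/// Hypothetical error describing where a fast search gave up.
#[derive(Debug, PartialEq)]
struct Quit {
    byte: u8,
    offset: usize,
}

/// Hypothetical fast path: finds the first ASCII digit, but quits as
/// soon as it sees a non-ASCII byte.
fn fast_find_digit(haystack: &[u8]) -> Result<Option<usize>, Quit> {
    for (i, &b) in haystack.iter().enumerate() {
        if !b.is_ascii() {
            return Err(Quit { byte: b, offset: i });
        }
        if b.is_ascii_digit() {
            return Ok(Some(i));
        }
    }
    Ok(None)
}

/// Hypothetical slow path that never quits.
fn slow_find_digit(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|b| b.is_ascii_digit())
}

/// Tries the fast path first and falls back only when it quits.
fn find_digit(haystack: &[u8]) -> Option<usize> {
    match fast_find_digit(haystack) {
        Ok(result) => result,
        Err(_quit) => slow_find_digit(haystack),
    }
}

fn main() {
    // Pure ASCII input is answered by the fast path.
    assert_eq!(find_digit(b"ab1"), Some(2));
    // 'é' is two bytes in UTF-8, so the fast path quits at offset 0
    // and the fallback finds the digit at byte offset 2.
    let haystack = "é7".as_bytes();
    assert_eq!(fast_find_digit(haystack), Err(Quit { byte: 0xC3, offset: 0 }));
    assert_eq!(find_digit(haystack), Some(2));
}
```

The important property is that a quit is never reported as "no match": it is either surfaced as an error or resolved by another search.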
impl Clone for LazyStateID
fn clone(self: &Self) -> LazyStateID
impl Copy for LazyStateID
impl Debug for LazyStateID
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Default for LazyStateID
fn default() -> LazyStateID
impl Eq for LazyStateID
impl Freeze for LazyStateID
impl Hash for LazyStateID
fn hash<__H: Hasher>(self: &Self, state: &mut __H)
impl Ord for LazyStateID
fn cmp(self: &Self, other: &LazyStateID) -> Ordering
impl PartialEq for LazyStateID
fn eq(self: &Self, other: &LazyStateID) -> bool
impl PartialOrd for LazyStateID
fn partial_cmp(self: &Self, other: &LazyStateID) -> Option<Ordering>
impl RefUnwindSafe for LazyStateID
impl Send for LazyStateID
impl StructuralPartialEq for LazyStateID
impl Sync for LazyStateID
impl Unpin for LazyStateID
impl UnsafeUnpin for LazyStateID
impl UnwindSafe for LazyStateID
impl<T> Any for LazyStateID
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for LazyStateID
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for LazyStateID
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for LazyStateID
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for LazyStateID
fn from(t: T) -> T
Returns the argument unchanged.
impl<T> ToOwned for LazyStateID
fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for LazyStateID
fn into(self: Self) -> U
Calls U::from(self).
That is, this conversion is whatever the implementation of From<T> for U chooses to do.
impl<T, U> TryFrom for LazyStateID
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for LazyStateID
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>