Enum State
enum State
A state in an NFA.
In theory, it can help to conceptualize an NFA as a graph consisting of
States. Each State contains its complete set of outgoing transitions.
In practice, it can help to conceptualize an NFA as a sequence of
instructions for a virtual machine. Each State says what to do and where
to go next.
Strictly speaking, the practical interpretation is the most correct one,
because of the Capture state. Namely, a Capture
state always forwards execution to another state unconditionally. Its only
purpose is to cause a side effect: the recording of the current input
position at a particular location in memory. In this sense, an NFA
has more power than a theoretical non-deterministic finite automaton.
For most uses of this crate, it is likely that one may never even need to
be aware of this type at all. The main use cases for looking at States
directly are if you need to write your own search implementation or if you
need to do some kind of analysis on the NFA.
Variants
-
ByteRange { trans: Transition } A state with a single transition that can only be taken if the current input symbol is in a particular range of bytes.
-
Sparse(SparseTransitions) A state with possibly many transitions represented in a sparse fashion. Transitions are non-overlapping and ordered lexicographically by input range.
In practice, this is used for encoding UTF-8 automata. Its presence is primarily an optimization that avoids many additional unconditional epsilon transitions (via
Unionstates), and thus decreases the overhead of traversing the NFA. This can improve both matching time and DFA construction time.-
Dense(DenseTransitions) A dense representation of a state with multiple transitions.
-
Look { look: crate::util::look::Look, next: crate::util::primitives::StateID } A conditional epsilon transition satisfied via some sort of look-around. Look-around is limited to anchor and word boundary assertions.
Look-around states are meant to be evaluated while performing epsilon closure (computing the set of states reachable from a particular state via only epsilon transitions). If the current position in the haystack satisfies the look-around assertion, then you're permitted to follow that epsilon transition.
-
Union { alternates: alloc::boxed::Box<[crate::util::primitives::StateID]> } An alternation such that there exists an epsilon transition to all states in
alternates, where matches found via earlier transitions are preferred over later transitions.-
BinaryUnion { alt1: crate::util::primitives::StateID, alt2: crate::util::primitives::StateID } An alternation such that there exists precisely two unconditional epsilon transitions, where matches found via
alt1are preferred over matches found viaalt2.This state exists as a common special case of Union where there are only two alternates. In this case, we don't need any allocations to represent the state. This saves a bit of memory and also saves an additional memory access when traversing the NFA.
-
Capture { next: crate::util::primitives::StateID, pattern_id: crate::util::primitives::PatternID, group_index: crate::util::primitives::SmallIndex, slot: crate::util::primitives::SmallIndex } An empty state that records a capture location.
From the perspective of finite automata, this is precisely equivalent to an unconditional epsilon transition, but serves the purpose of instructing NFA simulations to record additional state when the finite state machine passes through this epsilon transition.
slotin this context refers to the specific capture group slot offset that is being recorded. Each capturing group has two slots corresponding to the start and end of the matching portion of that group.The pattern ID and capture group index are also included in this state in case they are useful. But mostly, all you'll need is
nextandslot.-
Fail A state that cannot be transitioned out of. This is useful for cases where you want to prevent matching from occurring. For example, if your regex parser permits empty character classes, then one could choose a
Failstate to represent them. (An empty character class can be thought of as an empty set. Since nothing is in an empty set, they can never match anything.)-
Match { pattern_id: crate::util::primitives::PatternID } A match state. There is at least one such occurrence of this state for each regex that can match that is in this NFA.
Implementations
impl State
fn is_epsilon(self: &Self) -> boolReturns true if and only if this state contains one or more epsilon transitions.
In practice, a state has no outgoing transitions (like
Match), has only non-epsilon transitions (likeByteRange) or has only epsilon transitions (likeUnion).Example
use ; // Capture states are epsilon transitions. let state = Capture ; assert!; // ByteRange states are not. let state = ByteRange ; assert!; # Ok::
impl Clone for State
fn clone(self: &Self) -> State
impl Debug for State
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Eq for State
impl Freeze for State
impl PartialEq for State
fn eq(self: &Self, other: &State) -> bool
impl RefUnwindSafe for State
impl Send for State
impl StructuralPartialEq for State
impl Sync for State
impl Unpin for State
impl UnsafeUnpin for State
impl UnwindSafe for State
impl<T> Any for State
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for State
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for State
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for State
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for State
fn from(t: T) -> TReturns the argument unchanged.
impl<T> ToOwned for State
fn to_owned(self: &Self) -> Tfn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for State
fn into(self: Self) -> UCalls
U::from(self).That is, this conversion is whatever the implementation of
[From]<T> for Uchooses to do.
impl<T, U> TryFrom for State
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for State
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>