Enum State

enum State

A state in an NFA.

In theory, it can help to conceptualize an NFA as a graph consisting of States. Each State contains its complete set of outgoing transitions.

In practice, it can help to conceptualize an NFA as a sequence of instructions for a virtual machine. Each State says what to do and where to go next.

Strictly speaking, the practical interpretation is the most correct one, because of the Capture state. Namely, a Capture state always forwards execution to another state unconditionally. Its only purpose is to cause a side effect: the recording of the current input position at a particular location in memory. In this sense, an NFA has more power than a theoretical non-deterministic finite automaton.

For most uses of this crate, it is likely that one may never even need to be aware of this type at all. The main use cases for looking at States directly are if you need to write your own search implementation or if you need to do some kind of analysis on the NFA.

Variants

ByteRange { trans: Transition }

A state with a single transition that can only be taken if the current input symbol is in a particular range of bytes.

Sparse(SparseTransitions)

A state with possibly many transitions represented in a sparse fashion. Transitions are non-overlapping and ordered lexicographically by input range.

In practice, this is used for encoding UTF-8 automata. Its presence is primarily an optimization that avoids many additional unconditional epsilon transitions (via Union states), and thus decreases the overhead of traversing the NFA. This can improve both matching time and DFA construction time.

Dense(DenseTransitions)

A dense representation of a state with multiple transitions.

Look { look: crate::util::look::Look, next: crate::util::primitives::StateID }

A conditional epsilon transition satisfied via some sort of look-around. Look-around is limited to anchor and word boundary assertions.

Look-around states are meant to be evaluated while performing epsilon closure (computing the set of states reachable from a particular state via only epsilon transitions). If the current position in the haystack satisfies the look-around assertion, then you're permitted to follow that epsilon transition.

Union { alternates: alloc::boxed::Box<[crate::util::primitives::StateID]> }

An alternation such that there exists an epsilon transition to all states in alternates, where matches found via earlier transitions are preferred over later transitions.

BinaryUnion { alt1: crate::util::primitives::StateID, alt2: crate::util::primitives::StateID }

An alternation such that there exists precisely two unconditional epsilon transitions, where matches found via alt1 are preferred over matches found via alt2.

This state exists as a common special case of Union where there are only two alternates. In this case, we don't need any allocations to represent the state. This saves a bit of memory and also saves an additional memory access when traversing the NFA.

Capture { next: crate::util::primitives::StateID, pattern_id: crate::util::primitives::PatternID, group_index: crate::util::primitives::SmallIndex, slot: crate::util::primitives::SmallIndex }

An empty state that records a capture location.

From the perspective of finite automata, this is precisely equivalent to an unconditional epsilon transition, but serves the purpose of instructing NFA simulations to record additional state when the finite state machine passes through this epsilon transition.

slot in this context refers to the specific capture group slot offset that is being recorded. Each capturing group has two slots corresponding to the start and end of the matching portion of that group.

The pattern ID and capture group index are also included in this state in case they are useful. But mostly, all you'll need is next and slot.

Fail

A state that cannot be transitioned out of. This is useful for cases where you want to prevent matching from occurring. For example, if your regex parser permits empty character classes, then one could choose a Fail state to represent them. (An empty character class can be thought of as an empty set. Since nothing is in an empty set, they can never match anything.)

Match { pattern_id: crate::util::primitives::PatternID }

A match state. There is at least one such occurrence of this state for each regex that can match that is in this NFA.

Implementations

impl State

fn is_epsilon(self: &Self) -> bool

Returns true if and only if this state contains one or more epsilon transitions.

In practice, a state has no outgoing transitions (like Match), has only non-epsilon transitions (like ByteRange) or has only epsilon transitions (like Union).

Example

use regex_automata::{
    nfa::thompson::{State, Transition},
    util::primitives::{PatternID, StateID, SmallIndex},
};

// Capture states are epsilon transitions.
let state = State::Capture {
    next: StateID::ZERO,
    pattern_id: PatternID::ZERO,
    group_index: SmallIndex::ZERO,
    slot: SmallIndex::ZERO,
};
assert!(state.is_epsilon());

// ByteRange states are not.
let state = State::ByteRange {
    trans: Transition { start: b'a', end: b'z', next: StateID::ZERO },
};
assert!(!state.is_epsilon());

# Ok::<(), Box<dyn std::error::Error>>(())

impl Clone for State

fn clone(self: &Self) -> State

impl Debug for State

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

impl Eq for State

impl Freeze for State

impl PartialEq for State

fn eq(self: &Self, other: &State) -> bool

impl RefUnwindSafe for State

impl Send for State

impl StructuralPartialEq for State

impl Sync for State

impl Unpin for State

impl UnsafeUnpin for State

impl UnwindSafe for State

impl<T> Any for State

fn type_id(self: &Self) -> TypeId

impl<T> Borrow for State

fn borrow(self: &Self) -> &T

impl<T> BorrowMut for State

fn borrow_mut(self: &mut Self) -> &mut T

impl<T> CloneToUninit for State

unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)

impl<T> From for State

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> ToOwned for State

fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)

impl<T, U> Into for State

fn into(self: Self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of [From]<T> for U chooses to do.

impl<T, U> TryFrom for State

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto for State

fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>