Struct Unit

struct Unit(_)

Unit represents a single unit of haystack for DFA-based regex engines.

Consumers of this crate are not expected to need this type unless they are implementing their own DFA. Even then, it is not required: implementors may use other techniques to handle haystack units.

Typically, a single unit of haystack for a DFA would be a single byte. However, for the DFAs in this crate, matches are delayed by a single byte in order to handle look-ahead assertions (\b, $ and \z). Thus, once we have consumed the haystack, we must run the DFA through one additional transition using a unit that indicates the haystack has ended.

A u8 cannot represent a sentinel, since all 256 of its values may be valid haystack units to a DFA. This type therefore explicitly adds room for a sentinel value.
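The extra end-of-input transition can be illustrated with a small hand-written automaton. This is only a sketch: `ToyUnit`, `State`, `next_state` and `toy_is_match` are hypothetical names invented for this example and are not part of this crate. It models the regex ^[a-z]+$ and shows that the match is only confirmed once the sentinel unit has been fed.

```rust
/// Hypothetical stand-in for a haystack unit: a byte or the end-of-input sentinel.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyUnit {
    Byte(u8),
    Eoi,
}

/// States of a hand-written toy DFA for `^[a-z]+$`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Start,
    InRun,
    Match,
    Dead,
}

/// One transition of the toy DFA.
fn next_state(state: State, unit: ToyUnit) -> State {
    match (state, unit) {
        (State::Start, ToyUnit::Byte(b'a'..=b'z')) => State::InRun,
        (State::InRun, ToyUnit::Byte(b'a'..=b'z')) => State::InRun,
        // `$` can only be confirmed once the sentinel arrives.
        (State::InRun, ToyUnit::Eoi) => State::Match,
        _ => State::Dead,
    }
}

/// Feed every byte of the haystack, then one extra end-of-input unit.
fn toy_is_match(haystack: &[u8]) -> bool {
    let mut state = State::Start;
    for &byte in haystack {
        state = next_state(state, ToyUnit::Byte(byte));
    }
    // The additional transition that signals the haystack has ended.
    state = next_state(state, ToyUnit::Eoi);
    state == State::Match
}
```

Without the final transition, the automaton above would stop in `InRun` and could not tell "more lowercase letters may follow" apart from "the haystack ended here".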

The sentinel EOI value is always its own equivalence class and is ultimately represented by adding 1 to the maximum equivalence class value. So for example, the regex ^[a-z]+$ might be split into the following equivalence classes:

0 => [\x00-`]
1 => [a-z]
2 => [{-\xFF]
3 => [EOI]

Where EOI is the special sentinel value that is always in its own singleton equivalence class.
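As a rough illustration, the class table above could be built as a 256-entry lookup from byte to class, with the EOI class derived by adding 1 to the largest byte class. The helper names `byte_classes` and `eoi_class` are assumptions made for this sketch and do not correspond to functions in this crate.

```rust
/// Build a 256-entry byte -> class lookup table for the classes listed above.
fn byte_classes() -> [u8; 256] {
    let mut classes = [0u8; 256];
    for byte in 0..=255u8 {
        classes[byte as usize] = match byte {
            b'a'..=b'z' => 1,
            b'{'..=0xFF => 2,
            _ => 0, // \x00 through '`'
        };
    }
    classes
}

/// The EOI class is one past the largest byte class.
fn eoi_class(classes: &[u8; 256]) -> usize {
    let max = classes.iter().copied().max().unwrap_or(0);
    max as usize + 1
}
```

With such a table, a transition row needs only four entries (three byte classes plus EOI) instead of 257.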

Implementations

impl Unit

fn u8(byte: u8) -> Unit

Create a new haystack unit from a byte value.

All possible byte values are legal. However, when creating a haystack unit for a specific DFA, one should be careful to only construct units that are in that DFA's alphabet. Namely, one way to compact a DFA's in-memory representation is to collapse its transitions to a set of equivalence classes instead of the set of all possible byte values. If a DFA uses equivalence classes instead of byte values, then the byte given here should be the equivalence class.

fn eoi(num_byte_equiv_classes: usize) -> Unit

Create a new "end of input" haystack unit.

The value given is the sentinel value used by this unit to represent the "end of input." The value should be the total number of equivalence classes in the corresponding alphabet. Its maximum value is 256, which occurs when every byte is its own equivalence class.

Panics

This panics when num_byte_equiv_classes is greater than 256.
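The bound of 256 follows from the byte classes occupying at most 0 through 255, which leaves 256 as the largest possible sentinel. The sketch below mirrors that check on a hypothetical `SketchUnit` type; the enum layout, the `byte` constructor name and the panic message are assumptions for illustration, not this crate's actual implementation.

```rust
/// Hypothetical mirror of a haystack unit, used only for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchUnit {
    Byte(u8),
    Eoi(u16),
}

impl SketchUnit {
    /// Any byte value is accepted.
    fn byte(byte: u8) -> SketchUnit {
        SketchUnit::Byte(byte)
    }

    /// The sentinel is the total number of byte equivalence classes.
    fn eoi(num_byte_equiv_classes: usize) -> SketchUnit {
        assert!(
            num_byte_equiv_classes <= 256,
            "at most 256 byte equivalence classes are possible, but got {}",
            num_byte_equiv_classes,
        );
        SketchUnit::Eoi(num_byte_equiv_classes as u16)
    }

    /// Byte value or sentinel value, widened to usize.
    fn as_usize(self) -> usize {
        match self {
            SketchUnit::Byte(b) => usize::from(b),
            SketchUnit::Eoi(n) => usize::from(n),
        }
    }
}
```

In this sketch, `SketchUnit::eoi(256)` is accepted and sits one past the largest byte unit, while `SketchUnit::eoi(257)` panics.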

fn as_u8(self: Self) -> Option<u8>

If this unit is not an "end of input" sentinel, then returns its underlying byte value. Otherwise, returns None.

fn as_eoi(self: Self) -> Option<u16>

If this unit is an "end of input" sentinel, then returns the underlying sentinel value that was given to Unit::eoi. Otherwise, returns None.

fn as_usize(self: Self) -> usize

Return this unit as a usize, regardless of whether it is a byte value or an "end of input" sentinel. In the latter case, the underlying sentinel value given to Unit::eoi is returned.

fn is_byte(self: Self, byte: u8) -> bool

Returns true if and only if this unit is a byte value equivalent to the byte given. This always returns false when this is an "end of input" sentinel.

fn is_eoi(self: Self) -> bool

Returns true when this unit represents an "end of input" sentinel.

fn is_word_byte(self: Self) -> bool

Returns true when this unit corresponds to an ASCII word byte.

This always returns false when this unit represents an "end of input" sentinel.
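For reference, an ASCII word byte is one of [0-9A-Za-z_]. A minimal sketch of the predicate follows, with the sentinel modeled as `None`; `is_ascii_word_byte` and `is_word_unit` are hypothetical helpers written for this example, not functions exported by this crate.

```rust
/// ASCII word bytes: `[0-9A-Za-z_]`.
fn is_ascii_word_byte(byte: u8) -> bool {
    byte == b'_' || byte.is_ascii_alphanumeric()
}

/// `None` models the end-of-input sentinel, which is never a word byte.
fn is_word_unit(unit: Option<u8>) -> bool {
    unit.map_or(false, is_ascii_word_byte)
}
```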

impl Clone for Unit

fn clone(self: &Self) -> Unit

impl Copy for Unit

impl Debug for Unit

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

impl Eq for Unit

impl Freeze for Unit

impl Ord for Unit

fn cmp(self: &Self, other: &Unit) -> Ordering

impl PartialEq for Unit

fn eq(self: &Self, other: &Unit) -> bool

impl PartialOrd for Unit

fn partial_cmp(self: &Self, other: &Unit) -> Option<Ordering>

impl RefUnwindSafe for Unit

impl Send for Unit

impl StructuralPartialEq for Unit

impl Sync for Unit

impl Unpin for Unit

impl UnsafeUnpin for Unit

impl UnwindSafe for Unit

impl<T> Any for Unit

fn type_id(self: &Self) -> TypeId

impl<T> Borrow for Unit

fn borrow(self: &Self) -> &T

impl<T> BorrowMut for Unit

fn borrow_mut(self: &mut Self) -> &mut T

impl<T> CloneToUninit for Unit

unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)

impl<T> From for Unit

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> ToOwned for Unit

fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)

impl<T, U> Into for Unit

fn into(self: Self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of [From]<T> for U chooses to do.

impl<T, U> TryFrom for Unit

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto for Unit

fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>