Struct Unit
struct Unit(_)
Unit represents a single unit of haystack for DFA-based regex engines.
It is not expected for consumers of this crate to need to use this type unless they are implementing their own DFA. And even then, it's not required: implementors may use other techniques to handle haystack units.
Typically, a single unit of haystack for a DFA would be a single byte.
However, for the DFAs in this crate, matches are delayed by a single byte
in order to handle look-ahead assertions (\b, $ and \z). Thus, once
we have consumed the haystack, we must run the DFA through one additional
transition using a unit that indicates the haystack has ended.
There is no way to represent a sentinel with a u8 since all possible
values may be valid haystack units to a DFA. Therefore, this type
explicitly adds room for a sentinel value.
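The extra end-of-input transition described above can be sketched with a toy, hand-written DFA for ^[a-z]+$. This is only an illustration under stated assumptions: ToyUnit, units, step and is_match are hypothetical names, and none of this is the crate's actual implementation.

```rust
/// Hypothetical stand-in for a haystack unit: either a byte or an
/// end-of-input sentinel. This is not the crate's `Unit` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyUnit {
    Byte(u8),
    Eoi,
}

/// Yields every byte of the haystack followed by exactly one `Eoi` unit.
fn units(haystack: &[u8]) -> impl Iterator<Item = ToyUnit> + '_ {
    haystack
        .iter()
        .copied()
        .map(ToyUnit::Byte)
        .chain(std::iter::once(ToyUnit::Eoi))
}

/// Transition function of a toy DFA for `^[a-z]+$`.
/// States: 0 = start, 1 = saw at least one letter, 2 = dead, 3 = match.
/// The match state is only reachable through the `Eoi` transition, which
/// is how the `$` assertion is resolved one unit "late".
fn step(state: u8, unit: ToyUnit) -> u8 {
    match (state, unit) {
        (0, ToyUnit::Byte(b'a'..=b'z')) | (1, ToyUnit::Byte(b'a'..=b'z')) => 1,
        (1, ToyUnit::Eoi) => 3,
        _ => 2,
    }
}

/// Runs the toy DFA over the haystack plus the trailing `Eoi` unit.
fn is_match(haystack: &[u8]) -> bool {
    units(haystack).fold(0, step) == 3
}
```

With this sketch, is_match(b"abc") is true, while is_match(b"") and is_match(b"ab1") are false. In every case the decision is only made on the final Eoi transition.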
The sentinel EOI value is always its own equivalence class and is
ultimately represented by adding 1 to the maximum equivalence class value.
So for example, the regex ^[a-z]+$ might be split into the following
equivalence classes:
0 => [\x00-`]
1 => [a-z]
2 => [{-\xFF]
3 => [EOI]
Where EOI is the special sentinel value that is always in its own singleton equivalence class.
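As a rough sketch of the class assignment listed above, the following builds a 256-entry byte-to-class table for ^[a-z]+$ and derives the EOI class from it. The helper names byte_classes and eoi_class are hypothetical and the table is written by hand; how the crate actually computes its classes is not shown here.

```rust
/// Builds a table mapping each byte to its equivalence class for the
/// example regex `^[a-z]+$` (hand-written, for illustration only).
fn byte_classes() -> [u8; 256] {
    let mut classes = [0u8; 256];
    for b in 0..=255u8 {
        classes[b as usize] = match b {
            b'a'..=b'z' => 1,  // class 1: [a-z]
            b'{'..=0xFF => 2,  // class 2: [{-\xFF]
            _ => 0,            // class 0: [\x00-`]
        };
    }
    classes
}

/// The EOI sentinel gets its own class: one more than the largest
/// class assigned to any byte.
fn eoi_class(classes: &[u8; 256]) -> usize {
    let max = classes.iter().copied().max().unwrap_or(0);
    usize::from(max) + 1
}
```

For this table, eoi_class returns 3, which matches the 3 => [EOI] entry in the listing. It also equals the number of byte equivalence classes, which is the value the documentation of Unit::eoi asks for.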
Implementations
impl Unit
fn u8(byte: u8) -> Unit
Create a new haystack unit from a byte value.
All possible byte values are legal. However, when creating a haystack unit for a specific DFA, one should be careful to only construct units that are in that DFA's alphabet. Namely, one way to compact a DFA's in-memory representation is to collapse its transitions to a set of equivalence classes instead of a set of all possible byte values. If a DFA uses equivalence classes instead of byte values, then the byte given here should be the equivalence class.
fn eoi(num_byte_equiv_classes: usize) -> Unit
Create a new "end of input" haystack unit.
The value given is the sentinel value used by this unit to represent the "end of input." The value should be the total number of equivalence classes in the corresponding alphabet. Its maximum value is 256, which occurs when every byte is its own equivalence class.
Panics
This panics when num_byte_equiv_classes is greater than 256.
fn as_u8(self: Self) -> Option<u8>
If this unit is not an "end of input" sentinel, then returns its underlying byte value. Otherwise returns None.
fn as_eoi(self: Self) -> Option<u16>
If this unit is an "end of input" sentinel, then returns the underlying sentinel value that was given to Unit::eoi. Otherwise returns None.
fn as_usize(self: Self) -> usize
Returns this unit as a usize, regardless of whether it is a byte value or an "end of input" sentinel. In the latter case, the underlying sentinel value given to Unit::eoi is returned.
fn is_byte(self: Self, byte: u8) -> bool
Returns true if and only if this unit is a byte value equivalent to the byte given. This always returns false when this is an "end of input" sentinel.
fn is_eoi(self: Self) -> bool
Returns true when this unit represents an "end of input" sentinel.
fn is_word_byte(self: Self) -> bool
Returns true when this unit corresponds to an ASCII word byte.
This always returns false when this unit represents an "end of input" sentinel.
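To make the documented behavior of these methods concrete, here is a minimal stand-in type that mirrors it. SketchUnit and its enum representation are assumptions made for illustration (the real Unit hides its internals), and "ASCII word byte" is assumed to mean [0-9A-Za-z_].

```rust
/// Hypothetical mirror of the documented `Unit` API; not the crate's type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum SketchUnit {
    U8(u8),
    Eoi(u16),
}

impl SketchUnit {
    /// Wraps a byte (or an equivalence class identifier) as a unit.
    fn u8(byte: u8) -> SketchUnit {
        SketchUnit::U8(byte)
    }

    /// Creates the sentinel; panics if the class count exceeds 256.
    fn eoi(num_byte_equiv_classes: usize) -> SketchUnit {
        assert!(
            num_byte_equiv_classes <= 256,
            "max number of byte equivalence classes is 256"
        );
        SketchUnit::Eoi(num_byte_equiv_classes as u16)
    }

    fn as_u8(self) -> Option<u8> {
        match self {
            SketchUnit::U8(b) => Some(b),
            SketchUnit::Eoi(_) => None,
        }
    }

    fn as_eoi(self) -> Option<u16> {
        match self {
            SketchUnit::U8(_) => None,
            SketchUnit::Eoi(n) => Some(n),
        }
    }

    fn as_usize(self) -> usize {
        match self {
            SketchUnit::U8(b) => usize::from(b),
            SketchUnit::Eoi(n) => usize::from(n),
        }
    }

    fn is_byte(self, byte: u8) -> bool {
        self.as_u8() == Some(byte)
    }

    fn is_eoi(self) -> bool {
        self.as_eoi().is_some()
    }

    /// Assumes a word byte is one of `[0-9A-Za-z_]`.
    fn is_word_byte(self) -> bool {
        matches!(self, SketchUnit::U8(b) if b.is_ascii_alphanumeric() || b == b'_')
    }
}
```

Note that a sentinel built with the maximum value of 256 does not fit in a u8, which is why the sentinel accessor in the documented API returns a u16 and as_usize is offered as the common denominator.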
impl Clone for Unit
fn clone(self: &Self) -> Unit
impl Copy for Unit
impl Debug for Unit
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Eq for Unit
impl Freeze for Unit
impl Ord for Unit
fn cmp(self: &Self, other: &Unit) -> Ordering
impl PartialEq for Unit
fn eq(self: &Self, other: &Unit) -> bool
impl PartialOrd for Unit
fn partial_cmp(self: &Self, other: &Unit) -> Option<Ordering>
impl RefUnwindSafe for Unit
impl Send for Unit
impl StructuralPartialEq for Unit
impl Sync for Unit
impl Unpin for Unit
impl UnsafeUnpin for Unit
impl UnwindSafe for Unit
impl<T> Any for Unit
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for Unit
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for Unit
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for Unit
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for Unit
fn from(t: T) -> T
Returns the argument unchanged.
impl<T> ToOwned for Unit
fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for Unit
fn into(self: Self) -> U
Calls U::from(self).
That is, this conversion is whatever the implementation of From<T> for U chooses to do.
impl<T, U> TryFrom for Unit
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for Unit
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>