Enum Class
enum Class
The high-level intermediate representation of a character class.
A character class corresponds to a set of characters. A character is either defined by a Unicode scalar value or a byte.
A character class, regardless of its character type, is represented by a sequence of non-overlapping non-adjacent ranges of characters.
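The canonical form described above can be made concrete with a minimal Rust sketch. It normalizes an arbitrary list of inclusive byte ranges into sorted, non-overlapping, non-adjacent ranges. The `Range` alias and the `canonicalize` function are hypothetical illustrations and are not part of the regex-syntax API.

```rust
/// Hypothetical model: an inclusive byte range `(start, end)`.
type Range = (u8, u8);

/// Normalizes arbitrary ranges into sorted, non-overlapping,
/// non-adjacent ranges.
fn canonicalize(ranges: &[Range]) -> Vec<Range> {
    // Fix reversed bounds such as (b'z', b'a'), then sort by start.
    let mut sorted: Vec<Range> = ranges
        .iter()
        .map(|&(a, b)| if a <= b { (a, b) } else { (b, a) })
        .collect();
    sorted.sort_unstable();

    let mut out: Vec<Range> = Vec::with_capacity(sorted.len());
    for (start, end) in sorted {
        if let Some(last) = out.last_mut() {
            // Overlapping or adjacent (e.g. 0-9 followed by 10-20): extend.
            // Widening to u16 avoids overflow when last.1 == 255.
            if u16::from(start) <= u16::from(last.1) + 1 {
                last.1 = last.1.max(end);
                continue;
            }
        }
        out.push((start, end));
    }
    out
}
```

Merging adjacent ranges as well as overlapping ones makes the representation unique. For example, a-c plus d-f and a-f describe the same set, and only a-f survives.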
There are no guarantees about which class variant is used. Generally
speaking, the Unicode variant is used whenever a class needs to contain
non-ASCII Unicode scalar values. But the Unicode variant can be used even
when Unicode mode is disabled. For example, at the time of writing, the
regex (?-u:a|\xc2\xa0) will compile down to HIR for the Unicode class
[a\u00A0] due to optimizations.
Note that the Bytes variant may be produced even when it exclusively matches
valid UTF-8. This is because a Bytes variant represents an intention by
the author of the regular expression to disable Unicode mode, which in turn
impacts the semantics of case-insensitive matching. For example, (?i)k
and (?i-u)k will not match the same set of strings.
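The difference between (?i)k and (?i-u)k comes from the Kelvin sign K (U+212A). Unicode case folding maps this character to k. The sketch below approximates the two modes using standard library character methods.

The helper names are hypothetical. `char::to_lowercase` performs full lowercasing, which only approximates the simple case folding the regex engine actually uses.

```rust
/// Hypothetical approximation of Unicode-mode `(?i)k`. It relies on
/// `char::to_lowercase` (full lowercasing), which only approximates the
/// simple case folding used by the regex engine.
fn matches_ci_k_unicode(c: char) -> bool {
    let mut lower = c.to_lowercase();
    // Match only when the character lowercases to exactly "k".
    lower.next() == Some('k') && lower.next().is_none()
}

/// Hypothetical approximation of `(?i-u)k`: only ASCII case differences
/// are ignored.
fn matches_ci_k_ascii(c: char) -> bool {
    c.eq_ignore_ascii_case(&'k')
}
```

The Kelvin sign is encoded as three bytes in UTF-8 (E2 84 AA), so a byte-oriented (?i-u)k cannot match it.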
Variants
- Unicode(ClassUnicode): A set of characters represented by Unicode scalar values.
- Bytes(ClassBytes): A set of characters represented by arbitrary bytes (one byte per character).
Implementations
impl Class
fn case_fold_simple(self: &mut Self)
Apply Unicode simple case folding to this character class, in place. The character class will be expanded to include all simple case folded character variants.
If this is a byte oriented character class, then this will be limited to the ASCII ranges A-Z and a-z.
Panics
This routine panics when the case mapping data necessary for this routine to complete is unavailable. This occurs when the unicode-case feature is not enabled and the underlying class is Unicode oriented.
Callers should prefer using try_case_fold_simple instead, which will return an error instead of panicking.
fn try_case_fold_simple(self: &mut Self) -> Result<(), CaseFoldError>
Apply Unicode simple case folding to this character class, in place. The character class will be expanded to include all simple case folded character variants.
If this is a byte oriented character class, then this will be limited to the ASCII ranges A-Z and a-z.
Error
This routine returns an error when the case mapping data necessary for this routine to complete is unavailable. This occurs when the unicode-case feature is not enabled and the underlying class is Unicode oriented.
fn negate(self: &mut Self)
Negate this character class in place.
After completion, this character class will contain precisely the characters that weren't previously in the class.
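For a byte oriented class, case folding and negation are simple enough to sketch directly over canonical byte ranges. The functions below are hypothetical and only model the documented behavior:

- Folding adds the opposite ASCII case for any part of a range that falls inside a-z or A-Z.
- Negation takes the complement over 0x00..=0xFF.

The `canonicalize` helper restores the non-overlapping, non-adjacent form after folding. `negate` assumes its input is already canonical.

```rust
/// Hypothetical model: an inclusive byte range `(start, end)`, start <= end.
type Range = (u8, u8);

/// Sorts ranges and merges any that overlap or touch.
fn canonicalize(mut ranges: Vec<Range>) -> Vec<Range> {
    ranges.sort_unstable();
    let mut out: Vec<Range> = Vec::with_capacity(ranges.len());
    for (start, end) in ranges {
        if let Some(last) = out.last_mut() {
            // Widen to u16 so last.1 == 255 cannot overflow.
            if u16::from(start) <= u16::from(last.1) + 1 {
                last.1 = last.1.max(end);
                continue;
            }
        }
        out.push((start, end));
    }
    out
}

/// Adds the other ASCII case of every letter in the class; all other
/// bytes are left unchanged.
fn case_fold_ascii(ranges: &[Range]) -> Vec<Range> {
    let mut out = ranges.to_vec();
    for &(start, end) in ranges {
        // Part of the range inside a-z gains the matching A-Z range.
        let (lo, hi) = (start.max(b'a'), end.min(b'z'));
        if lo <= hi {
            out.push((lo - 32, hi - 32));
        }
        // Part of the range inside A-Z gains the matching a-z range.
        let (lo, hi) = (start.max(b'A'), end.min(b'Z'));
        if lo <= hi {
            out.push((lo + 32, hi + 32));
        }
    }
    canonicalize(out)
}

/// Complements a canonical class over the full byte range 0..=255.
fn negate(ranges: &[Range]) -> Vec<Range> {
    let mut out = Vec::new();
    let mut next: u16 = 0; // first byte not yet covered by a range
    for &(start, end) in ranges {
        if u16::from(start) > next {
            out.push((next as u8, start - 1));
        }
        next = u16::from(end) + 1;
    }
    if next <= 255 {
        out.push((next as u8, 255));
    }
    out
}
```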
fn is_utf8(self: &Self) -> bool
Returns true if and only if this character class will only ever match valid UTF-8.
A character class can match invalid UTF-8 only when the following conditions are met:
- The translator was configured to permit generating an expression that can match invalid UTF-8. (By default, this is disabled.)
- Unicode mode (via the u flag) was disabled either in the concrete syntax or in the parser builder. By default, Unicode mode is enabled.
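Here is one way to read the is_utf8 rule. Every Unicode scalar value encodes to valid UTF-8. A single byte is valid UTF-8 only if it is ASCII, so a byte class can match invalid UTF-8 only if it contains a byte above 0x7F.

The `ModelClass` enum below is a hypothetical stand-in for Class, not the real type.

```rust
/// Hypothetical stand-in for `Class` with inclusive ranges.
#[allow(dead_code)]
enum ModelClass {
    Unicode(Vec<(char, char)>),
    Bytes(Vec<(u8, u8)>),
}

fn is_utf8(class: &ModelClass) -> bool {
    match class {
        // Every scalar value has a valid UTF-8 encoding.
        ModelClass::Unicode(_) => true,
        // A lone byte is valid UTF-8 only when it is ASCII (0x00..=0x7F).
        ModelClass::Bytes(ranges) => ranges.iter().all(|&(_, end)| end <= 0x7F),
    }
}
```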
fn minimum_len(self: &Self) -> Option<usize>
Returns the length, in bytes, of the smallest string matched by this character class.
For non-empty byte oriented classes, this always returns 1. For non-empty Unicode oriented classes, this can return 1, 2, 3 or 4. For empty classes, None is returned. It is impossible for 0 to be returned.
Example
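The following examples show several regexes and their minimum lengths, if any. They measure whole regexes rather than a single character class, which is why a length of 0 can appear even though this method never returns 0.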
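```rust
use regex_syntax::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The empty string has a min length of 0.
    let hir = parse(r"")?;
    assert_eq!(Some(0), hir.properties().minimum_len());
    // As do other types of regexes that only match the empty string.
    let hir = parse(r"^$")?;
    assert_eq!(Some(0), hir.properties().minimum_len());
    // A regex that can match the empty string but match more is still 0.
    let hir = parse(r"a*")?;
    assert_eq!(Some(0), hir.properties().minimum_len());
    // A regex that matches nothing has no minimum defined.
    let hir = parse(r"[a&&b]")?;
    assert_eq!(None, hir.properties().minimum_len());
    // Character classes usually have a minimum length of 1.
    let hir = parse(r"\w")?;
    assert_eq!(Some(1), hir.properties().minimum_len());
    // But sometimes Unicode classes might be bigger!
    let hir = parse(r"\p{Cyrillic}")?;
    assert_eq!(Some(2), hir.properties().minimum_len());
    Ok(())
}
```
fn maximum_len(self: &Self) -> Option<usize>
Returns the length, in bytes, of the longest string matched by this character class.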
For non-empty byte oriented classes, this always returns 1. For non-empty Unicode oriented classes, this can return 1, 2, 3 or 4. For empty classes, None is returned. It is impossible for 0 to be returned.
Example
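The following examples show several regexes and their maximum lengths, if any. They measure whole regexes rather than a single character class, which is why a length of 0 can appear even though this method never returns 0.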
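```rust
use regex_syntax::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The empty string has a max length of 0.
    let hir = parse(r"")?;
    assert_eq!(Some(0), hir.properties().maximum_len());
    // As do other types of regexes that only match the empty string.
    let hir = parse(r"^$")?;
    assert_eq!(Some(0), hir.properties().maximum_len());
    // A regex that matches nothing has no maximum defined.
    let hir = parse(r"[a&&b]")?;
    assert_eq!(None, hir.properties().maximum_len());
    // Bounded repeats work as you expect.
    let hir = parse(r"x{2,10}")?;
    assert_eq!(Some(10), hir.properties().maximum_len());
    // An unbounded repeat means there is no maximum.
    let hir = parse(r"x{2,}")?;
    assert_eq!(None, hir.properties().maximum_len());
    // With Unicode enabled, \w can match up to 4 bytes!
    let hir = parse(r"\w")?;
    assert_eq!(Some(4), hir.properties().maximum_len());
    // Without Unicode enabled, \w matches at most 1 byte.
    let hir = parse(r"(?-u)\w")?;
    assert_eq!(Some(1), hir.properties().maximum_len());
    Ok(())
}
```
fn is_empty(self: &Self) -> bool
Returns true if and only if this character class is empty. That is, it has no elements.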
An empty character class can never match anything, including an empty string.
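For a single class, the minimum and maximum lengths follow from one property of UTF-8. The encoded length of a scalar value never decreases as the value grows, so only the start of the first range and the end of the last range matter.

The sketch below models this on a hypothetical `ModelClass` enum. Its ranges are assumed to be canonical (sorted and non-overlapping). It is an illustration, not the regex-syntax implementation.

```rust
/// Hypothetical stand-in for `Class`; ranges are assumed canonical
/// (sorted, non-overlapping, non-adjacent).
enum ModelClass {
    Unicode(Vec<(char, char)>),
    Bytes(Vec<(u8, u8)>),
}

fn is_empty(class: &ModelClass) -> bool {
    match class {
        ModelClass::Unicode(r) => r.is_empty(),
        ModelClass::Bytes(r) => r.is_empty(),
    }
}

fn minimum_len(class: &ModelClass) -> Option<usize> {
    match class {
        // Every non-empty byte class matches exactly one byte.
        ModelClass::Bytes(r) => r.first().map(|_| 1),
        // The smallest scalar value (start of the first range) has the
        // shortest UTF-8 encoding.
        ModelClass::Unicode(r) => r.first().map(|&(start, _)| start.len_utf8()),
    }
}

fn maximum_len(class: &ModelClass) -> Option<usize> {
    match class {
        ModelClass::Bytes(r) => r.last().map(|_| 1),
        // The largest scalar value (end of the last range) has the
        // longest UTF-8 encoding.
        ModelClass::Unicode(r) => r.last().map(|&(_, end)| end.len_utf8()),
    }
}
```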
fn literal(self: &Self) -> Option<Vec<u8>>
If this class consists of exactly one element (whether a codepoint or a byte), then return it as a literal byte string.
If this class is empty or contains more than one element, then None is returned.
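Here is a rough model of literal. With canonical ranges, a class has exactly one element only when it holds a single range whose start equals its end. That element is returned as bytes, UTF-8 encoded for scalar values.

`ModelClass` is again a hypothetical stand-in that assumes canonical ranges.

```rust
/// Hypothetical stand-in for `Class`; ranges are assumed canonical.
enum ModelClass {
    Unicode(Vec<(char, char)>),
    Bytes(Vec<(u8, u8)>),
}

fn literal(class: &ModelClass) -> Option<Vec<u8>> {
    match class {
        ModelClass::Unicode(r) => match r.as_slice() {
            // One range containing one scalar value: encode it as UTF-8.
            [(start, end)] if start == end => {
                let mut buf = [0u8; 4];
                Some(start.encode_utf8(&mut buf).as_bytes().to_vec())
            }
            // Empty, or more than one element.
            _ => None,
        },
        ModelClass::Bytes(r) => match r.as_slice() {
            // One range containing one byte.
            [(start, end)] if start == end => Some(vec![*start]),
            _ => None,
        },
    }
}
```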
impl Clone for Class
fn clone(self: &Self) -> Class
impl Debug for Class
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Eq for Class
impl Freeze for Class
impl PartialEq for Class
fn eq(self: &Self, other: &Class) -> bool
impl RefUnwindSafe for Class
impl Send for Class
impl StructuralPartialEq for Class
impl Sync for Class
impl Unpin for Class
impl UnsafeUnpin for Class
impl UnwindSafe for Class
impl<T> Any for Class
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for Class
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for Class
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for Class
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for Class
fn from(t: T) -> T
Returns the argument unchanged.
impl<T> ToOwned for Class
fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for Class
fn into(self: Self) -> U
Calls U::from(self).
That is, this conversion is whatever the implementation of From<T> for U chooses to do.
impl<T, U> TryFrom for Class
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for Class
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>