Struct Seq
struct Seq { ... }
A sequence of literals.
A Seq is very much like a set in that it represents a union of its
members. That is, it corresponds to a set of literals where at least one
must match in order for a particular Hir expression to match. (Whether
this corresponds to the entire Hir expression, a prefix of it or a suffix
of it depends on how the Seq was extracted from the Hir.)
It is also unlike a set in that multiple identical literals may appear,
and that the order of the literals in the Seq matters. For example, if
the sequence is [sam, samwise] and leftmost-first matching is used, then
samwise can never match and the sequence is equivalent to [sam].
States of a sequence
A Seq has a few different logical states to consider:
- The sequence can represent "any" literal. When this happens, the set does
not have a finite size. The purpose of this state is to inhibit callers
from making assumptions about what literals are required in order to match
a particular
Hirexpression. Generally speaking, when a set is in this state, literal optimizations are inhibited. A good example of a regex that will cause this sort of set to appear is[A-Za-z]. The character class is just too big (and also too narrow) to be usefully expanded into 52 different literals. (Note that the decision for when a seq should become infinite is determined by the caller. A seq itself has no hard-coded limits.) - The sequence can be empty, in which case, it is an affirmative statement
that there are no literals that can match the corresponding
Hir. Consequently, theHirnever matches any input. For example,[a&&b]. - The sequence can be non-empty, in which case, at least one of the
literals must match in order for the corresponding
Hirto match.
Example
This example shows how literal sequences can be simplified by stripping suffixes and minimizing while maintaining preference order.
use ;
let mut seq = new;
seq.keep_first_bytes;
seq.minimize_by_preference;
// Notice that 'far' comes before 'app', which matches the order in the
// original sequence. This guarantees that leftmost-first semantics are
// not altered by simplifying the set.
let expected = from_iter;
assert_eq!;
Implementations
impl Seq
fn empty() -> SeqReturns an empty sequence.
An empty sequence matches zero literals, and thus corresponds to a regex that itself can never match.
fn infinite() -> SeqReturns a sequence of literals without a finite size and may contain any literal.
A sequence without finite size does not reveal anything about the characteristics of the literals in its set. There are no fixed prefixes or suffixes, nor are lower or upper bounds on the length of the literals in the set known.
This is useful to represent constructs in a regex that are "too big" to useful represent as a sequence of literals. For example,
[A-Za-z]. When sequences get too big, they lose their discriminating nature and are more likely to produce false positives, which in turn makes them less likely to speed up searches.More pragmatically, for many regexes, enumerating all possible literals is itself not possible or might otherwise use too many resources. So constraining the size of sets during extraction is a practical trade off to make.
fn singleton(lit: Literal) -> SeqReturns a sequence containing a single literal.
fn new<I, B>(it: I) -> Seq where I: IntoIterator<Item = B>, B: AsRef<[u8]>Returns a sequence of exact literals from the given byte strings.
fn literals(self: &Self) -> Option<&[Literal]>If this is a finite sequence, return its members as a slice of literals.
The slice returned may be empty, in which case, there are no literals that can match this sequence.
fn push(self: &mut Self, lit: Literal)Push a literal to the end of this sequence.
If this sequence is not finite, then this is a no-op.
Similarly, if the most recently added item of this sequence is equivalent to the literal given, then it is not added. This reflects a
Seq's "set like" behavior, and represents a practical trade off. Namely, there is never any need to have two adjacent and equivalent literals in the same sequence, and it is easy to detect in some cases.fn make_inexact(self: &mut Self)Make all of the literals in this sequence inexact.
This is a no-op if this sequence is not finite.
fn make_infinite(self: &mut Self)Converts this sequence to an infinite sequence.
This is a no-op if the sequence is already infinite.
fn cross_forward(self: &mut Self, other: &mut Seq)Modify this sequence to contain the cross product between it and the sequence given.
The cross product only considers literals in this sequence that are exact. That is, inexact literals are not extended.
The literals are always drained from
other, even if none are used. This permits callers to reuse the sequence allocation elsewhere.If this sequence is infinite, then this is a no-op, regardless of what
othercontains (and in this case, the literals are still drained fromother). Ifotheris infinite and this sequence is finite, then this is a no-op, unless this sequence contains a zero-length literal. In which case, the infiniteness ofotherinfects this sequence, and this sequence is itself made infinite.Like
Seq::union, this may attempt to deduplicate literals. SeeSeq::dedupfor how deduplication deals with exact and inexact literals.Example
This example shows basic usage and how exact and inexact literals interact.
use ; let mut seq1 = from_iter; let mut seq2 = from_iter; seq1.cross_forward; // The literals are pulled out of seq2. assert_eq!; let expected = from_iter; assert_eq!;This example shows the behavior of when
otheris an infinite sequence.use ; let mut seq1 = from_iter; let mut seq2 = infinite; seq1.cross_forward; // When seq2 is infinite, cross product doesn't add anything, but // ensures all members of seq1 are inexact. let expected = from_iter; assert_eq!;This example is like the one above, but shows what happens when this sequence contains an empty string. In this case, an infinite
othersequence infects this sequence (because the empty string means that there are no finite prefixes):use ; let mut seq1 = from_iter; let mut seq2 = infinite; seq1.cross_forward; // seq1 is now infinite! assert!;This example shows the behavior of this sequence is infinite.
use ; let mut seq1 = infinite; let mut seq2 = from_iter; seq1.cross_forward; // seq1 remains unchanged. assert!; // Even though the literals in seq2 weren't used, it was still drained. assert_eq!;fn cross_reverse(self: &mut Self, other: &mut Seq)Modify this sequence to contain the cross product between it and the sequence given, where the sequences are treated as suffixes instead of prefixes. Namely, the sequence
otheris prepended toself(as opposed tootherbeing appended toselfinSeq::cross_forward).The cross product only considers literals in this sequence that are exact. That is, inexact literals are not extended.
The literals are always drained from
other, even if none are used. This permits callers to reuse the sequence allocation elsewhere.If this sequence is infinite, then this is a no-op, regardless of what
othercontains (and in this case, the literals are still drained fromother). Ifotheris infinite and this sequence is finite, then this is a no-op, unless this sequence contains a zero-length literal. In which case, the infiniteness ofotherinfects this sequence, and this sequence is itself made infinite.Like
Seq::union, this may attempt to deduplicate literals. SeeSeq::dedupfor how deduplication deals with exact and inexact literals.Example
This example shows basic usage and how exact and inexact literals interact.
use ; let mut seq1 = from_iter; let mut seq2 = from_iter; seq1.cross_reverse; // The literals are pulled out of seq2. assert_eq!; let expected = from_iter; assert_eq!;This example shows the behavior of when
otheris an infinite sequence.use ; let mut seq1 = from_iter; let mut seq2 = infinite; seq1.cross_reverse; // When seq2 is infinite, cross product doesn't add anything, but // ensures all members of seq1 are inexact. let expected = from_iter; assert_eq!;This example is like the one above, but shows what happens when this sequence contains an empty string. In this case, an infinite
othersequence infects this sequence (because the empty string means that there are no finite suffixes):use ; let mut seq1 = from_iter; let mut seq2 = infinite; seq1.cross_reverse; // seq1 is now infinite! assert!;This example shows the behavior when this sequence is infinite.
use ; let mut seq1 = infinite; let mut seq2 = from_iter; seq1.cross_reverse; // seq1 remains unchanged. assert!; // Even though the literals in seq2 weren't used, it was still drained. assert_eq!;fn union(self: &mut Self, other: &mut Seq)Unions the
othersequence into this one.The literals are always drained out of the given
othersequence, even if they are being unioned into an infinite sequence. This permits the caller to reuse theothersequence in another context.Some literal deduping may be performed. If any deduping happens, any leftmost-first or "preference" order match semantics will be preserved.
Example
This example shows basic usage.
use Seq; let mut seq1 = new; let mut seq2 = new; seq1.union; // The literals are pulled out of seq2. assert_eq!; // Adjacent literals are deduped, but non-adjacent literals may not be. assert_eq!;This example shows that literals are drained from
othereven when they aren't necessarily used.use Seq; let mut seq1 = infinite; // Infinite sequences have no finite length. assert_eq!; let mut seq2 = new; seq1.union; // seq1 is still infinite and seq2 has been drained. assert_eq!; assert_eq!;fn union_into_empty(self: &mut Self, other: &mut Seq)Unions the
othersequence into this one by splice theothersequence at the position of the first zero-length literal.This is useful for preserving preference order semantics when combining two literal sequences. For example, in the regex
(a||f)+foo, the correct preference order prefix sequence is[a, foo, f].The literals are always drained out of the given
othersequence, even if they are being unioned into an infinite sequence. This permits the caller to reuse theothersequence in another context. Note that the literals are drained even if no union is performed as well, i.e., when this sequence does not contain a zero-length literal.Some literal deduping may be performed. If any deduping happens, any leftmost-first or "preference" order match semantics will be preserved.
Example
This example shows basic usage.
use Seq; let mut seq1 = new; let mut seq2 = new; seq1.union_into_empty; // The literals are pulled out of seq2. assert_eq!; // 'foo' gets spliced into seq1 where the first empty string occurs. assert_eq!;This example shows that literals are drained from
othereven when they aren't necessarily used.use Seq; let mut seq1 = new; let mut seq2 = new; seq1.union_into_empty; // seq1 has no zero length literals, so no splicing happens. assert_eq!; // Even though no splicing happens, seq2 is still drained. assert_eq!;fn dedup(self: &mut Self)Deduplicate adjacent equivalent literals in this sequence.
If adjacent literals are equivalent strings but one is exact and the other inexact, the inexact literal is kept and the exact one is removed.
Deduping an infinite sequence is a no-op.
Example
This example shows how literals that are duplicate byte strings but are not equivalent with respect to exactness are resolved.
use ; let mut seq = from_iter; seq.dedup; assert_eq!;fn sort(self: &mut Self)Sorts this sequence of literals lexicographically.
Note that if, before sorting, if a literal that is a prefix of another literal appears after it, then after sorting, the sequence will not represent the same preference order match semantics. For example, sorting the sequence
[samwise, sam]yields the sequence[sam, samwise]. Under preference order semantics, the latter sequence will never matchsamwisewhere as the first sequence can.Example
This example shows basic usage.
use Seq; let mut seq = new; seq.sort; assert_eq!;fn reverse_literals(self: &mut Self)Reverses all of the literals in this sequence.
The order of the sequence itself is preserved.
Example
This example shows basic usage.
use Seq; let mut seq = new; seq.reverse_literals; assert_eq!;fn minimize_by_preference(self: &mut Self)Shrinks this seq to its minimal size while respecting the preference order of its literals.
While this routine will remove duplicate literals from this seq, it will also remove literals that can never match in a leftmost-first or "preference order" search. Similar to
Seq::dedup, if a literal is deduped, then the one that remains is made inexact.This is a no-op on seqs that are empty or not finite.
Example
This example shows the difference between
{sam, samwise}and{samwise, sam}.use ; // If 'sam' comes before 'samwise' and a preference order search is // executed, then 'samwise' can never match. let mut seq = new; seq.minimize_by_preference; assert_eq!; // But if they are reversed, then it's possible for 'samwise' to match // since it is given higher preference. let mut seq = new; seq.minimize_by_preference; assert_eq!;This example shows that if an empty string is in this seq, then anything that comes after it can never match.
use ; // An empty string is a prefix of all strings, so it automatically // inhibits any subsequent strings from matching. let mut seq = new; seq.minimize_by_preference; let expected = from_iter; assert_eq!; // And of course, if it's at the beginning, then it makes it impossible // for anything else to match. let mut seq = new; seq.minimize_by_preference; assert_eq!;fn keep_first_bytes(self: &mut Self, len: usize)Trims all literals in this seq such that only the first
lenbytes remain. If a literal has less than or equal tolenbytes, then it remains unchanged. Otherwise, it is trimmed and made inexact.Example
use ; let mut seq = new; seq.keep_first_bytes; let expected = from_iter; assert_eq!;fn keep_last_bytes(self: &mut Self, len: usize)Trims all literals in this seq such that only the last
lenbytes remain. If a literal has less than or equal tolenbytes, then it remains unchanged. Otherwise, it is trimmed and made inexact.Example
use ; let mut seq = new; seq.keep_last_bytes; let expected = from_iter; assert_eq!;fn is_finite(self: &Self) -> boolReturns true if this sequence is finite.
When false, this sequence is infinite and must be treated as if it contains every possible literal.
fn is_empty(self: &Self) -> boolReturns true if and only if this sequence is finite and empty.
An empty sequence never matches anything. It can only be produced by literal extraction when the corresponding regex itself cannot match.
fn len(self: &Self) -> Option<usize>Returns the number of literals in this sequence if the sequence is finite. If the sequence is infinite, then
Noneis returned.fn is_exact(self: &Self) -> boolReturns true if and only if all literals in this sequence are exact.
This returns false if the sequence is infinite.
fn is_inexact(self: &Self) -> boolReturns true if and only if all literals in this sequence are inexact.
This returns true if the sequence is infinite.
fn max_union_len(self: &Self, other: &Seq) -> Option<usize>Return the maximum length of the sequence that would result from unioning
selfwithother. If either set is infinite, then this returnsNone.fn max_cross_len(self: &Self, other: &Seq) -> Option<usize>Return the maximum length of the sequence that would result from the cross product of
selfwithother. If either set is infinite, then this returnsNone.fn min_literal_len(self: &Self) -> Option<usize>Returns the length of the shortest literal in this sequence.
If the sequence is infinite or empty, then this returns
None.fn max_literal_len(self: &Self) -> Option<usize>Returns the length of the longest literal in this sequence.
If the sequence is infinite or empty, then this returns
None.fn longest_common_prefix(self: &Self) -> Option<&[u8]>Returns the longest common prefix from this seq.
If the seq matches any literal or other contains no literals, then there is no meaningful prefix and this returns
None.Example
This shows some example seqs and their longest common prefix.
use Seq; let seq = new; assert_eq!; let seq = new; assert_eq!; let seq = new; assert_eq!; let seq = new; assert_eq!; let seq = infinite; assert_eq!; let seq = empty; assert_eq!;fn longest_common_suffix(self: &Self) -> Option<&[u8]>Returns the longest common suffix from this seq.
If the seq matches any literal or other contains no literals, then there is no meaningful suffix and this returns
None.Example
This shows some example seqs and their longest common suffix.
use Seq; let seq = new; assert_eq!; let seq = new; assert_eq!; let seq = new; assert_eq!; let seq = new; assert_eq!; let seq = infinite; assert_eq!; let seq = empty; assert_eq!;fn optimize_for_prefix_by_preference(self: &mut Self)Optimizes this seq while treating its literals as prefixes and respecting the preference order of its literals.
The specific way "optimization" works is meant to be an implementation detail, as it essentially represents a set of heuristics. The goal that optimization tries to accomplish is to make the literals in this set reflect inputs that will result in a more effective prefilter. Principally by reducing the false positive rate of candidates found by the literals in this sequence. That is, when a match of a literal is found, we would like it to be a strong predictor of the overall match of the regex. If it isn't, then much time will be spent starting and stopping the prefilter search and attempting to confirm the match only to have it fail.
Some of those heuristics might be:
- Identifying a common prefix from a larger sequence of literals, and shrinking the sequence down to that single common prefix.
- Rejecting the sequence entirely if it is believed to result in very high false positive rate. When this happens, the sequence is made infinite.
- Shrinking the sequence to a smaller number of literals representing prefixes, but not shrinking it so much as to make literals too short. (A sequence with very short literals, of 1 or 2 bytes, will typically result in a higher false positive rate.)
Optimization should only be run once extraction is complete. Namely, optimization may make assumptions that do not compose with other operations in the middle of extraction. For example, optimization will reduce
[E(sam), E(samwise)]to[E(sam)], but such a transformation is only valid if no other extraction will occur. If other extraction may occur, then the correct transformation would be to[I(sam)].The
Seq::optimize_for_suffix_by_preferencedoes the same thing, but for suffixes.Example
This shows how optimization might transform a sequence. Note that the specific behavior is not a documented guarantee. The heuristics used are an implementation detail and may change over time in semver compatible releases.
use ; let mut seq = new; seq.optimize_for_prefix_by_preference; assert_eq!;Example: optimization may make the sequence infinite
If the heuristics deem that the sequence could cause a very high false positive rate, then it may make the sequence infinite, effectively disabling its use as a prefilter.
use ; let mut seq = new; seq.optimize_for_prefix_by_preference; assert!;Do note that just because there is a
" "in the sequence, that doesn't mean the sequence will always be made infinite after it is optimized. Namely, if the sequence is considered exact (any match corresponds to an overall match of the original regex), then any match is an overall match, and so the false positive rate is always0.To demonstrate this, we remove
samwisefrom our sequence. This results in no optimization happening and all literals remain exact. Thus the entire sequence is exact, and it is kept as-is, even though one is an ASCII space:use ; let mut seq = new; seq.optimize_for_prefix_by_preference; assert!;fn optimize_for_suffix_by_preference(self: &mut Self)Optimizes this seq while treating its literals as suffixes and respecting the preference order of its literals.
Optimization should only be run once extraction is complete.
The
Seq::optimize_for_prefix_by_preferencedoes the same thing, but for prefixes. See its documentation for more explanation.
impl Clone for Seq
fn clone(self: &Self) -> Seq
impl Debug for Seq
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Eq for Seq
impl Freeze for Seq
impl FromIterator for Seq
fn from_iter<T: IntoIterator<Item = Literal>>(it: T) -> Seq
impl PartialEq for Seq
fn eq(self: &Self, other: &Seq) -> bool
impl RefUnwindSafe for Seq
impl Send for Seq
impl StructuralPartialEq for Seq
impl Sync for Seq
impl Unpin for Seq
impl UnsafeUnpin for Seq
impl UnwindSafe for Seq
impl<T> Any for Seq
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for Seq
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for Seq
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for Seq
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for Seq
fn from(t: T) -> TReturns the argument unchanged.
impl<T> ToOwned for Seq
fn to_owned(self: &Self) -> Tfn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for Seq
fn into(self: Self) -> UCalls
U::from(self).That is, this conversion is whatever the implementation of
[From]<T> for Uchooses to do.
impl<T, U> TryFrom for Seq
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for Seq
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>