Struct Prefilter

struct Prefilter { ... }

A prefilter for accelerating regex searches.

If you already have your literals that you want to search with, then the vanilla Prefilter::new constructor is for you. But if you have an Hir value from the regex-syntax crate, then Prefilter::from_hir_prefix might be more convenient. Namely, it uses the regex-syntax::hir::literal module to extract literal prefixes for you, optimize them and then select and build a prefilter matcher.

A prefilter must have zero false negatives. However, by its very nature, it may produce false positives. That is, a prefilter will never skip over a position in the haystack that corresponds to a match of the original regex pattern, but it may produce a match for a position in the haystack that does not correspond to a match of the original regex pattern. If you use either the Prefilter::from_hir_prefix or Prefilter::from_hirs_prefix constructors, then this guarantee is upheld for you automatically. This guarantee is not preserved if you use Prefilter::new though, since it is up to the caller to provide correct literal strings with respect to the original regex pattern.

Cloning

It is an API guarantee that cloning a prefilter is cheap. That is, cloning it will not duplicate whatever heap memory is used to represent the underlying matcher.

Example

This example shows how to attach a Prefilter to the PikeVM in order to accelerate searches.

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::prefilter::Prefilter,
    Match, MatchKind,
};

let pre = Prefilter::new(MatchKind::LeftmostFirst, &["Bruce "])
    .expect("a prefilter");
let re = PikeVM::builder()
    .configure(PikeVM::config().prefilter(Some(pre)))
    .build(r"Bruce \w+")?;
let mut cache = re.create_cache();
assert_eq!(
    Some(Match::must(0, 6..23)),
    re.find(&mut cache, "Hello Bruce Springsteen!"),
);
# Ok::<(), Box<dyn std::error::Error>>(())

But note that if you get your prefilter incorrect, it could lead to an incorrect result!

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::prefilter::Prefilter,
    Match, MatchKind,
};

// This prefilter is wrong!
let pre = Prefilter::new(MatchKind::LeftmostFirst, &["Patti "])
    .expect("a prefilter");
let re = PikeVM::builder()
    .configure(PikeVM::config().prefilter(Some(pre)))
    .build(r"Bruce \w+")?;
let mut cache = re.create_cache();
// We find no match even though the regex does match.
assert_eq!(
    None,
    re.find(&mut cache, "Hello Bruce Springsteen!"),
);
# Ok::<(), Box<dyn std::error::Error>>(())

Implementations

impl Prefilter

fn new<B: AsRef<[u8]>>(kind: MatchKind, needles: &[B]) -> Option<Prefilter>

Create a new prefilter from a sequence of needles and a corresponding match semantics.

This may return None for a variety of reasons, for example, if a suitable prefilter could not be constructed. That might occur if they are unavailable (e.g., the perf-literal-substring and perf-literal-multisubstring features aren't enabled), or it might occur because of heuristics or other artifacts of how the prefilter works.

Note that if you have an Hir expression, it may be more convenient to use Prefilter::from_hir_prefix. It will automatically handle the task of extracting prefix literals for you.

Example

This example shows how match semantics can impact the matching algorithm used by the prefilter. For this reason, it is important to ensure that the match semantics given here are consistent with the match semantics intended for the regular expression that the literals were extracted from.

use regex_automata::{
    util::{prefilter::Prefilter, syntax},
    MatchKind, Span,
};

let hay = "Hello samwise";

// With leftmost-first, we find 'samwise' here because it comes
// before 'sam' in the sequence we give it..
let pre = Prefilter::new(MatchKind::LeftmostFirst, &["samwise", "sam"])
    .expect("a prefilter");
assert_eq!(
    Some(Span::from(6..13)),
    pre.find(hay.as_bytes(), Span::from(0..hay.len())),
);
// Still with leftmost-first but with the literals reverse, now 'sam'
// will match instead!
let pre = Prefilter::new(MatchKind::LeftmostFirst, &["sam", "samwise"])
    .expect("a prefilter");
assert_eq!(
    Some(Span::from(6..9)),
    pre.find(hay.as_bytes(), Span::from(0..hay.len())),
);

# Ok::<(), Box<dyn std::error::Error>>(())
fn from_hir_prefix(kind: MatchKind, hir: &Hir) -> Option<Prefilter>

This attempts to extract prefixes from the given Hir expression for the given match semantics, and if possible, builds a prefilter for them.

Example

This example shows how to build a prefilter directly from an Hir expression, and use to find an occurrence of a prefix from the regex pattern.

use regex_automata::{
    util::{prefilter::Prefilter, syntax},
    MatchKind, Span,
};

let hir = syntax::parse(r"(Bruce|Patti) \w+")?;
let pre = Prefilter::from_hir_prefix(MatchKind::LeftmostFirst, &hir)
    .expect("a prefilter");
let hay = "Hello Patti Scialfa!";
assert_eq!(
    Some(Span::from(6..12)),
    pre.find(hay.as_bytes(), Span::from(0..hay.len())),
);

# Ok::<(), Box<dyn std::error::Error>>(())
fn from_hirs_prefix<H: Borrow<Hir>>(kind: MatchKind, hirs: &[H]) -> Option<Prefilter>

This attempts to extract prefixes from the given Hir expressions for the given match semantics, and if possible, builds a prefilter for them.

Note that as of now, prefilters throw away information about which pattern each literal comes from. In other words, when a prefilter finds a match, there's no way to know which pattern (or patterns) it came from. Therefore, in order to confirm a match, you'll have to check all of the patterns by running the full regex engine.

Example

This example shows how to build a prefilter directly from multiple Hir expressions expression, and use it to find an occurrence of a prefix from the regex patterns.

use regex_automata::{
    util::{prefilter::Prefilter, syntax},
    MatchKind, Span,
};

let hirs = syntax::parse_many(&[
    r"(Bruce|Patti) \w+",
    r"Mrs?\. Doubtfire",
])?;
let pre = Prefilter::from_hirs_prefix(MatchKind::LeftmostFirst, &hirs)
    .expect("a prefilter");
let hay = "Hello Mrs. Doubtfire";
assert_eq!(
    Some(Span::from(6..20)),
    pre.find(hay.as_bytes(), Span::from(0..hay.len())),
);

# Ok::<(), Box<dyn std::error::Error>>(())
fn find(self: &Self, haystack: &[u8], span: Span) -> Option<Span>

Run this prefilter on haystack[span.start..end] and return a matching span if one exists.

The span returned is guaranteed to have a start position greater than or equal to the one given, and an end position less than or equal to the one given.

Example

This example shows how to build a prefilter directly from an Hir expression, and use it to find an occurrence of a prefix from the regex pattern.

use regex_automata::{
    util::{prefilter::Prefilter, syntax},
    MatchKind, Span,
};

let hir = syntax::parse(r"Bruce \w+")?;
let pre = Prefilter::from_hir_prefix(MatchKind::LeftmostFirst, &hir)
    .expect("a prefilter");
let hay = "Hello Bruce Springsteen!";
assert_eq!(
    Some(Span::from(6..12)),
    pre.find(hay.as_bytes(), Span::from(0..hay.len())),
);

# Ok::<(), Box<dyn std::error::Error>>(())
fn prefix(self: &Self, haystack: &[u8], span: Span) -> Option<Span>

Returns the span of a prefix of haystack[span.start..span.end] if the prefilter matches.

The span returned is guaranteed to have a start position equivalent to the one given, and an end position less than or equal to the one given.

Example

This example shows how to build a prefilter directly from an Hir expression, and use it to find an occurrence of a prefix from the regex pattern that begins at the start of a haystack only.

use regex_automata::{
    util::{prefilter::Prefilter, syntax},
    MatchKind, Span,
};

let hir = syntax::parse(r"Bruce \w+")?;
let pre = Prefilter::from_hir_prefix(MatchKind::LeftmostFirst, &hir)
    .expect("a prefilter");
let hay = "Hello Bruce Springsteen!";
// Nothing is found here because 'Bruce' does
// not occur at the beginning of our search.
assert_eq!(
    None,
    pre.prefix(hay.as_bytes(), Span::from(0..hay.len())),
);
// But if we change where we start the search
// to begin where 'Bruce ' begins, then a
// match will be found.
assert_eq!(
    Some(Span::from(6..12)),
    pre.prefix(hay.as_bytes(), Span::from(6..hay.len())),
);

# Ok::<(), Box<dyn std::error::Error>>(())
fn memory_usage(self: &Self) -> usize

Returns the heap memory, in bytes, used by the underlying prefilter.

fn max_needle_len(self: &Self) -> usize

Return the length of the longest needle in this Prefilter

fn is_fast(self: &Self) -> bool

Implementations might return true here if they believe themselves to be "fast." The concept of "fast" is deliberately left vague, but in practice this usually corresponds to whether it's believed that SIMD will be used.

Why do we care about this? Well, some prefilter tricks tend to come with their own bits of overhead, and so might only make sense if we know that a scan will be much faster than the regex engine itself. Otherwise, the trick may not be worth doing. Whether something is "much" faster than the regex engine generally boils down to whether SIMD is used. (But not always. Even a SIMD matcher with a high false positive rate can become quite slow.)

Even if this returns true, it is still possible for the prefilter to be "slow." Remember, prefilters are just heuristics. We can't really know a prefilter will be fast without actually trying the prefilter. (Which of course we cannot afford to do.)

impl Clone for Prefilter

fn clone(self: &Self) -> Prefilter

impl Debug for Prefilter

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

impl Freeze for Prefilter

impl RefUnwindSafe for Prefilter

impl Send for Prefilter

impl Sync for Prefilter

impl Unpin for Prefilter

impl UnsafeUnpin for Prefilter

impl UnwindSafe for Prefilter

impl<T> Any for Prefilter

fn type_id(self: &Self) -> TypeId

impl<T> Borrow for Prefilter

fn borrow(self: &Self) -> &T

impl<T> BorrowMut for Prefilter

fn borrow_mut(self: &mut Self) -> &mut T

impl<T> CloneToUninit for Prefilter

unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)

impl<T> From for Prefilter

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> ToOwned for Prefilter

fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)

impl<T, U> Into for Prefilter

fn into(self: Self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of [From]<T> for U chooses to do.

impl<T, U> TryFrom for Prefilter

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto for Prefilter

fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>