Struct Config

struct Config { ... }

The configuration used for building a one-pass DFA.

A one-pass DFA configuration is a simple data object that is typically used with Builder::configure. It can be cheaply cloned.

A default configuration can be created either with Config::new, or perhaps more conveniently, with DFA::config.

Implementations

impl Config

fn new() -> Config

Return a new default one-pass DFA configuration.

fn match_kind(self: Self, kind: MatchKind) -> Config

Set the desired match semantics.

The default is MatchKind::LeftmostFirst, which corresponds to the match semantics of Perl-like regex engines. That is, when multiple patterns would match at the same leftmost position, the pattern that appears first in the concrete syntax is chosen.

Currently, the only other kind of match semantics supported is MatchKind::All. This corresponds to "classical DFA" construction where all possible matches are visited.

With the one-pass DFA, preference order and "longest match" rarely disagree, because a regex on which they disagree typically isn't one-pass. For example, searching Samwise for Sam|Samwise will report Sam for leftmost-first matching and Samwise for "longest match" or "all" matching. However, this regex is not one-pass if taken literally. The equivalent regex, Sam(?:|wise), is one-pass, and Sam|Samwise may be optimized to it.

The other main difference is that "all" match semantics don't support non-greedy matches. "All" match semantics always try to match as much as possible.
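
The difference between the two semantics can be sketched without the regex engine at all. The functions below are hypothetical stand-ins that treat each alternative as a literal prefix; they illustrate the Sam|Samwise example above and are not how the one-pass DFA is implemented.

```rust
/// Leftmost-first: the first alternative, in written order, that matches at
/// the start of the haystack wins.
fn leftmost_first<'a>(alternatives: &[&'a str], haystack: &str) -> Option<&'a str> {
    alternatives
        .iter()
        .copied()
        .find(|alt| haystack.starts_with(*alt))
}

/// "Longest match": among the alternatives that match at the start of the
/// haystack, the longest one wins.
fn longest<'a>(alternatives: &[&'a str], haystack: &str) -> Option<&'a str> {
    alternatives
        .iter()
        .copied()
        .filter(|alt| haystack.starts_with(*alt))
        .max_by_key(|alt| alt.len())
}
```

For the alternatives ["Sam", "Samwise"] and the haystack "Samwise", leftmost_first picks Sam while longest picks Samwise.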

fn starts_for_each_pattern(self: Self, yes: bool) -> Config

Whether to compile a separate start state for each pattern in the one-pass DFA.

When enabled, a separate anchored start state is added for each pattern in the DFA. When this start state is used, then the DFA will only search for matches for the pattern specified, even if there are other patterns in the DFA.

The main downside of this option is that it can potentially increase the size of the DFA and/or increase the time it takes to build the DFA.

You might want to enable this option when you want to search for anchored matches of any pattern and also search for anchored matches of one particular pattern, all while using the same DFA. (Otherwise, you would need to compile a new DFA for each pattern.)

By default this is disabled.

Example

This example shows how to build a multi-regex and then search for matches of any of the patterns or matches of one specific pattern.

use regex_automata::{
    dfa::onepass::DFA, Anchored, Input, Match, PatternID,
};

let re = DFA::builder()
    .configure(DFA::config().starts_for_each_pattern(true))
    .build_many(&["[a-z]+", "[0-9]+"])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "123abc";
let input = Input::new(haystack).anchored(Anchored::Yes);

// A normal multi-pattern search will show pattern 1 matches.
re.try_search(&mut cache, &input, &mut caps)?;
assert_eq!(Some(Match::must(1, 0..3)), caps.get_match());

// If we only want to report pattern 0 matches, then we'll get no
// match here.
let input = input.anchored(Anchored::Pattern(PatternID::must(0)));
re.try_search(&mut cache, &input, &mut caps)?;
assert_eq!(None, caps.get_match());

# Ok::<(), Box<dyn std::error::Error>>(())

fn byte_classes(self: Self, yes: bool) -> Config

Whether to attempt to shrink the size of the DFA's alphabet or not.

This option is enabled by default and should never be disabled unless one is debugging a one-pass DFA.

When enabled, the DFA will use a map from all possible bytes to their corresponding equivalence class. Each equivalence class represents a set of bytes that does not discriminate between a match and a non-match in the DFA. For example, the pattern [ab]+ has at least two equivalence classes: a set containing a and b and a set containing every byte except for a and b. a and b are in the same equivalence class because they never discriminate between a match and a non-match.

The advantage of this map is that the size of the transition table can be reduced drastically from (approximately) #states * 256 * sizeof(StateID) to #states * k * sizeof(StateID) where k is the number of equivalence classes (rounded up to the nearest power of 2). As a result, total space usage can decrease substantially. Moreover, since a smaller alphabet is used, DFA compilation becomes faster as well.
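
The size formula can be worked through with a small sketch. Both functions are hypothetical illustrations: the two-class map covers only the classes named for [ab]+ (a real DFA may use more), and the state count and ID size used in the usage note are assumed values, not measurements.

```rust
/// Maps every byte to an equivalence class for the pattern `[ab]+`:
/// class 1 for `a` and `b`, class 0 for every other byte.
fn byte_classes_for_ab() -> [u8; 256] {
    let mut classes = [0u8; 256];
    classes[b'a' as usize] = 1;
    classes[b'b' as usize] = 1;
    classes
}

/// Approximate transition table size in bytes: one row per state, with the
/// row length (the alphabet size) rounded up to a power of two.
fn table_bytes(states: usize, alphabet_len: usize, id_size: usize) -> usize {
    states * alphabet_len.next_power_of_two() * id_size
}
```

Assuming 100 states and 4-byte state IDs, table_bytes(100, 256, 4) gives 102400 bytes for the full byte alphabet, while table_bytes(100, 3, 4) gives 1600 bytes for three equivalence classes (rounded up to four columns).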

WARNING: This is only useful for debugging DFAs. Disabling this does not yield any speed advantages. Namely, even when this is disabled, a byte class map is still used while searching. The only difference is that every byte will be forced into its own distinct equivalence class. This is useful for debugging the actual generated transitions because it lets one see the transitions defined on actual bytes instead of the equivalence classes.

fn size_limit(self: Self, limit: Option<usize>) -> Config

Set a size limit on the total heap used by a one-pass DFA.

This size limit is expressed in bytes and is applied during construction of a one-pass DFA. If the DFA's heap usage exceeds this configured limit, then construction is stopped and an error is returned.

The default is no limit.

Example

This example shows a one-pass DFA that fails to build because of a configured size limit. This particular example also serves as a cautionary tale demonstrating just how big DFAs with large Unicode character classes can get.

# if cfg!(miri) { return Ok(()); } // miri takes too long
use regex_automata::{dfa::onepass::DFA, Match};

// 6MB isn't enough!
DFA::builder()
    .configure(DFA::config().size_limit(Some(6_000_000)))
    .build(r"\w{20}")
    .unwrap_err();

// ... but 7MB probably is!
// (Note that DFA sizes aren't necessarily stable between releases.)
let re = DFA::builder()
    .configure(DFA::config().size_limit(Some(7_000_000)))
    .build(r"\w{20}")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "A".repeat(20);
re.captures(&mut cache, &haystack, &mut caps);
assert_eq!(Some(Match::must(0, 0..20)), caps.get_match());

# Ok::<(), Box<dyn std::error::Error>>(())

While the example above needs more than 6MB to represent \w{20}, it turns out that you only need a little more than 4KB to represent (?-u:\w{20}). So only use Unicode if you need it!

fn get_match_kind(self: &Self) -> MatchKind

Returns the match semantics set in this configuration.

fn get_starts_for_each_pattern(self: &Self) -> bool

Returns whether this configuration has enabled anchored starting states for every pattern in the DFA.

fn get_byte_classes(self: &Self) -> bool

Returns whether this configuration has enabled byte classes or not. This is typically a debugging oriented option, as disabling it confers no speed benefit.

fn get_size_limit(self: &Self) -> Option<usize>

Returns the DFA size limit of this configuration if one was set. The size limit is the total number of bytes on the heap that a DFA is permitted to use. If the DFA exceeds this limit during construction, then construction is stopped and an error is returned.

impl Clone for Config

fn clone(self: &Self) -> Config

impl Debug for Config

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

impl Default for Config

fn default() -> Config

impl Freeze for Config

impl RefUnwindSafe for Config

impl Send for Config

impl Sync for Config

impl Unpin for Config

impl UnsafeUnpin for Config

impl UnwindSafe for Config

impl<T> Any for Config

fn type_id(self: &Self) -> TypeId

impl<T> Borrow for Config

fn borrow(self: &Self) -> &T

impl<T> BorrowMut for Config

fn borrow_mut(self: &mut Self) -> &mut T

impl<T> CloneToUninit for Config

unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)

impl<T> From for Config

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> ToOwned for Config

fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)

impl<T, U> Into for Config

fn into(self: Self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom for Config

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto for Config

fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>