Struct Config
struct Config { ... }
The configuration used for building a one-pass DFA.
A one-pass DFA configuration is a simple data object that is typically used
with Builder::configure. It can be cheaply cloned.
A default configuration can be created either with Config::new, or
perhaps more conveniently, with DFA::config.
Implementations
impl Config
fn new() -> Config
Return a new default one-pass DFA configuration.
fn match_kind(self: Self, kind: MatchKind) -> Config
Set the desired match semantics.
The default is MatchKind::LeftmostFirst, which corresponds to the match semantics of Perl-like regex engines. That is, when multiple patterns would match at the same leftmost position, the pattern that appears first in the concrete syntax is chosen.
Currently, the only other kind of match semantics supported is MatchKind::All. This corresponds to "classical DFA" construction where all possible matches are visited.
When it comes to the one-pass DFA, it is rarer for preference order and "longest match" to actually disagree, since if they did disagree, the regex typically isn't one-pass. For example, searching Samwise for Sam|Samwise will report Sam for leftmost-first matching and Samwise for "longest match" or "all" matching. However, this regex is not one-pass if taken literally. The equivalent regex, Sam(?:|wise), is one-pass, and Sam|Samwise may be optimized to it.
The other main difference is that "all" match semantics don't support non-greedy matches. "All" match semantics always try to match as much as possible.
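The difference between preference order and "longest match" can be illustrated with a small std-only sketch. It is a toy model for a plain alternation of literals, not how the one-pass DFA is implemented; ToyMatchKind and anchored_match_end are hypothetical names introduced only for this illustration.

```rust
/// Toy model of match semantics for a plain alternation of literals.
/// This is NOT the library's implementation; it only mirrors the
/// `Sam|Samwise` example from the text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyMatchKind {
    LeftmostFirst,
    All,
}

/// Returns the end offset of an anchored match of one of `alternatives`
/// at the start of `haystack`, or `None` if no alternative matches.
fn anchored_match_end(
    kind: ToyMatchKind,
    alternatives: &[&str],
    haystack: &str,
) -> Option<usize> {
    // Lengths of all alternatives that match at offset 0, in syntax order.
    let mut matching = alternatives
        .iter()
        .filter(|alt| haystack.starts_with(**alt))
        .map(|alt| alt.len());
    match kind {
        // Preference order: the first alternative that matches wins.
        ToyMatchKind::LeftmostFirst => matching.next(),
        // "All" / longest: every alternative is considered, longest wins.
        ToyMatchKind::All => matching.max(),
    }
}

fn main() {
    let alts = ["Sam", "Samwise"];
    let hay = "Samwise";
    let first = anchored_match_end(ToyMatchKind::LeftmostFirst, &alts, hay);
    let all = anchored_match_end(ToyMatchKind::All, &alts, hay);
    println!("leftmost-first: {:?}", first.map(|end| &hay[..end]));
    println!("all: {:?}", all.map(|end| &hay[..end]));
}
```

In this toy, the two kinds disagree only because both alternatives can match at the same position, which is exactly the situation that usually prevents a regex from being one-pass.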
fn starts_for_each_pattern(self: Self, yes: bool) -> Config
Whether to compile a separate start state for each pattern in the one-pass DFA.
When enabled, a separate anchored start state is added for each pattern in the DFA. When this start state is used, then the DFA will only search for matches for the pattern specified, even if there are other patterns in the DFA.
The main downside of this option is that it can potentially increase the size of the DFA and/or increase the time it takes to build the DFA.
You might want to enable this option when you want to search both for anchored matches of any pattern and for anchored matches of one particular pattern while using the same DFA. (Otherwise, you would need to compile a new DFA for each pattern.)
By default this is disabled.
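As a rough mental model of what a per-pattern start state buys you, the following std-only sketch restricts an anchored search to a single pattern. It is a toy under stated assumptions: each "pattern" is just a byte predicate repeated one or more times, and ToyAnchored, ToyPattern and anchored_search are hypothetical names, not part of the library.

```rust
/// Hypothetical stand-in for an anchored search mode: either any pattern
/// may match, or only the pattern with the given index.
#[derive(Clone, Copy, Debug)]
enum ToyAnchored {
    Any,
    Pattern(usize),
}

/// Each toy "pattern" is a byte predicate repeated one or more times.
type ToyPattern = fn(u8) -> bool;

/// Stands in for `[a-z]+`.
fn is_lower(b: u8) -> bool {
    b.is_ascii_lowercase()
}

/// Stands in for `[0-9]+`.
fn is_digit(b: u8) -> bool {
    b.is_ascii_digit()
}

/// Runs an anchored search at offset 0 and returns `(pattern, end)` for
/// the first permitted pattern (in preference order) that matches.
fn anchored_search(
    patterns: &[ToyPattern],
    mode: ToyAnchored,
    haystack: &[u8],
) -> Option<(usize, usize)> {
    patterns
        .iter()
        .enumerate()
        // A per-pattern start restricts the search to one pattern.
        .filter(|(id, _)| match mode {
            ToyAnchored::Any => true,
            ToyAnchored::Pattern(want) => *id == want,
        })
        .find_map(|(id, pred)| {
            let end = haystack.iter().take_while(|b| pred(**b)).count();
            (end > 0).then_some((id, end))
        })
}

fn main() {
    let patterns: [ToyPattern; 2] = [is_lower, is_digit];
    let any = anchored_search(&patterns, ToyAnchored::Any, b"123abc");
    let only0 = anchored_search(&patterns, ToyAnchored::Pattern(0), b"123abc");
    println!("any pattern: {:?}", any);
    println!("pattern 0 only: {:?}", only0);
}
```

The real DFA does not filter patterns at search time like this; it compiles an extra start state per pattern, which is why enabling the option can increase size and build time.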
Example
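This example shows how to build a multi-regex and then search for matches for any of the patterns or matches for a specific pattern.
    use regex_automata::{
        dfa::onepass::DFA, Anchored, Input, Match, PatternID,
    };

    let re = DFA::builder()
        .configure(DFA::config().starts_for_each_pattern(true))
        .build_many(&["[a-z]+", "[0-9]+"])?;
    let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
    let haystack = "123abc";
    let input = Input::new(haystack).anchored(Anchored::Yes);

    // A normal multi-pattern search will show pattern 1 matches.
    re.try_search(&mut cache, &input, &mut caps)?;
    assert_eq!(Some(Match::must(1, 0..3)), caps.get_match());

    // If we only want to report pattern 0 matches, then we'll get no
    // match here.
    let input = input.anchored(Anchored::Pattern(PatternID::must(0)));
    re.try_search(&mut cache, &input, &mut caps)?;
    assert_eq!(None, caps.get_match());

    # Ok::<(), Box<dyn std::error::Error>>(())
fn byte_classes(self: Self, yes: bool) -> Config
Whether to attempt to shrink the size of the DFA's alphabet or not.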
This option is enabled by default and should never be disabled unless one is debugging a one-pass DFA.
When enabled, the DFA will use a map from all possible bytes to their corresponding equivalence class. Each equivalence class represents a set of bytes that does not discriminate between a match and a non-match in the DFA. For example, the pattern [ab]+ has at least two equivalence classes: a set containing a and b and a set containing every byte except for a and b. a and b are in the same equivalence class because they never discriminate between a match and a non-match.
The advantage of this map is that the size of the transition table can be reduced drastically from (approximately) #states * 256 * sizeof(StateID) to #states * k * sizeof(StateID) where k is the number of equivalence classes (rounded up to the nearest power of 2). As a result, total space usage can decrease substantially. Moreover, since a smaller alphabet is used, DFA compilation becomes faster as well.
WARNING: This is only useful for debugging DFAs. Disabling this does not yield any speed advantages. Namely, even when this is disabled, a byte class map is still used while searching. The only difference is that every byte will be forced into its own distinct equivalence class. This is useful for debugging the actual generated transitions because it lets one see the transitions defined on actual bytes instead of the equivalence classes.
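The size arithmetic above can be worked through with a small std-only sketch for the pattern [ab]+. The helper names are hypothetical, u32 is only an assumed stand-in for StateID, and any extra classes a real DFA might add (the text says "at least two") are ignored.

```rust
use std::mem::size_of;

/// Toy byte-class map for the pattern `[ab]+`: bytes `a` and `b` share
/// one class and every other byte shares another. Extra classes that a
/// real DFA may add are ignored here.
fn byte_classes_for_ab() -> [u8; 256] {
    let mut classes = [0u8; 256];
    classes[usize::from(b'a')] = 1;
    classes[usize::from(b'b')] = 1;
    classes
}

/// Number of distinct classes `k`, assuming class IDs are contiguous
/// and start at zero.
fn alphabet_len(classes: &[u8; 256]) -> usize {
    usize::from(*classes.iter().max().unwrap()) + 1
}

/// Approximate transition table size in bytes for `states` states and a
/// given alphabet length, rounded up to the nearest power of 2.
/// `u32` is an assumed stand-in for `StateID`.
fn table_size(states: usize, alphabet_len: usize) -> usize {
    states * alphabet_len.next_power_of_two() * size_of::<u32>()
}

fn main() {
    let classes = byte_classes_for_ab();
    let k = alphabet_len(&classes);
    println!("equivalence classes: {}", k);
    println!("10 states, no byte classes: {} bytes", table_size(10, 256));
    println!("10 states, byte classes: {} bytes", table_size(10, k));
}
```

Disabling byte classes corresponds to the first size computation, where every byte is its own class and the alphabet length is 256.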
fn size_limit(self: Self, limit: Option<usize>) -> Config
Set a size limit on the total heap used by a one-pass DFA.
This size limit is expressed in bytes and is applied during construction of a one-pass DFA. If the DFA's heap usage exceeds this configured limit, then construction is stopped and an error is returned.
The default is no limit.
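The idea of checking heap usage during construction, with None meaning no limit, can be sketched with a toy builder. This is only an illustration under assumptions: ToyBuilder, SizeLimitExceeded and the u64 transition type are hypothetical and do not describe the library's internals.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
struct SizeLimitExceeded {
    limit: usize,
}

/// Toy builder that grows a transition table one state at a time and
/// checks heap usage against an optional limit after each addition.
struct ToyBuilder {
    table: Vec<u64>,
    stride: usize,
    size_limit: Option<usize>,
}

impl ToyBuilder {
    /// `stride` is the number of transitions per state and must be > 0.
    fn new(stride: usize, size_limit: Option<usize>) -> ToyBuilder {
        ToyBuilder { table: Vec::new(), stride, size_limit }
    }

    /// Heap bytes used by the table so far.
    fn memory_usage(&self) -> usize {
        self.table.len() * std::mem::size_of::<u64>()
    }

    /// Adds one state's worth of transitions and returns its index, or
    /// an error if the table now exceeds the configured limit.
    fn add_state(&mut self) -> Result<usize, SizeLimitExceeded> {
        let id = self.table.len() / self.stride;
        self.table.extend(std::iter::repeat(0).take(self.stride));
        match self.size_limit {
            Some(limit) if self.memory_usage() > limit => {
                Err(SizeLimitExceeded { limit })
            }
            // Either no limit was set or usage is still within it.
            _ => Ok(id),
        }
    }
}

fn main() {
    // Each state costs 4 * 8 = 32 bytes, so a 64 byte limit fits two.
    let mut builder = ToyBuilder::new(4, Some(64));
    for _ in 0..3 {
        println!("{:?}", builder.add_state());
    }
}
```

As in the text, the check happens during construction, so an over-limit build stops early with an error instead of producing an oversized automaton.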
Example
This example shows a one-pass DFA that fails to build because of a configured size limit. This particular example also serves as a cautionary tale demonstrating just how big DFAs with large Unicode character classes can get.
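    # if cfg!(miri) { return Ok(()); } // miri takes too long
    use regex_automata::{dfa::onepass::DFA, Match};

    // 6MB isn't enough!
    DFA::builder()
        .configure(DFA::config().size_limit(Some(6_000_000)))
        .build(r"\w{20}")
        .unwrap_err();

    // ... but 7MB probably is!
    // (Note that DFA sizes aren't necessarily stable between releases.)
    let re = DFA::builder()
        .configure(DFA::config().size_limit(Some(7_000_000)))
        .build(r"\w{20}")?;
    let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
    let haystack = "A".repeat(20);
    re.captures(&mut cache, &haystack, &mut caps);
    assert_eq!(Some(Match::must(0, 0..20)), caps.get_match());

    # Ok::<(), Box<dyn std::error::Error>>(())
While one needs a little more than 3MB to represent \w{20}, it turns out that you only need a little more than 4KB to represent (?-u:\w{20}). So only use Unicode if you need it!
fn get_match_kind(self: &Self) -> MatchKind
Returns the match semantics set in this configuration.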
fn get_starts_for_each_pattern(self: &Self) -> bool
Returns whether this configuration has enabled anchored starting states for every pattern in the DFA.
fn get_byte_classes(self: &Self) -> bool
Returns whether this configuration has enabled byte classes or not. This is typically a debugging oriented option, as disabling it confers no speed benefit.
fn get_size_limit(self: &Self) -> Option<usize>
Returns the DFA size limit of this configuration if one was set. The size limit is the total number of bytes on the heap that a DFA is permitted to use. If the DFA exceeds this limit during construction, then construction is stopped and an error is returned.
impl Clone for Config
fn clone(self: &Self) -> Config
impl Debug for Config
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Default for Config
fn default() -> Config
impl Freeze for Config
impl RefUnwindSafe for Config
impl Send for Config
impl Sync for Config
impl Unpin for Config
impl UnsafeUnpin for Config
impl UnwindSafe for Config
impl<T> Any for Config
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for Config
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for Config
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for Config
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for Config
fn from(t: T) -> T
Returns the argument unchanged.
impl<T> ToOwned for Config
fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for Config
fn into(self: Self) -> U
Calls U::from(self).
That is, this conversion is whatever the implementation of From<T> for U chooses to do.
impl<T, U> TryFrom for Config
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for Config
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>