Struct Config
struct Config { ... }
An object describing the configuration of a Regex.
This configuration only includes options for the
non-syntax behavior of a Regex, and can be applied via the
Builder::configure method. For configuring the syntax options, see
util::syntax::Config.
Example: lower the NFA size limit
In some cases, the default size limit might be too big. The size limit can be lowered, which will prevent large regex patterns from compiling.
```rust
use regex_automata::meta::Regex;

fn main() {
    let result = Regex::builder()
        .configure(Regex::config().nfa_size_limit(Some(20 * (1 << 10))))
        // Not even 20KB is enough to build a single large Unicode class!
        .build(r"\pL");
    assert!(result.is_err());
}
```
Implementations
impl Config
fn new() -> Config
Create a new configuration object for a Regex.
fn match_kind(self: Self, kind: MatchKind) -> Config
Set the match semantics for a Regex.
The default value is MatchKind::LeftmostFirst.
Example
```rust
use regex_automata::{meta::Regex, Match, MatchKind};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // By default, leftmost-first semantics are used, which
    // disambiguates matches at the same position by selecting
    // the one that corresponds earlier in the pattern.
    let re = Regex::new("sam|samwise")?;
    assert_eq!(Some(Match::must(0, 0..3)), re.find("samwise"));

    // But with 'all' semantics, match priority is ignored
    // and all match states are included. When coupled with
    // a leftmost search, the search will report the last
    // possible match.
    let re = Regex::builder()
        .configure(Regex::config().match_kind(MatchKind::All))
        .build("sam|samwise")?;
    assert_eq!(Some(Match::must(0, 0..7)), re.find("samwise"));
    // Beware that this can lead to skipping matches!
    // Usually 'all' is used for anchored reverse searches
    // only, or for overlapping searches.
    assert_eq!(Some(Match::must(0, 4..11)), re.find("sam samwise"));
    Ok(())
}
```
fn utf8_empty(self: Self, yes: bool) -> Config
Toggles whether empty matches are permitted to occur between the code units of a UTF-8 encoded codepoint.
This should generally be enabled when searching a &str or anything that you otherwise know is valid UTF-8. It should be disabled in all other cases. Namely, if the haystack is not valid UTF-8 and this is enabled, then behavior is unspecified.
By default, this is enabled.
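The boundary rule itself can be illustrated with only the standard library. This is a sketch of which offsets qualify, using str::is_char_boundary. It is not how the meta regex engine implements the check, and the helper name empty_match_offsets is hypothetical.

```rust
/// Returns the byte offsets in `hay` at which an empty match could be
/// reported: every offset when `utf8_empty` is false, or only offsets that
/// fall on a UTF-8 codepoint boundary when it is true.
fn empty_match_offsets(hay: &str, utf8_empty: bool) -> Vec<usize> {
    (0..=hay.len())
        .filter(|&i| !utf8_empty || hay.is_char_boundary(i))
        .collect()
}

fn main() {
    // The snowman is one codepoint encoded as three UTF-8 code units.
    let hay = "☃";
    assert_eq!(hay.len(), 3);
    println!("utf8_empty(true):  {:?}", empty_match_offsets(hay, true));
    println!("utf8_empty(false): {:?}", empty_match_offsets(hay, false));
}
```

Offsets 1 and 2 fall inside the snowman's encoding, which is why they only show up once the UTF-8 restriction is lifted.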
Example
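```rust
use regex_automata::{meta::Regex, Match};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let re = Regex::new("")?;
    let got: Vec<Match> = re.find_iter("☃").collect();
    // Matches only occur at the beginning and end of the snowman.
    assert_eq!(got, vec![
        Match::must(0, 0..0),
        Match::must(0, 3..3),
    ]);

    let re = Regex::builder()
        .configure(Regex::config().utf8_empty(false))
        .build("")?;
    let got: Vec<Match> = re.find_iter("☃").collect();
    // Matches now occur at every position!
    assert_eq!(got, vec![
        Match::must(0, 0..0),
        Match::must(0, 1..1),
        Match::must(0, 2..2),
        Match::must(0, 3..3),
    ]);
    Ok(())
}
```
fn auto_prefilter(self: Self, yes: bool) -> Config
Toggles whether automatic prefilter support is enabled.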
If this is disabled and Config::prefilter is not set, then the meta regex engine will not use any prefilters. This can sometimes be beneficial in cases where you know (or have measured) that the prefilter leads to overall worse search performance.
By default, this is enabled.
Example
```rust
use regex_automata::{meta::Regex, Match};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let re = Regex::builder()
        .configure(Regex::config().auto_prefilter(false))
        .build(r"Bruce \w+")?;
    let hay = "Hello Bruce Springsteen!";
    assert_eq!(Some(Match::must(0, 6..23)), re.find(hay));
    Ok(())
}
```
fn prefilter(self: Self, pre: Option<Prefilter>) -> Config
Overrides and sets the prefilter to use inside a Regex.
This permits one to forcefully set a prefilter in cases where the caller knows better than whatever the automatic prefilter logic is capable of.
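To show why a prefilter's correctness matters, here is a toy, standard-library-only model of a prefilter-driven search. The helpers byte_prefilter, verify_bruce and search are hypothetical and are not the Prefilter API. A cheap scan reports candidate offsets, like memchr on a single byte, and a slower verifier confirms each one. verify_bruce is an ASCII-only stand-in for Bruce \w+.

```rust
/// A toy prefilter: reports the next offset (at or after `at`) of `needle`.
fn byte_prefilter(needle: u8) -> impl Fn(&[u8], usize) -> Option<usize> {
    move |hay: &[u8], at: usize| hay[at..].iter().position(|&b| b == needle).map(|i| at + i)
}

/// ASCII-only stand-in for `Bruce \w+`, anchored at `start`.
/// Returns the end offset of the match, if any.
fn verify_bruce(hay: &[u8], start: usize) -> Option<usize> {
    let rest = &hay[start..];
    if !rest.starts_with(b"Bruce ") {
        return None;
    }
    let word_len = rest[6..]
        .iter()
        .take_while(|b| b.is_ascii_alphanumeric() || **b == b'_')
        .count();
    if word_len == 0 { None } else { Some(start + 6 + word_len) }
}

/// Jumps to each candidate reported by `prefilter` and confirms it with `verify`.
fn search(
    hay: &[u8],
    prefilter: impl Fn(&[u8], usize) -> Option<usize>,
    verify: impl Fn(&[u8], usize) -> Option<usize>,
) -> Option<(usize, usize)> {
    let mut at = 0;
    while at <= hay.len() {
        // No candidate means no match: a prefilter must never report a
        // false negative.
        let start = prefilter(hay, at)?;
        if let Some(end) = verify(hay, start) {
            return Some((start, end));
        }
        // A false positive only costs time: resume just past the candidate.
        at = start + 1;
    }
    None
}

fn main() {
    let hay = b"Hello Bruce Springsteen!";
    println!("{:?}", search(hay, byte_prefilter(b'B'), verify_bruce));
    println!("{:?}", search(hay, byte_prefilter(b'Z'), verify_bruce));
}
```

With B as the needle the search finds (6, 23). With Z it never even reaches the verifier, which mirrors the missed match that an incorrect Prefilter causes in a real Regex.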
By default, this is set to None and an automatic prefilter will be used if one could be built. (Assuming Config::auto_prefilter is enabled, which it is by default.)
Example
This example shows how to set your own prefilter. In the case of a pattern like Bruce \w+, the automatic prefilter is likely to be constructed in a way that it will look for occurrences of Bruce. In most cases, this is the best choice. But in some cases, running memchr on B may be the better choice. One can achieve that behavior by overriding the automatic prefilter logic and providing a prefilter that just matches B.
```rust
use regex_automata::{
    meta::Regex,
    util::prefilter::Prefilter,
    Match, MatchKind,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pre = Prefilter::new(MatchKind::LeftmostFirst, &["B"])
        .expect("a prefilter");
    let re = Regex::builder()
        .configure(Regex::config().prefilter(Some(pre)))
        .build(r"Bruce \w+")?;
    let hay = "Hello Bruce Springsteen!";
    assert_eq!(Some(Match::must(0, 6..23)), re.find(hay));
    Ok(())
}
```
Example: incorrect prefilters can lead to incorrect results!
Be warned that setting an incorrect prefilter can lead to missed matches. So if you use this option, ensure your prefilter can never report false negatives. (A false positive is, on the other hand, quite okay and generally unavoidable.)
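```rust
use regex_automata::{meta::Regex, util::prefilter::Prefilter, MatchKind};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pre = Prefilter::new(MatchKind::LeftmostFirst, &["Z"])
        .expect("a prefilter");
    let re = Regex::builder()
        .configure(Regex::config().prefilter(Some(pre)))
        .build(r"Bruce \w+")?;
    let hay = "Hello Bruce Springsteen!";
    // Oops! No match found, but there should be one!
    assert_eq!(None, re.find(hay));
    Ok(())
}
```
fn which_captures(self: Self, which_captures: WhichCaptures) -> Config
Configures what kinds of groups are compiled as "capturing" in the underlying regex engine.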
This is set to WhichCaptures::All by default. Callers may wish to use WhichCaptures::Implicit in cases where one wants to avoid the overhead of capture states for explicit groups.
Note that another approach to avoiding the overhead of capture groups is to use non-capturing groups in the regex pattern. That is, (?:a) instead of (a). This option is useful when you can't control the concrete syntax but know that you don't need the underlying capture states. For example, using WhichCaptures::Implicit will behave as if all explicit capturing groups in the pattern were non-capturing.
Setting this to WhichCaptures::None is usually not the right thing to do. When no capture states are compiled, some regex engines (such as the PikeVM) won't be able to report match offsets. This will manifest as no match being found.
Example
This example demonstrates how the results of capture groups can change based on this option. First we show the default (all capture groups in the pattern are capturing):
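```rust
use regex_automata::{meta::Regex, Span};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let re = Regex::new(r"foo([0-9]+)bar")?;
    let hay = "foo123bar";

    let mut caps = re.create_captures();
    re.captures(hay, &mut caps);
    assert_eq!(Some(Span::from(0..9)), caps.get_group(0));
    assert_eq!(Some(Span::from(3..6)), caps.get_group(1));
    Ok(())
}
```
And now we show the behavior when we only include implicit capture groups. In this case, we can only find the overall match span, but the spans of any other explicit group don't exist because they are treated as non-capturing. (In effect, when WhichCaptures::Implicit is used, there is no real point in using Regex::captures since it will never be able to report more information than Regex::find.)
```rust
use regex_automata::{meta::Regex, nfa::thompson::WhichCaptures, Span};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let re = Regex::builder()
        .configure(Regex::config().which_captures(WhichCaptures::Implicit))
        .build(r"foo([0-9]+)bar")?;
    let hay = "foo123bar";

    let mut caps = re.create_captures();
    re.captures(hay, &mut caps);
    assert_eq!(Some(Span::from(0..9)), caps.get_group(0));
    assert_eq!(None, caps.get_group(1));
    Ok(())
}
```
fn nfa_size_limit(self: Self, limit: Option<usize>) -> Config
Sets the size limit, in bytes, to enforce on the construction of every NFA built by the meta regex engine.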
Setting it to None disables the limit. This is not recommended if you're compiling untrusted patterns.
Note that this limit is applied to each NFA built, and if any of them exceeds the limit, then construction will fail. This limit does not correspond to the total memory used by all NFAs in the meta regex engine.
This defaults to some reasonable number that permits most reasonable patterns.
Example
```rust
use regex_automata::meta::Regex;

fn main() {
    let result = Regex::builder()
        .configure(Regex::config().nfa_size_limit(Some(20 * (1 << 10))))
        // Not even 20KB is enough to build a single large Unicode class!
        .build(r"\pL");
    assert!(result.is_err());

    // But notice that building such a regex with the exact same limit
    // can succeed depending on other aspects of the configuration. For
    // example, a single *forward* NFA will (at time of writing) fit into
    // the 20KB limit, but a *reverse* NFA of the same pattern will not.
    // So if one configures a meta regex such that a reverse NFA is never
    // needed and thus never built, then the 20KB limit will be enough for
    // a pattern like \pL!
    let result = Regex::builder()
        .configure(
            Regex::config()
                .nfa_size_limit(Some(20 * (1 << 10)))
                // The lazy DFA and the full DFA are (currently) the only
                // users of a reverse NFA, so disabling both skips building
                // it. This is not an API guarantee.
                .hybrid(false)
                .dfa(false),
        )
        // With only a forward NFA built, 20KB is now enough.
        .build(r"\pL");
    assert!(result.is_ok());
}
```
fn onepass_size_limit(self: Self, limit: Option<usize>) -> Config
Sets the size limit, in bytes, for the one-pass DFA.
Setting it to None disables the limit. Disabling the limit is strongly discouraged when compiling untrusted patterns. Even if the patterns are trusted, it still may not be a good idea, since a one-pass DFA can use a lot of memory. With that said, as the size of a regex increases, the likelihood of it being one-pass decreases.
This defaults to some reasonable number that permits most reasonable one-pass patterns.
Example
This shows how to set the one-pass DFA size limit. Note that since a one-pass DFA is an optional component of the meta regex engine, this size limit only impacts what is built internally and will never determine whether a Regex itself fails to build.
```rust
use regex_automata::meta::Regex;

fn main() {
    let result = Regex::builder()
        .configure(Regex::config().onepass_size_limit(Some(2 * (1 << 20))))
        .build(r"\pL{5}");
    assert!(result.is_ok());
}
```
fn hybrid_cache_capacity(self: Self, limit: usize) -> Config
Set the cache capacity, in bytes, for the lazy DFA.
The cache capacity of the lazy DFA determines approximately how much heap memory it is allowed to use to store its state transitions. The state transitions are computed at search time, and if the cache fills up, it is cleared. At this point, any previously generated state transitions are lost and are re-generated if they're needed again.
This sort of cache filling and clearing works quite well so long as cache clearing happens infrequently. If it happens too often, then the meta regex engine will stop using the lazy DFA and switch over to a different regex engine.
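The fill, clear and give-up cycle can be modeled with a toy cache. Everything here is hypothetical: ToyCache, its fixed per-transition byte cost, the made-up 64-state transition function and the should_give_up threshold are illustrative only. The real lazy DFA measures memory and decides when to stop differently. The sketch shows that clearing only costs recomputation, never correctness, and that a too-small capacity leads to frequent clears.

```rust
use std::collections::HashMap;

/// A toy model of a lazy DFA transition cache (hypothetical, simplified).
struct ToyCache {
    capacity: usize,
    bytes_per_transition: usize,
    transitions: HashMap<(u32, u8), u32>,
    clear_count: usize,
}

impl ToyCache {
    fn new(capacity: usize, bytes_per_transition: usize) -> ToyCache {
        ToyCache { capacity, bytes_per_transition, transitions: HashMap::new(), clear_count: 0 }
    }

    /// Returns the next state, computing and caching it if necessary.
    fn next_state(&mut self, state: u32, byte: u8, compute: impl Fn(u32, u8) -> u32) -> u32 {
        if let Some(&next) = self.transitions.get(&(state, byte)) {
            return next;
        }
        let used = self.transitions.len() * self.bytes_per_transition;
        if used + self.bytes_per_transition > self.capacity {
            // Out of room: drop every cached transition and start over.
            self.transitions.clear();
            self.clear_count += 1;
        }
        let next = compute(state, byte);
        self.transitions.insert((state, byte), next);
        next
    }

    /// Hypothetical give-up rule: stop using this engine after too many clears.
    fn should_give_up(&self, max_clears: usize) -> bool {
        self.clear_count > max_clears
    }
}

/// Runs a made-up 64-state automaton over `hay` using `cache`.
fn run(cache: &mut ToyCache, hay: &[u8]) -> u32 {
    let mut state = 0;
    for &byte in hay {
        state = cache.next_state(state, byte, |s, b| (s * 31 + u32::from(b)) % 64);
    }
    state
}

fn main() {
    let hay = b"abcdefgh".repeat(100);
    let mut small = ToyCache::new(4 * 16, 16); // room for 4 transitions
    let mut large = ToyCache::new(1 << 20, 16);
    // Cache clearing affects speed, never the result.
    assert_eq!(run(&mut small, &hay), run(&mut large, &hay));
    println!("small: {} clears, give up: {}", small.clear_count, small.should_give_up(3));
    println!("large: {} clears, give up: {}", large.clear_count, large.should_give_up(3));
}
```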
In cases where the cache is cleared too often, it may be possible to give the cache more space and reduce (or eliminate) how often it is cleared. Similarly, sometimes a regex is so big that the lazy DFA isn't used at all if its cache capacity isn't big enough.
The capacity set here is a limit on how much memory is used. The actual memory used is only allocated as it's needed.
Determining the right value for this is a little tricky and will likely require some profiling. Enabling the logging feature and setting the log level to trace will also tell you how often the cache is being cleared.
Example
```rust
use regex_automata::meta::Regex;

fn main() {
    let result = Regex::builder()
        .configure(Regex::config().hybrid_cache_capacity(20 * (1 << 20)))
        .build(r"\pL{5}");
    assert!(result.is_ok());
}
```
fn dfa_size_limit(self: Self, limit: Option<usize>) -> Config
Sets the size limit, in bytes, for heap memory used for a fully compiled DFA.
NOTE: If you increase this, you'll likely also need to increase Config::dfa_state_limit.
In contrast to the lazy DFA, building a full DFA requires computing all of its state transitions up front. This can be a very expensive process, and runs in worst case 2^n time and space (where n is proportional to the size of the regex). However, a full DFA unlocks some additional optimization opportunities.
Because full DFAs can be so expensive, the default limits for them are incredibly small. Generally speaking, if your regex is moderately big or if you're using Unicode features (\w is Unicode-aware by default, for example), then you can expect that the meta regex engine won't even attempt to build a DFA for it.
If this and Config::dfa_state_limit are set to None, then the meta regex engine will not use any sort of limits when deciding whether to build a DFA. This in turn makes construction of a Regex take worst case exponential time and space. Even short patterns can result in huge space blow ups. So it is strongly recommended to keep some kind of limit set!
The default is set to a small number that permits some simple regexes to get compiled into DFAs in reasonable time.
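The following self-contained sketch makes the 2^n worst case concrete. It is not code from the meta regex engine. It runs the textbook subset construction on a toy NFA for (a|b)*a(a|b){n-1}, that is, "the n-th symbol from the end is a". The NFA has only n + 1 states, yet the resulting DFA has 2^n.

```rust
use std::collections::{HashSet, VecDeque};

/// One step of a toy NFA for `(a|b)*a(a|b){n-1}` over the alphabet {a, b}.
/// NFA states are 0..=n and a set of states is a bitmask, so n must be < 64.
fn nfa_step(set: u64, byte: u8, n: u32) -> u64 {
    let mut next = 0u64;
    for s in 0..=n {
        if set & (1u64 << s) == 0 {
            continue;
        }
        if s == 0 {
            // State 0 loops on every symbol, and on `a` it may also guess
            // that this is the n-th symbol from the end.
            next |= 1;
            if byte == b'a' {
                next |= 1 << 1;
            }
        } else if s < n {
            // States 1..n-1 consume any symbol and move forward.
            next |= 1u64 << (s + 1);
        }
        // State n is the accepting state and has no outgoing transitions.
    }
    next
}

/// Counts the DFA states reachable from the start state by subset construction.
fn dfa_state_count(n: u32) -> usize {
    let start = 1u64; // the set {0}
    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(set) = queue.pop_front() {
        for &byte in b"ab" {
            let next = nfa_step(set, byte, n);
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen.len()
}

fn main() {
    for n in 1..=12 {
        println!("n = {:2}: NFA states = {:2}, DFA states = {}", n, n + 1, dfa_state_count(n));
    }
}
```

Each extra NFA state doubles the DFA. Limits such as dfa_size_limit and dfa_state_limit exist to cut this kind of growth off early.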
Example
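```rust
use regex_automata::meta::Regex;

fn main() {
    let result = Regex::builder()
        .configure(
            Regex::config()
                // 100MB is much bigger than the default.
                .dfa_size_limit(Some(100 * (1 << 20)))
                // We don't care about size too much here, so just
                // remove the NFA state limit altogether.
                .dfa_state_limit(None),
        )
        .build(r"\pL{5}");
    assert!(result.is_ok());
}
```
fn dfa_state_limit(self: Self, limit: Option<usize>) -> Config
Sets a limit on the total number of NFA states, beyond which the meta regex engine will not attempt to compile a full DFA.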
This limit works in concert with Config::dfa_size_limit. Namely, whereas Config::dfa_size_limit is applied by attempting to construct a DFA, this limit is used to avoid the attempt in the first place. This is useful to avoid hefty initialization costs associated with building a DFA for cases where it is obvious the DFA will ultimately be too big.
By default, this is set to a very small number.
Example
```rust
use regex_automata::meta::Regex;

fn main() {
    let result = Regex::builder()
        .configure(Regex::config().dfa_state_limit(Some(1_000)))
        .build(r"\pL{5}");
    assert!(result.is_ok());
}
```
fn byte_classes(self: Self, yes: bool) -> Config
Whether to attempt to shrink the size of the alphabet for the regex pattern or not. When enabled, the alphabet is shrunk into a set of equivalence classes, where every byte in the same equivalence class cannot discriminate between a match and a non-match.
WARNING: This is only useful for debugging DFAs. Disabling this does not yield any speed advantages. Indeed, disabling it can result in much higher memory usage. Disabling byte classes is useful for debugging the actual generated transitions because it lets one see the transitions defined on actual bytes instead of the equivalence classes.
This option is enabled by default and should never be disabled unless one is debugging the meta regex engine's internals.
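To show what "equivalence classes" means here, the following standard-library-only sketch groups the 256 byte values into contiguous runs that no byte range in the pattern splits. Every byte in a run is then treated identically by the pattern. The function names are hypothetical. The sketch only considers byte ranges and ignores other things a real engine must account for (for example, look-around assertions).

```rust
/// Groups bytes into contiguous classes that none of the given inclusive
/// byte ranges splits, and returns the class of every byte.
fn byte_classes(ranges: &[(u8, u8)]) -> [u8; 256] {
    // boundary[b] is true when a new class must start at byte b + 1.
    let mut boundary = [false; 256];
    for &(start, end) in ranges {
        if start > 0 {
            boundary[usize::from(start - 1)] = true;
        }
        boundary[usize::from(end)] = true;
    }
    let mut classes = [0u8; 256];
    let mut class = 0u8;
    for b in 0..256 {
        classes[b] = class;
        if boundary[b] && b < 255 {
            class += 1;
        }
    }
    classes
}

/// The number of distinct classes, i.e. the size of the shrunken alphabet.
fn class_count(classes: &[u8; 256]) -> usize {
    usize::from(classes[255]) + 1
}

fn main() {
    // [a-z]+ only looks at the range a-z, so the 256 possible bytes collapse
    // into 3 classes: below 'a', 'a'..='z', and above 'z'.
    let classes = byte_classes(&[(b'a', b'z')]);
    println!("{} classes instead of 256", class_count(&classes));
    assert_eq!(classes[usize::from(b'q')], classes[usize::from(b'a')]);
    assert_ne!(classes[usize::from(b'!')], classes[usize::from(b'a')]);
}
```

With classes enabled, a DFA for a pattern like [a-z]+ needs 3 transitions per state instead of 256. This is the memory saving that disabling byte_classes gives up.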
Example
```rust
use regex_automata::{meta::Regex, Match};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let re = Regex::builder()
        .configure(Regex::config().byte_classes(false))
        .build(r"[a-z]+")?;
    let hay = "!!quux!!";
    assert_eq!(Some(Match::must(0, 2..6)), re.find(hay));
    Ok(())
}
```
fn line_terminator(self: Self, byte: u8) -> Config
Set the line terminator to be used by the ^ and $ anchors in multi-line mode.
This option has no effect when CRLF mode is enabled. That is, regardless of this setting, (?Rm:^) and (?Rm:$) will always treat \r and \n as line terminators (and will never match between a \r and a \n).
By default, \n is the line terminator.
Warning: This does not change the behavior of the dot (.). To do that, you'll need to configure the syntax option syntax::Config::line_terminator in addition to this. Otherwise, the dot will continue to match any character other than \n.
Example
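```rust
use regex_automata::{meta::Regex, util::syntax, Match};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let re = Regex::builder()
        .syntax(syntax::Config::new().multi_line(true))
        .configure(Regex::config().line_terminator(b'\x00'))
        .build(r"^foo$")?;
    let hay = "\x00foo\x00";
    assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));
    Ok(())
}
```
fn hybrid(self: Self, yes: bool) -> Config
Toggle whether the hybrid NFA/DFA (also known as the "lazy DFA") should be available for use by the meta regex engine.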
Enabling this does not necessarily mean that the lazy DFA will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the hybrid crate feature is enabled, then this is enabled by default. Otherwise, if the crate feature is disabled, then this is always disabled, regardless of its setting by the caller.
fn dfa(self: Self, yes: bool) -> Config
Toggle whether a fully compiled DFA should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a DFA will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the dfa-build crate feature is enabled, then this is enabled by default. Otherwise, if the crate feature is disabled, then this is always disabled, regardless of its setting by the caller.
fn onepass(self: Self, yes: bool) -> Config
Toggle whether a one-pass DFA should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a one-pass DFA will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful. (Indeed, a one-pass DFA can only be used when the regex is one-pass. See the dfa::onepass module for more details.)
When the dfa-onepass crate feature is enabled, then this is enabled by default. Otherwise, if the crate feature is disabled, then this is always disabled, regardless of its setting by the caller.
fn backtrack(self: Self, yes: bool) -> Config
Toggle whether a bounded backtracking regex engine should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a bounded backtracker will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the nfa-backtrack crate feature is enabled, then this is enabled by default. Otherwise, if the crate feature is disabled, then this is always disabled, regardless of its setting by the caller.
fn get_match_kind(self: &Self) -> MatchKind
Returns the match kind on this configuration, as set by Config::match_kind. If it was not explicitly set, then a default value is returned.
fn get_utf8_empty(self: &Self) -> bool
Returns whether empty matches must fall on valid UTF-8 boundaries, as set by Config::utf8_empty. If it was not explicitly set, then a default value is returned.
fn get_auto_prefilter(self: &Self) -> bool
Returns whether automatic prefilters are enabled, as set by Config::auto_prefilter. If it was not explicitly set, then a default value is returned.
fn get_prefilter(self: &Self) -> Option<&Prefilter>
Returns a manually set prefilter, if one was set by Config::prefilter. If it was not explicitly set, then None is returned.
fn get_which_captures(self: &Self) -> WhichCaptures
Returns the capture configuration, as set by Config::which_captures. If it was not explicitly set, then a default value is returned.
fn get_nfa_size_limit(self: &Self) -> Option<usize>
Returns the NFA size limit, as set by Config::nfa_size_limit. If it was not explicitly set, then a default value is returned.
fn get_onepass_size_limit(self: &Self) -> Option<usize>
Returns the one-pass DFA size limit, as set by Config::onepass_size_limit. If it was not explicitly set, then a default value is returned.
fn get_hybrid_cache_capacity(self: &Self) -> usize
Returns the hybrid NFA/DFA cache capacity, as set by Config::hybrid_cache_capacity. If it was not explicitly set, then a default value is returned.
fn get_dfa_size_limit(self: &Self) -> Option<usize>
Returns the DFA size limit, as set by Config::dfa_size_limit. If it was not explicitly set, then a default value is returned.
fn get_dfa_state_limit(self: &Self) -> Option<usize>
Returns the DFA state limit, in terms of the number of states in the NFA, as set by Config::dfa_state_limit. If it was not explicitly set, then a default value is returned.
fn get_byte_classes(self: &Self) -> bool
Returns whether byte classes are enabled, as set by Config::byte_classes. If it was not explicitly set, then a default value is returned.
fn get_line_terminator(self: &Self) -> u8
Returns the line terminator for this configuration, as set by Config::line_terminator. If it was not explicitly set, then a default value is returned.
fn get_hybrid(self: &Self) -> bool
Returns whether the hybrid NFA/DFA regex engine may be used, as set by Config::hybrid. If it was not explicitly set, then a default value is returned.
fn get_dfa(self: &Self) -> bool
Returns whether the DFA regex engine may be used, as set by Config::dfa. If it was not explicitly set, then a default value is returned.
fn get_onepass(self: &Self) -> bool
Returns whether the one-pass DFA regex engine may be used, as set by Config::onepass. If it was not explicitly set, then a default value is returned.
fn get_backtrack(self: &Self) -> bool
Returns whether the bounded backtracking regex engine may be used, as set by Config::backtrack. If it was not explicitly set, then a default value is returned.
impl Clone for Config
fn clone(self: &Self) -> Config
impl Debug for Config
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Default for Config
fn default() -> Config
impl Freeze for Config
impl RefUnwindSafe for Config
impl Send for Config
impl Sync for Config
impl Unpin for Config
impl UnsafeUnpin for Config
impl UnwindSafe for Config
impl<T> Any for Config
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for Config
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for Config
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for Config
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for Config
fn from(t: T) -> T
Returns the argument unchanged.
impl<T> ToOwned for Config
fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for Config
fn into(self: Self) -> U
Calls U::from(self).
That is, this conversion is whatever the implementation of From<T> for U chooses to do.
impl<T, U> TryFrom for Config
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for Config
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>