Struct `DFA`

struct DFA { ... }

A one-pass DFA for executing a subset of anchored regex searches while resolving capturing groups.

A one-pass DFA can be built from an NFA that is one-pass. An NFA is one-pass when there is never any ambiguity about how to continue a search. For example, a*a is not one-pass becuase during a search, it's not possible to know whether to continue matching the a* or to move on to the single a. However, a*b is one-pass, because for every byte in the input, it's always clear when to move on from a* to b.

Only anchored searches are supported

In this crate, especially for DFAs, unanchored searches are implemented by treating the pattern as if it had a (?s-u:.)*? prefix. While the prefix is one-pass on its own, adding anything after it, e.g., (?s-u:.)*?a will make the overall pattern not one-pass. Why? Because the (?s-u:.) matches any byte, and there is therefore ambiguity as to when the prefix should stop matching and something else should start matching.

Therefore, one-pass DFAs do not support unanchored searches. In addition to many regexes simply not being one-pass, it implies that one-pass DFAs have limited utility. With that said, when a one-pass DFA can be used, it can potentially provide a dramatic speed up over alternatives like the BoundedBacktracker and the PikeVM. In particular, a one-pass DFA is the only DFA capable of reporting the spans of matching capturing groups.

To clarify, when we say that unanchored searches are not supported, what that actually means is:

The high level routines, DFA::is_match and DFA::captures, always do anchored searches.
Since iterators are most useful in the context of unanchored searches, there is no DFA::captures_iter method.
For lower level routines like DFA::try_search, an error will be returned if the given Input is configured to do an unanchored search or search for an invalid pattern ID. (Note that an Input is configured to do an unanchored search by default, so just giving a Input::new is guaranteed to return an error.)

Other limitations

In addition to the configurable heap limit and the requirement that a regex pattern be one-pass, there are some other limitations:

There is an internal limit on the total number of explicit capturing groups that appear across all patterns. It is somewhat small and there is no way to configure it. If your pattern(s) exceed this limit, then building a one-pass DFA will fail.
If the number of patterns exceeds an internal unconfigurable limit, then building a one-pass DFA will fail. This limit is quite large and you're unlikely to hit it.
If the total number of states exceeds an internal unconfigurable limit, then building a one-pass DFA will fail. This limit is quite large and you're unlikely to hit it.

Other examples of regexes that aren't one-pass

One particularly unfortunate example is that enabling Unicode can cause regexes that were one-pass to no longer be one-pass. Consider the regex (?-u)\w*\s for example. It is one-pass because there is exactly no overlap between the ASCII definitions of \w and \s. But \w*\s (i.e., with Unicode enabled) is not one-pass because \w and \s get translated to UTF-8 automatons. And while the codepoints in \w and \s do not overlap, the underlying UTF-8 encodings do. Indeed, because of the overlap between UTF-8 automata, the use of Unicode character classes will tend to vastly increase the likelihood of a regex not being one-pass.

How does one know if a regex is one-pass or not?

At the time of writing, the only way to know is to try and build a one-pass DFA. The one-pass property is checked while constructing the DFA.

This does mean that you might potentially waste some CPU cycles and memory by optimistically trying to build a one-pass DFA. But this is currently the only way. In the future, building a one-pass DFA might be able to use some heuristics to detect common violations of the one-pass property and bail more quickly.

Resource usage

Unlike a general DFA, a one-pass DFA has stricter bounds on its resource usage. Namely, construction of a one-pass DFA has a time and space complexity of O(n), where n ~ nfa.states().len(). (A general DFA's time and space complexity is O(2^n).) This smaller time bound is achieved because there is at most one DFA state created for each NFA state. If additional DFA states would be required, then the pattern is not one-pass and construction will fail.

Note though that currently, this DFA uses a fully dense representation. This means that while its space complexity is no worse than an NFA, it may in practice use more memory because of higher constant factors. The reason for this trade off is two-fold. Firstly, a dense representation makes the search faster. Secondly, the bigger an NFA, the more unlikely it is to be one-pass. Therefore, most one-pass DFAs are usually pretty small.

Example

This example shows that the one-pass DFA implements Unicode word boundaries correctly while simultaneously reporting spans for capturing groups that participate in a match. (This is the only DFA that implements full support for Unicode word boundaries.)

# if cfg!(miri) { return Ok(()); } // miri takes too long
use regex_automata::{dfa::onepass::DFA, Match, Span};

let re = DFA::new(r"\b(?P<first>\w+)[[:space:]]+(?P<last>\w+)\b")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Шерлок Холмс", &mut caps);
assert_eq!(Some(Match::must(0, 0..23)), caps.get_match());
assert_eq!(Some(Span::from(0..12)), caps.get_group_by_name("first"));
assert_eq!(Some(Span::from(13..23)), caps.get_group_by_name("last"));
# Ok::<(), Box<dyn std::error::Error>>(())

Example: iteration

Unlike other regex engines in this crate, this one does not provide iterator search functions. This is because a one-pass DFA only supports anchored searches, and so iterator functions are generally not applicable.

However, if you know that all of your matches are directly adjacent, then an iterator can be used. The util::iter::Searcher type can be used for this purpose:

# if cfg!(miri) { return Ok(()); } // miri takes too long
use regex_automata::{
    dfa::onepass::DFA,
    util::iter::Searcher,
    Anchored, Input, Span,
};

let re = DFA::new(r"\w(\d)\w")?;
let (mut cache, caps) = (re.create_cache(), re.create_captures());
let input = Input::new("a1zb2yc3x").anchored(Anchored::Yes);

let mut it = Searcher::new(input).into_captures_iter(caps, |input, caps| {
    Ok(re.try_search(&mut cache, input, caps)?)
}).infallible();
let caps0 = it.next().unwrap();
assert_eq!(Some(Span::from(1..2)), caps0.get_group(1));

# Ok::<(), Box<dyn std::error::Error>>(())

Implementations

`impl DFA`

fn new(pattern: &str) -> Result<DFA, BuildError>

Parse the given regular expression using the default configuration and return the corresponding one-pass DFA.

If you want a non-default configuration, then use the Builder to set your own configuration.

Example

use regex_automata::{dfa::onepass::DFA, Match};

let re = DFA::new("foo[0-9]+bar")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "foo12345barzzz", &mut caps);
assert_eq!(Some(Match::must(0, 0..11)), caps.get_match());
# Ok::<(), Box<dyn std::error::Error>>(())

fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<DFA, BuildError>

Like new, but parses multiple patterns into a single "multi regex." This similarly uses the default regex configuration.

Example

use regex_automata::{dfa::onepass::DFA, Match};

let re = DFA::new_many(&["[a-z]+", "[0-9]+"])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "abc123", &mut caps);
assert_eq!(Some(Match::must(0, 0..3)), caps.get_match());

re.captures(&mut cache, "123abc", &mut caps);
assert_eq!(Some(Match::must(1, 0..3)), caps.get_match());

# Ok::<(), Box<dyn std::error::Error>>(())

fn new_from_nfa(nfa: NFA) -> Result<DFA, BuildError>

Like new, but builds a one-pass DFA directly from an NFA. This is useful if you already have an NFA, or even if you hand-assembled the NFA.

Example

This shows how to hand assemble a regular expression via its HIR, compile an NFA from it and build a one-pass DFA from the NFA.

use regex_automata::{
    dfa::onepass::DFA,
    nfa::thompson::NFA,
    Match,
};
use regex_syntax::hir::{Hir, Class, ClassBytes, ClassBytesRange};

let hir = Hir::class(Class::Bytes(ClassBytes::new(vec![
    ClassBytesRange::new(b'0', b'9'),
    ClassBytesRange::new(b'A', b'Z'),
    ClassBytesRange::new(b'_', b'_'),
    ClassBytesRange::new(b'a', b'z'),
])));

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build_from_hir(&hir)?;

let re = DFA::new_from_nfa(nfa)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let expected = Some(Match::must(0, 0..1));
re.captures(&mut cache, "A", &mut caps);
assert_eq!(expected, caps.get_match());

# Ok::<(), Box<dyn std::error::Error>>(())

fn always_match() -> Result<DFA, BuildError>

Create a new one-pass DFA that matches every input.

Example

use regex_automata::{dfa::onepass::DFA, Match};

let dfa = DFA::always_match()?;
let mut cache = dfa.create_cache();
let mut caps = dfa.create_captures();

let expected = Match::must(0, 0..0);
dfa.captures(&mut cache, "", &mut caps);
assert_eq!(Some(expected), caps.get_match());
dfa.captures(&mut cache, "foo", &mut caps);
assert_eq!(Some(expected), caps.get_match());
# Ok::<(), Box<dyn std::error::Error>>(())

fn never_match() -> Result<DFA, BuildError>

Create a new one-pass DFA that never matches any input.

Example

use regex_automata::dfa::onepass::DFA;

let dfa = DFA::never_match()?;
let mut cache = dfa.create_cache();
let mut caps = dfa.create_captures();

dfa.captures(&mut cache, "", &mut caps);
assert_eq!(None, caps.get_match());
dfa.captures(&mut cache, "foo", &mut caps);
assert_eq!(None, caps.get_match());
# Ok::<(), Box<dyn std::error::Error>>(())

fn config() -> Config

Return a default configuration for a DFA.

This is a convenience routine to avoid needing to import the Config type when customizing the construction of a DFA.

Example

This example shows how to change the match semantics of this DFA from its default "leftmost first" to "all." When using "all," non-greediness doesn't apply and neither does preference order matching. Instead, the longest match possible is always returned. (Although, by construction, it's impossible for a one-pass DFA to have a different answer for "preference order" vs "longest match.")

use regex_automata::{dfa::onepass::DFA, Match, MatchKind};

let re = DFA::builder()
    .configure(DFA::config().match_kind(MatchKind::All))
    .build(r"(abc)+?")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

re.captures(&mut cache, "abcabc", &mut caps);
// Normally, the non-greedy repetition would give us a 0..3 match.
assert_eq!(Some(Match::must(0, 0..6)), caps.get_match());
# Ok::<(), Box<dyn std::error::Error>>(())

fn builder() -> Builder

Return a builder for configuring the construction of a DFA.

This is a convenience routine to avoid needing to import the Builder type in common cases.

Example

This example shows how to use the builder to disable UTF-8 mode.

# if cfg!(miri) { return Ok(()); } // miri takes too long
use regex_automata::{
    dfa::onepass::DFA,
    nfa::thompson,
    util::syntax,
    Match,
};

let re = DFA::builder()
    .syntax(syntax::Config::new().utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo(?-u:[^b])ar.*")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

let haystack = b"foo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(Match::must(0, 0..8));
re.captures(&mut cache, haystack, &mut caps);
assert_eq!(expected, caps.get_match());

# Ok::<(), Box<dyn std::error::Error>>(())

fn create_captures(self: &Self) -> Captures

Create a new empty set of capturing groups that is guaranteed to be valid for the search APIs on this DFA.

A Captures value created for a specific DFA cannot be used with any other DFA.

This is a convenience function for Captures::all. See the Captures documentation for an explanation of its alternative constructors that permit the DFA to do less work during a search, and thus might make it faster.

fn create_cache(self: &Self) -> Cache

Create a new cache for this DFA.

The cache returned should only be used for searches for this DFA. If you want to reuse the cache for another DFA, then you must call Cache::reset with that DFA (or, equivalently, DFA::reset_cache).

fn reset_cache(self: &Self, cache: &mut Cache)

Reset the given cache such that it can be used for searching with the this DFA (and only this DFA).

A cache reset permits reusing memory already allocated in this cache with a different DFA.

Example

This shows how to re-purpose a cache for use with a different DFA.

# if cfg!(miri) { return Ok(()); } // miri takes too long
use regex_automata::{dfa::onepass::DFA, Match};

let re1 = DFA::new(r"\w")?;
let re2 = DFA::new(r"\W")?;
let mut caps1 = re1.create_captures();
let mut caps2 = re2.create_captures();

let mut cache = re1.create_cache();
assert_eq!(
    Some(Match::must(0, 0..2)),
    { re1.captures(&mut cache, "Δ", &mut caps1); caps1.get_match() },
);

// Using 'cache' with re2 is not allowed. It may result in panics or
// incorrect results. In order to re-purpose the cache, we must reset
// it with the one-pass DFA we'd like to use it with.
//
// Similarly, after this reset, using the cache with 're1' is also not
// allowed.
re2.reset_cache(&mut cache);
assert_eq!(
    Some(Match::must(0, 0..3)),
    { re2.captures(&mut cache, "☃", &mut caps2); caps2.get_match() },
);

# Ok::<(), Box<dyn std::error::Error>>(())

fn get_config(self: &Self) -> &Config

Return the config for this one-pass DFA.

fn get_nfa(self: &Self) -> &NFA

Returns a reference to the underlying NFA.

fn pattern_len(self: &Self) -> usize

Returns the total number of patterns compiled into this DFA.

In the case of a DFA that contains no patterns, this returns 0.

fn state_len(self: &Self) -> usize

Returns the total number of states in this one-pass DFA.

Note that unlike dense or sparse DFAs, a one-pass DFA does not expose a low level DFA API. Therefore, this routine has little use other than being informational.

fn alphabet_len(self: &Self) -> usize

Returns the total number of elements in the alphabet for this DFA.

That is, this returns the total number of transitions that each state in this DFA must have. The maximum alphabet size is 256, which corresponds to each possible byte value.

The alphabet size may be less than 256 though, and unless Config::byte_classes is disabled, it is typically must less than 256. Namely, bytes are grouped into equivalence classes such that no two bytes in the same class can distinguish a match from a non-match. For example, in the regex ^[a-z]+$, the ASCII bytes a-z could all be in the same equivalence class. This leads to a massive space savings.

Note though that the alphabet length does not necessarily equal the total stride space taken up by a single DFA state in the transition table. Namely, for performance reasons, the stride is always the smallest power of two that is greater than or equal to the alphabet length. For this reason, DFA::stride or DFA::stride2 are often more useful. The alphabet length is typically useful only for informational purposes.

Note also that unlike dense or sparse DFAs, a one-pass DFA does not have a special end-of-input (EOI) transition. This is because a one-pass DFA handles look-around assertions explicitly (like the PikeVM) and does not build them into the transitions of the DFA.

fn stride2(self: &Self) -> usize

Returns the total stride for every state in this DFA, expressed as the exponent of a power of 2. The stride is the amount of space each state takes up in the transition table, expressed as a number of transitions. (Unused transitions map to dead states.)

The stride of a DFA is always equivalent to the smallest power of 2 that is greater than or equal to the DFA's alphabet length. This definition uses extra space, but possibly permits faster translation between state identifiers and their corresponding offsets in this DFA's transition table.

For example, if the DFA's stride is 16 transitions, then its stride2 is 4 since 2^4 = 16.

The minimum stride2 value is 1 (corresponding to a stride of 2) while the maximum stride2 value is 9 (corresponding to a stride of 512). The maximum in theory should be 8, but because of some implementation quirks that may be relaxed in the future, it is one more than 8. (Do note that a maximal stride is incredibly rare, as it would imply that there is almost no redundant in the regex pattern.)

Note that unlike dense or sparse DFAs, a one-pass DFA does not expose a low level DFA API. Therefore, this routine has little use other than being informational.

fn stride(self: &Self) -> usize

Returns the total stride for every state in this DFA. This corresponds to the total number of transitions used by each state in this DFA's transition table.

Please see DFA::stride2 for more information. In particular, this returns the stride as the number of transitions, where as stride2 returns it as the exponent of a power of 2.

Note that unlike dense or sparse DFAs, a one-pass DFA does not expose a low level DFA API. Therefore, this routine has little use other than being informational.

fn memory_usage(self: &Self) -> usize

Returns the memory usage, in bytes, of this DFA.

The memory usage is computed based on the number of bytes used to represent this DFA.

This does not include the stack size used up by this DFA. To compute that, use std::mem::size_of::<onepass::DFA>().

`impl DFA`

fn is_match<'h, I: Into<Input<'h>>>(self: &Self, cache: &mut Cache, input: I) -> bool

Executes an anchored leftmost forward search, and returns true if and only if this one-pass DFA matches the given haystack.

This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state, then this routine will return true immediately without inspecting any future input. (Consider how this might make a difference given the regex a+ on the haystack aaaaaaaaaaaaaaa. This routine can stop after it sees the first a, but routines like find need to continue searching because + is greedy by default.)

The given Input is forcefully set to use Anchored::Yes if the given configuration was Anchored::No (which is the default).

Panics

This routine panics if the search could not complete. This can occur in the following circumstances:

When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode. Concretely, this occurs when using Anchored::Pattern without enabling Config::starts_for_each_pattern.

When a search panics, callers cannot know whether a match exists or not.

Use DFA::try_search if you want to handle these panics as error values instead.

Example

This shows basic usage:

use regex_automata::dfa::onepass::DFA;

let re = DFA::new("foo[0-9]+bar")?;
let mut cache = re.create_cache();

assert!(re.is_match(&mut cache, "foo12345bar"));
assert!(!re.is_match(&mut cache, "foobar"));
# Ok::<(), Box<dyn std::error::Error>>(())

Example: consistency with search APIs

is_match is guaranteed to return true whenever captures returns a match. This includes searches that are executed entirely within a codepoint:

use regex_automata::{dfa::onepass::DFA, Input};

let re = DFA::new("a*")?;
let mut cache = re.create_cache();

assert!(!re.is_match(&mut cache, Input::new("☃").span(1..2)));
# Ok::<(), Box<dyn std::error::Error>>(())

Notice that when UTF-8 mode is disabled, then the above reports a match because the restriction against zero-width matches that split a codepoint has been lifted:

use regex_automata::{dfa::onepass::DFA, nfa::thompson::NFA, Input};

let re = DFA::builder()
    .thompson(NFA::config().utf8(false))
    .build("a*")?;
let mut cache = re.create_cache();

assert!(re.is_match(&mut cache, Input::new("☃").span(1..2)));
# Ok::<(), Box<dyn std::error::Error>>(())

fn find<'h, I: Into<Input<'h>>>(self: &Self, cache: &mut Cache, input: I) -> Option<Match>

Executes an anchored leftmost forward search, and returns a Match if and only if this one-pass DFA matches the given haystack.

This routine only includes the overall match span. To get access to the individual spans of each capturing group, use DFA::captures.

The given Input is forcefully set to use Anchored::Yes if the given configuration was Anchored::No (which is the default).

Panics

This routine panics if the search could not complete. This can occur in the following circumstances:

When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode. Concretely, this occurs when using Anchored::Pattern without enabling Config::starts_for_each_pattern.

When a search panics, callers cannot know whether a match exists or not.

Use DFA::try_search if you want to handle these panics as error values instead.

Example

Leftmost first match semantics corresponds to the match with the smallest starting offset, but where the end offset is determined by preferring earlier branches in the original regular expression. For example, Sam|Samwise will match Sam in Samwise, but Samwise|Sam will match Samwise in Samwise.

Generally speaking, the "leftmost first" match is how most backtracking regular expressions tend to work. This is in contrast to POSIX-style regular expressions that yield "leftmost longest" matches. Namely, both Sam|Samwise and Samwise|Sam match Samwise when using leftmost longest semantics. (This crate does not currently support leftmost longest semantics.)

use regex_automata::{dfa::onepass::DFA, Match};

let re = DFA::new("foo[0-9]+")?;
let mut cache = re.create_cache();
let expected = Match::must(0, 0..8);
assert_eq!(Some(expected), re.find(&mut cache, "foo12345"));

// Even though a match is found after reading the first byte (`a`),
// the leftmost first match semantics demand that we find the earliest
// match that prefers earlier parts of the pattern over later parts.
let re = DFA::new("abc|a")?;
let mut cache = re.create_cache();
let expected = Match::must(0, 0..3);
assert_eq!(Some(expected), re.find(&mut cache, "abc"));

# Ok::<(), Box<dyn std::error::Error>>(())

fn captures<'h, I: Into<Input<'h>>>(self: &Self, cache: &mut Cache, input: I, caps: &mut Captures)

Executes an anchored leftmost forward search and writes the spans of capturing groups that participated in a match into the provided Captures value. If no match was found, then Captures::is_match is guaranteed to return false.

The given Input is forcefully set to use Anchored::Yes if the given configuration was Anchored::No (which is the default).

Panics

This routine panics if the search could not complete. This can occur in the following circumstances:

When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode. Concretely, this occurs when using Anchored::Pattern without enabling Config::starts_for_each_pattern.

When a search panics, callers cannot know whether a match exists or not.

Use DFA::try_search if you want to handle these panics as error values instead.

Example

This shows a simple example of a one-pass regex that extracts capturing group spans.

use regex_automata::{dfa::onepass::DFA, Match, Span};

let re = DFA::new(
    // Notice that we use ASCII here. The corresponding Unicode regex
    // is sadly not one-pass.
    "(?P<first>[[:alpha:]]+)[[:space:]]+(?P<last>[[:alpha:]]+)",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert_eq!(Some(Match::must(0, 0..17)), caps.get_match());
assert_eq!(Some(Span::from(0..5)), caps.get_group(1));
assert_eq!(Some(Span::from(6..17)), caps.get_group_by_name("last"));

# Ok::<(), Box<dyn std::error::Error>>(())

fn try_search(self: &Self, cache: &mut Cache, input: &Input<'_>, caps: &mut Captures) -> Result<(), MatchError>

The differences with DFA::captures are:

This returns an error instead of panicking if the search fails.
Accepts an &Input instead of a Into<Input>. This permits reusing the same input for multiple searches, which may be important for latency.
This does not automatically change the Anchored mode from No to Yes. Instead, if Input::anchored is Anchored::No, then an error is returned.

Errors

This routine errors if the search could not complete. This can occur in the following circumstances:

When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode. Concretely, this occurs when using Anchored::Pattern without enabling Config::starts_for_each_pattern.

When a search returns an error, callers cannot know whether a match exists or not.

Example: specific pattern search

This example shows how to build a multi-regex that permits searching for specific patterns. Note that this is somewhat less useful than in other regex engines, since a one-pass DFA by definition has no ambiguity about which pattern can match at a position. That is, if it were possible for two different patterns to match at the same starting position, then the multi-regex would not be one-pass and construction would have failed.

Nevertheless, this can still be useful if you only care about matches for a specific pattern, and want the DFA to report "no match" even if some other pattern would have matched.

Note that in order to make use of this functionality, Config::starts_for_each_pattern must be enabled. It is disabled by default since it may result in higher memory usage.

use regex_automata::{
    dfa::onepass::DFA, Anchored, Input, Match, PatternID,
};

let re = DFA::builder()
    .configure(DFA::config().starts_for_each_pattern(true))
    .build_many(&["[a-z]+", "[0-9]+"])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "123abc";
let input = Input::new(haystack).anchored(Anchored::Yes);

// A normal multi-pattern search will show pattern 1 matches.
re.try_search(&mut cache, &input, &mut caps)?;
assert_eq!(Some(Match::must(1, 0..3)), caps.get_match());

// If we only want to report pattern 0 matches, then we'll get no
// match here.
let input = input.anchored(Anchored::Pattern(PatternID::must(0)));
re.try_search(&mut cache, &input, &mut caps)?;
assert_eq!(None, caps.get_match());

# Ok::<(), Box<dyn std::error::Error>>(())

Example: specifying the bounds of a search

This example shows how providing the bounds of a search can produce different results than simply sub-slicing the haystack.

# if cfg!(miri) { return Ok(()); } // miri takes too long
use regex_automata::{dfa::onepass::DFA, Anchored, Input, Match};

// one-pass DFAs fully support Unicode word boundaries!
// A sad joke is that a Unicode aware regex like \w+\s is not one-pass.
// :-(
let re = DFA::new(r"\b[0-9]{3}\b")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "foo123bar";

// Since we sub-slice the haystack, the search doesn't know about
// the larger context and assumes that `123` is surrounded by word
// boundaries. And of course, the match position is reported relative
// to the sub-slice as well, which means we get `0..3` instead of
// `3..6`.
let expected = Some(Match::must(0, 0..3));
let input = Input::new(&haystack[3..6]).anchored(Anchored::Yes);
re.try_search(&mut cache, &input, &mut caps)?;
assert_eq!(expected, caps.get_match());

// But if we provide the bounds of the search within the context of the
// entire haystack, then the search can take the surrounding context
// into account. (And if we did find a match, it would be reported
// as a valid offset into `haystack` instead of its sub-slice.)
let expected = None;
let input = Input::new(haystack).range(3..6).anchored(Anchored::Yes);
re.try_search(&mut cache, &input, &mut caps)?;
assert_eq!(expected, caps.get_match());

# Ok::<(), Box<dyn std::error::Error>>(())

fn try_search_slots(self: &Self, cache: &mut Cache, input: &Input<'_>, slots: &mut [Option<NonMaxUsize>]) -> Result<Option<PatternID>, MatchError>

Executes an anchored leftmost forward search and writes the spans of capturing groups that participated in a match into the provided slots, and returns the matching pattern ID. The contents of the slots for patterns other than the matching pattern are unspecified. If no match was found, then None is returned and the contents of all slots is unspecified.

This is like DFA::try_search, but it accepts a raw slots slice instead of a Captures value. This is useful in contexts where you don't want or need to allocate a Captures.

It is legal to pass any number of slots to this routine. If the regex engine would otherwise write a slot offset that doesn't fit in the provided slice, then it is simply skipped. In general though, there are usually three slice lengths you might want to use:

An empty slice, if you only care about which pattern matched.
A slice with pattern_len() * 2 slots, if you only care about the overall match spans for each matching pattern.
A slice with slot_len() slots, which permits recording match offsets for every capturing group in every pattern.

Errors

This routine errors if the search could not complete. This can occur in the following circumstances:

When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode. Concretely, this occurs when using Anchored::Pattern without enabling Config::starts_for_each_pattern.

When a search returns an error, callers cannot know whether a match exists or not.

Example

This example shows how to find the overall match offsets in a multi-pattern search without allocating a Captures value. Indeed, we can put our slots right on the stack.

use regex_automata::{dfa::onepass::DFA, Anchored, Input, PatternID};

let re = DFA::new_many(&[
    r"[a-zA-Z]+",
    r"[0-9]+",
])?;
let mut cache = re.create_cache();
let input = Input::new("123").anchored(Anchored::Yes);

// We only care about the overall match offsets here, so we just
// allocate two slots for each pattern. Each slot records the start
// and end of the match.
let mut slots = [None; 4];
let pid = re.try_search_slots(&mut cache, &input, &mut slots)?;
assert_eq!(Some(PatternID::must(1)), pid);

// The overall match offsets are always at 'pid * 2' and 'pid * 2 + 1'.
// See 'GroupInfo' for more details on the mapping between groups and
// slot indices.
let slot_start = pid.unwrap().as_usize() * 2;
let slot_end = slot_start + 1;
assert_eq!(Some(0), slots[slot_start].map(|s| s.get()));
assert_eq!(Some(3), slots[slot_end].map(|s| s.get()));

# Ok::<(), Box<dyn std::error::Error>>(())

`impl Clone for DFA`

fn clone(self: &Self) -> DFA

`impl Debug for DFA`

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

`impl Freeze for DFA`

`impl RefUnwindSafe for DFA`

`impl Send for DFA`

`impl Sync for DFA`

`impl Unpin for DFA`

`impl UnsafeUnpin for DFA`

`impl UnwindSafe for DFA`

`impl<T> Any for DFA`

fn type_id(self: &Self) -> TypeId

`impl<T> Borrow for DFA`

fn borrow(self: &Self) -> &T

`impl<T> BorrowMut for DFA`

fn borrow_mut(self: &mut Self) -> &mut T

`impl<T> CloneToUninit for DFA`

unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)

`impl<T> From for DFA`

fn from(t: T) -> T: Returns the argument unchanged.

`impl<T> ToOwned for DFA`

fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)

`impl<T, U> Into for DFA`

fn into(self: Self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of [From]<T> for U chooses to do.

`impl<T, U> TryFrom for DFA`

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

`impl<T, U> TryInto for DFA`

fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>