Trait Automaton
unsafe trait Automaton: private::Sealed
A trait that abstracts over Aho-Corasick automata.
This trait primarily exists for niche use cases such as:
- Using an NFA or DFA directly, bypassing the top-level
AhoCorasicksearcher. Currently, these includenoncontiguous::NFA,contiguous::NFAanddfa::DFA. - Implementing your own custom search routine by walking the automaton yourself. This might be useful for implementing search on non-contiguous strings or streams.
For most use cases, it is not expected that users will need
to use or even know about this trait. Indeed, the top level
AhoCorasick searcher does not expose any details
about this trait, nor does it implement it itself.
Note that this trait defines a number of default methods, such as
Automaton::try_find and Automaton::try_find_iter, which implement
higher level search routines in terms of the lower level automata API.
Sealed
Currently, this trait is sealed. That means users of this crate can write generic routines over this trait but cannot implement it themselves. This restriction may be lifted in the future, but sealing the trait permits adding new required methods in a backwards compatible fashion.
Special states
This trait encodes a notion of "special" states in an automaton. Namely, a state is treated as special if it is a dead, match or start state:
- A dead state is a state that cannot be left once entered. All transitions on a dead state lead back to itself. The dead state is meant to be treated as a sentinel indicating that the search should stop and return a match if one has been found, and nothing otherwise.
- A match state is a state that indicates one or more patterns have
matched. Depending on the
MatchKindof the automaton, a search may stop once a match is seen, or it may continue looking for matches until it enters a dead state or sees the end of the haystack. - A start state is a state that a search begins in. It is useful to know
when a search enters a start state because it may mean that a prefilter can
be used to skip ahead and quickly look for candidate matches. Unlike dead
and match states, it is never necessary to explicitly handle start states
for correctness. Indeed, in this crate, implementations of
Automatonwill only treat start states as "special" when a prefilter is enabled and active. Otherwise, treating it as special has no purpose and winds up slowing down the overall search because it results in ping-ponging between the main state transition and the "special" state logic.
Since checking whether a state is special by doing three different
checks would be too expensive inside a fast search loop, the
Automaton::is_special method is provided for quickly checking whether
the state is special. The Automaton::is_dead, Automaton::is_match and
Automaton::is_start predicates can then be used to determine which kind
of special state it is.
Panics
Most of the APIs on this trait should panic or give incorrect results
if invalid inputs are given to it. For example, Automaton::next_state
has unspecified behavior if the state ID given to it is not a valid
state ID for the underlying automaton. Valid state IDs can only be
retrieved in one of two ways: calling Automaton::start_state or calling
Automaton::next_state with a valid state ID.
Safety
This trait is not safe to implement so that code may rely on the correctness of implementations of this trait to avoid undefined behavior. The primary correctness guarantees are:
Automaton::start_statealways returns a valid state ID or an error or panics.Automaton::next_state, when given a valid state ID, always returns a valid state ID for all values ofanchoredandbyte, or otherwise panics.
In general, the rest of the methods on Automaton need to uphold their
contracts as well. For example, Automaton::is_dead should only returns
true if the given state ID is actually a dead state.
Note that currently this crate does not rely on the safety property defined here to avoid undefined behavior. Instead, this was done to make it possible to do in the future.
Example
This example shows how one might implement a basic but correct search
routine. We keep things simple by not using prefilters or worrying about
anchored searches, but do make sure our search is correct for all possible
MatchKind semantics. (The comments in the code below note the parts
that are needed to support certain MatchKind semantics.)
use ;
// Run an unanchored search for 'aut' in 'haystack'. Return the first match
// seen according to the automaton's match semantics. This returns an error
// if the given automaton does not support unanchored searches.
// Show that it works for standard searches.
let nfa = NFAnew.unwrap;
assert_eq!;
// But also works when using leftmost-first. Notice how the match result
// has changed!
let nfa = NFAbuilder
.match_kind
.build
.unwrap;
assert_eq!;
# Ok::
Required Methods
fn start_state(self: &Self, anchored: Anchored) -> Result<StateID, MatchError>Returns the starting state for the given anchor mode.
Upon success, the state ID returned is guaranteed to be valid for this automaton.
Errors
This returns an error when the given search configuration is not supported by the underlying automaton. For example, if the underlying automaton only supports unanchored searches but the given configuration was set to an anchored search, then this must return an error.
fn next_state(self: &Self, anchored: Anchored, sid: StateID, byte: u8) -> StateIDPerforms a state transition from
sidforbyteand returns the next state.anchoredshould beAnchored::Yeswhen executing an anchored search andAnchored::Nootherwise. For some implementations ofAutomaton, it is required to know whether the search is anchored or not in order to avoid following failure transitions. Other implementations may ignoreanchoredaltogether and depend onAutomaton::start_statereturning a state that walks a different path through the automaton depending on whether the search is anchored or not.Panics
This routine may panic or return incorrect results when the given state ID is invalid. A state ID is valid if and only if:
- It came from a call to
Automaton::start_state, or - It came from a previous call to
Automaton::next_statewith a valid state ID.
Implementations must treat all possible values of
byteas valid.Implementations may panic on unsupported values of
anchored, but are not required to do so.- It came from a call to
fn is_special(self: &Self, sid: StateID) -> boolReturns true if the given ID represents a "special" state. A special state is a dead, match or start state.
Note that implementations may choose to return false when the given ID corresponds to a start state. Namely, it always correct to treat start states as non-special. Implementations must return true for states that are dead or contain matches.
This has unspecified behavior when given an invalid state ID.
fn is_dead(self: &Self, sid: StateID) -> boolReturns true if the given ID represents a dead state.
A dead state is a type of "sink" in a finite state machine. It corresponds to a state whose transitions all loop back to itself. That is, once entered, it can never be left. In practice, it serves as a sentinel indicating that the search should terminate.
This has unspecified behavior when given an invalid state ID.
fn is_match(self: &Self, sid: StateID) -> boolReturns true if the given ID represents a match state.
A match state is always associated with one or more pattern IDs that matched at the position in the haystack when the match state was entered. When a match state is entered, the match semantics dictate whether it should be returned immediately (for
MatchKind::Standard) or if the search should continue (forMatchKind::LeftmostFirstandMatchKind::LeftmostLongest) until a dead state is seen or the end of the haystack has been reached.This has unspecified behavior when given an invalid state ID.
fn is_start(self: &Self, sid: StateID) -> boolReturns true if the given ID represents a start state.
While it is never incorrect to ignore start states during a search (except for the start of the search of course), knowing whether one has entered a start state can be useful for certain classes of performance optimizations. For example, if one is in a start state, it may be legal to try to skip ahead and look for match candidates more quickly than would otherwise be accomplished by walking the automaton.
Implementations of
Automatonin this crate "unspecialize" start states when a prefilter is not active or enabled. In this case, it is possible forAutomaton::is_special(sid)to return false whileAutomaton::is_start(sid)returns true.This has unspecified behavior when given an invalid state ID.
fn match_kind(self: &Self) -> MatchKindReturns the match semantics that this automaton was built with.
fn match_len(self: &Self, sid: StateID) -> usizeReturns the total number of matches for the given state ID.
This has unspecified behavior if the given ID does not refer to a match state.
fn match_pattern(self: &Self, sid: StateID, index: usize) -> PatternIDReturns the pattern ID for the match state given by
sidat theindexgiven.Typically,
indexis only ever greater than0when implementing an overlapping search. Otherwise, it's likely that your search only cares about reporting the first pattern ID in a match state.This has unspecified behavior if the given ID does not refer to a match state, or if the index is greater than or equal to the total number of matches in this match state.
fn patterns_len(self: &Self) -> usizeReturns the total number of patterns compiled into this automaton.
fn pattern_len(self: &Self, pid: PatternID) -> usizeReturns the length of the pattern for the given ID.
This has unspecified behavior when given an invalid pattern ID. A pattern ID is valid if and only if it is less than
Automaton::patterns_len.fn min_pattern_len(self: &Self) -> usizeReturns the length, in bytes, of the shortest pattern in this automaton.
fn max_pattern_len(self: &Self) -> usizeReturns the length, in bytes, of the longest pattern in this automaton.
fn memory_usage(self: &Self) -> usizeReturns the heap memory usage, in bytes, used by this automaton.
fn prefilter(self: &Self) -> Option<&Prefilter>Returns a prefilter, if available, that can be used to accelerate searches for this automaton.
The typical way this is used is when the start state is entered during a search. When that happens, one can use a prefilter to skip ahead and look for candidate matches without having to walk the automaton on the bytes between candidates.
Typically a prefilter is only available when there are a small (<100) number of patterns built into the automaton.
Provided Methods
fn try_find(self: &Self, input: &Input<'_>) -> Result<Option<Match>, MatchError>Executes a non-overlapping search with this automaton using the given configuration.
See
AhoCorasick::try_findfor more documentation and examples.fn try_find_overlapping(self: &Self, input: &Input<'_>, state: &mut OverlappingState) -> Result<(), MatchError>Executes a overlapping search with this automaton using the given configuration.
See
AhoCorasick::try_find_overlappingfor more documentation and examples.fn try_find_iter<'a, 'h>(self: &'a Self, input: Input<'h>) -> Result<FindIter<'a, 'h, Self>, MatchError> where Self: SizedReturns an iterator of non-overlapping matches with this automaton using the given configuration.
See
AhoCorasick::try_find_iterfor more documentation and examples.fn try_find_overlapping_iter<'a, 'h>(self: &'a Self, input: Input<'h>) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError> where Self: SizedReturns an iterator of overlapping matches with this automaton using the given configuration.
See
AhoCorasick::try_find_overlapping_iterfor more documentation and examples.fn try_replace_all<B>(self: &Self, haystack: &str, replace_with: &[B]) -> Result<String, MatchError> where Self: Sized, B: AsRef<str>Replaces all non-overlapping matches in
haystackwith strings fromreplace_withdepending on the pattern that matched. Thereplace_withslice must have length equal toAutomaton::patterns_len.See
AhoCorasick::try_replace_allfor more documentation and examples.fn try_replace_all_bytes<B>(self: &Self, haystack: &[u8], replace_with: &[B]) -> Result<Vec<u8>, MatchError> where Self: Sized, B: AsRef<[u8]>Replaces all non-overlapping matches in
haystackwith strings fromreplace_withdepending on the pattern that matched. Thereplace_withslice must have length equal toAutomaton::patterns_len.See
AhoCorasick::try_replace_all_bytesfor more documentation and examples.fn try_replace_all_with<F>(self: &Self, haystack: &str, dst: &mut String, replace_with: F) -> Result<(), MatchError> where Self: Sized, F: FnMut(&Match, &str, &mut String) -> boolReplaces all non-overlapping matches in
haystackby calling thereplace_withclosure given.See
AhoCorasick::try_replace_all_withfor more documentation and examples.fn try_replace_all_with_bytes<F>(self: &Self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F) -> Result<(), MatchError> where Self: Sized, F: FnMut(&Match, &[u8], &mut Vec<u8>) -> boolReplaces all non-overlapping matches in
haystackby calling thereplace_withclosure given.See
AhoCorasick::try_replace_all_with_bytesfor more documentation and examples.fn try_stream_find_iter<'a, R: std::io::Read>(self: &'a Self, rdr: R) -> Result<StreamFindIter<'a, Self, R>, MatchError> where Self: SizedReturns an iterator of non-overlapping matches with this automaton from the stream given.
See
AhoCorasick::try_stream_find_iterfor more documentation and examples.fn try_stream_replace_all<R, W, B>(self: &Self, rdr: R, wtr: W, replace_with: &[B]) -> Result<()> where Self: Sized, R: Read, W: Write, B: AsRef<[u8]>Replaces all non-overlapping matches in
rdrwith strings fromreplace_withdepending on the pattern that matched, and writes the result towtr. Thereplace_withslice must have length equal toAutomaton::patterns_len.See
AhoCorasick::try_stream_replace_allfor more documentation and examples.fn try_stream_replace_all_with<R, W, F>(self: &Self, rdr: R, wtr: W, replace_with: F) -> Result<()> where Self: Sized, R: Read, W: Write, F: FnMut(&Match, &[u8], &mut W) -> Result<()>Replaces all non-overlapping matches in
rdrby calling thereplace_withclosure given and writing the result towtr.See
AhoCorasick::try_stream_replace_all_withfor more documentation and examples.