Struct AhoCorasick
struct AhoCorasick { ... }
An automaton for searching multiple strings in linear time.
The AhoCorasick type supports a few basic ways of constructing an
automaton, with the default being AhoCorasick::new. However, there
are a fair number of configurable options that can be set by using
AhoCorasickBuilder instead. Such options include, but are not limited
to, how matches are determined, simple case insensitivity, whether to use a
DFA or not, and various knobs for controlling the space-vs-time trade-offs
taken when building the automaton.
Resource usage
Aho-Corasick automatons are always constructed in O(p) time, where
p is the combined length of all patterns being searched. With that
said, building an automaton can be fairly costly because of high constant
factors, particularly when enabling the DFA option
with AhoCorasickBuilder::kind. For this reason, it's generally a good
idea to build an automaton once and reuse it as much as possible.
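One way to follow the build-once advice is to cache the automaton in a process-wide static. The sketch below is a minimal, std-only illustration under stated assumptions: `ExpensiveAutomaton`, `build_automaton`, `shared_automaton` and `BUILD_COUNT` are hypothetical names standing in for an `AhoCorasick` value and its construction. They are not part of the aho-corasick API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an automaton that is costly to build.
pub struct ExpensiveAutomaton {
    pub patterns: Vec<String>,
}

/// Counts how many times construction actually ran (illustration only).
pub static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical constructor; a real program would build its automaton here.
fn build_automaton() -> ExpensiveAutomaton {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    ExpensiveAutomaton {
        patterns: vec!["apple".to_string(), "maple".to_string()],
    }
}

/// Returns a shared automaton, building it on first use only.
pub fn shared_automaton() -> &'static ExpensiveAutomaton {
    static AUTOMATON: OnceLock<ExpensiveAutomaton> = OnceLock::new();
    AUTOMATON.get_or_init(build_automaton)
}
```

Every caller of `shared_automaton` receives a reference to the same value, so the construction cost is paid once per process regardless of how many searches follow.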
Aho-Corasick automatons can also use a fair bit of memory. To get
a concrete idea of how much memory is being used, try using the
AhoCorasick::memory_usage method.
To give a quick idea of the differences between Aho-Corasick implementations and their resource usage, here's a sample of construction times and heap memory used after building an automaton from 100,000 randomly selected titles from Wikipedia:
- 99MB for a noncontiguous::NFA in 240ms.
- 21MB for a contiguous::NFA in 275ms.
- 1.6GB for a dfa::DFA in 1.88s.
(Note that the memory usage above reflects the size of each automaton and not peak memory usage. For example, building a contiguous NFA requires first building a noncontiguous NFA. Once the contiguous NFA is built, the noncontiguous NFA is freed.)
This experiment very strongly argues that a contiguous NFA is often the
best balance in terms of resource usage. It takes a little longer to build,
but its memory usage is quite small. Its search speed (not listed) is
also often faster than a noncontiguous NFA, but a little slower than a
DFA. Indeed, when no specific AhoCorasickKind is used (which is the
default), a contiguous NFA is used in most cases.
The only "catch" to using a contiguous NFA is that, because of its variety
of compression tricks, it may not be able to support automatons as large as
what the noncontiguous NFA supports. In which case, building a contiguous
NFA will fail and (by default) AhoCorasick will automatically fall
back to a noncontiguous NFA. (This typically only happens when building
automatons from millions of patterns.) Otherwise, the small additional time
for building a contiguous NFA is almost certainly worth it.
Cloning
The AhoCorasick type uses thread safe reference counting internally. It
is guaranteed that it is cheap to clone.
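To picture why a clone backed by thread safe reference counting is cheap, here is a minimal std-only sketch. `SharedSearcher` is a hypothetical type invented for illustration; it only mimics the idea of keeping the heavy state behind an `Arc` and says nothing about how `AhoCorasick` is laid out internally.

```rust
use std::sync::Arc;

/// Hypothetical searcher whose heavy state lives behind an `Arc`.
#[derive(Clone)]
pub struct SharedSearcher {
    inner: Arc<Vec<String>>,
}

impl SharedSearcher {
    /// Copies the patterns once, at construction time.
    pub fn new(patterns: &[&str]) -> SharedSearcher {
        let owned: Vec<String> = patterns.iter().map(|p| p.to_string()).collect();
        SharedSearcher { inner: Arc::new(owned) }
    }

    /// Number of handles currently sharing the same underlying state.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }

    /// True if both handles point at the same allocation.
    pub fn shares_state_with(&self, other: &SharedSearcher) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}
```

Cloning a `SharedSearcher` only bumps a reference count, so handing a clone to each worker thread does not duplicate the pattern data.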
Search configuration
Most of the search routines accept anything that can be cheaply converted
to an Input. This includes &[u8], &str and Input itself.
Construction failure
Building an Aho-Corasick automaton can fail. In practice, construction fails in essentially one way: when the inputs provided are too big, whether that means a pattern that is too long, too many patterns or some combination of both. A first approximation for the scale at which construction can fail is somewhere around "millions of patterns."
For that reason, if you're building an Aho-Corasick automaton from untrusted input (or input that doesn't have any reasonable bounds on its size), then it is strongly recommended to handle the possibility of an error.
If you're constructing an Aho-Corasick automaton from static or trusted
data, then it is likely acceptable to panic (by calling unwrap() or
expect()) if construction fails.
Fallibility
The AhoCorasick type provides a number of methods for searching, as one
might expect. Depending on how the Aho-Corasick automaton was built and
depending on the search configuration, it is possible for a search to
return an error. Since an error is never dependent on the actual contents
of the haystack, this type provides both infallible and fallible methods
for searching. The infallible methods panic if an error occurs, and can be
used for convenience and when you know the search will never return an
error.
For example, the AhoCorasick::find_iter method is the infallible
version of the AhoCorasick::try_find_iter method.
Examples of errors that can occur:
- Running a search that requires MatchKind::Standard semantics (such as a stream or overlapping search) with an automaton that was built with MatchKind::LeftmostFirst or MatchKind::LeftmostLongest semantics.
- Running an anchored search with an automaton that only supports unanchored searches. (By default, AhoCorasick only supports unanchored searches. But this can be toggled with AhoCorasickBuilder::start_kind.)
- Running an unanchored search with an automaton that only supports anchored searches.
The common thread between the different types of errors is that they are all rooted in the automaton construction and search configurations. If those configurations are a static property of your program, then it is reasonable to call infallible routines since you know an error will never occur. And if one does occur, then it's a bug in your program.
To reiterate, if the patterns, build or search configuration come from
user or untrusted data, then you should handle errors at build or search
time. If only the haystack comes from user or untrusted data, then there
should be no need to handle errors anywhere and it is generally encouraged
to unwrap() (or expect()) both build and search time calls.
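The relationship between the fallible and infallible routines can be sketched with a small std-only model. Everything here is hypothetical (`Searcher`, `SearchError` and the `anchored` flag are invented for illustration, not taken from this crate); the point is only that the error depends on the requested configuration and never on the haystack contents.

```rust
/// Hypothetical configuration error, independent of haystack contents.
#[derive(Debug, PartialEq)]
pub enum SearchError {
    AnchoredUnsupported,
}

/// Hypothetical single-needle searcher that supports only unanchored searches.
pub struct Searcher {
    needle: String,
}

impl Searcher {
    pub fn new(needle: &str) -> Searcher {
        Searcher { needle: needle.to_string() }
    }

    /// Fallible search: fails based on the requested configuration only.
    pub fn try_find(
        &self,
        haystack: &str,
        anchored: bool,
    ) -> Result<Option<usize>, SearchError> {
        if anchored {
            return Err(SearchError::AnchoredUnsupported);
        }
        Ok(haystack.find(self.needle.as_str()))
    }

    /// Infallible search: always unanchored, so `try_find` cannot fail here.
    pub fn find(&self, haystack: &str) -> Option<usize> {
        self.try_find(haystack, false)
            .expect("unanchored search is always supported")
    }
}
```

Because the configuration passed by `find` is fixed in the program, a panic from it would indicate a bug rather than bad input, which is the reasoning behind unwrapping when only the haystack is untrusted.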
Examples
This example shows how to search for occurrences of multiple patterns simultaneously in a case insensitive fashion. Each match includes the pattern that matched along with the byte offsets of the match.
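use aho_corasick::{AhoCorasick, PatternID};

let patterns = &["apple", "maple", "Snapple"];
let haystack = "Nobody likes maple in their apple flavored Snapple.";

let ac = AhoCorasick::builder()
    .ascii_case_insensitive(true)
    .build(patterns)
    .unwrap();
let mut matches = vec![];
for mat in ac.find_iter(haystack) {
    // Record the pattern ID and the byte offsets of each match.
    matches.push((mat.pattern(), mat.start(), mat.end()));
}
assert_eq!(matches, vec![
    (PatternID::must(1), 13, 18),
    (PatternID::must(0), 28, 33),
    (PatternID::must(2), 43, 50),
]);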
This example shows how to replace matches with some other string:
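use aho_corasick::AhoCorasick;

let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
// replace_with[i] is the replacement for patterns[i].
let replace_with = &["sloth", "grey", "slow"];

let ac = AhoCorasick::new(patterns).unwrap();
let result = ac.replace_all(haystack, replace_with);
assert_eq!(result, "The slow grey sloth.");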
Implementations
impl AhoCorasick
fn new<I, P>(patterns: I) -> Result<AhoCorasick, BuildError> where I: IntoIterator<Item = P>, P: AsRef<[u8]>
Create a new Aho-Corasick automaton using the default configuration.
The default configuration optimizes for less space usage, but at the expense of longer search times. To change the configuration, use AhoCorasickBuilder.
This uses the default MatchKind::Standard match semantics, which reports a match as soon as it is found. This corresponds to the standard match semantics supported by textbook descriptions of the Aho-Corasick algorithm.
Examples
Basic usage:
use aho_corasick::{AhoCorasick, PatternID};

let ac = AhoCorasick::new(&["foo", "bar", "baz"]).unwrap();
assert_eq!(
    Some(PatternID::must(1)),
    ac.find("xxx bar xxx").map(|m| m.pattern()),
);
fn builder() -> AhoCorasickBuilder
A convenience method for returning a new Aho-Corasick builder.
This usually permits one to just import the AhoCorasick type.
Examples
Basic usage:
use aho_corasick::{AhoCorasick, Match, MatchKind};

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(&["samwise", "sam"])
    .unwrap();
assert_eq!(Some(Match::must(0, 0..7)), ac.find("samwise"));
impl AhoCorasick
fn kind(self: &Self) -> AhoCorasickKind
Returns the kind of the Aho-Corasick automaton used by this searcher.
Knowing the Aho-Corasick kind is principally useful for diagnostic purposes. In particular, if no specific kind was given to AhoCorasickBuilder::kind, then one is automatically chosen and this routine will report which one.
Note that the heuristics used for choosing which AhoCorasickKind may be changed in a semver compatible release.
Examples
use aho_corasick::{AhoCorasick, AhoCorasickKind};

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
// The specific Aho-Corasick kind chosen is not guaranteed!
assert_eq!(AhoCorasickKind::DFA, ac.kind());
fn start_kind(self: &Self) -> StartKind
Returns the type of starting search configuration supported by this Aho-Corasick automaton.
Examples
use aho_corasick::{AhoCorasick, StartKind};

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(StartKind::Unanchored, ac.start_kind());
fn match_kind(self: &Self) -> MatchKind
Returns the match kind used by this automaton.
The match kind is important because it determines what kinds of matches are returned. Also, some operations (such as overlapping search and stream searching) are only supported when using the MatchKind::Standard match kind.
Examples
use aho_corasick::{AhoCorasick, MatchKind};

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(MatchKind::Standard, ac.match_kind());
fn min_pattern_len(self: &Self) -> usize
Returns the length of the shortest pattern matched by this automaton.
Examples
Basic usage:
use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(3, ac.min_pattern_len());
Note that an AhoCorasick automaton has a minimum length of 0 if and only if it can match the empty string:
use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&["foo", "", "quux", "baz"]).unwrap();
assert_eq!(0, ac.min_pattern_len());
fn max_pattern_len(self: &Self) -> usize
Returns the length of the longest pattern matched by this automaton.
Examples
Basic usage:
use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(4, ac.max_pattern_len());
fn patterns_len(self: &Self) -> usize
Return the total number of patterns matched by this automaton.
This includes patterns that may never participate in a match. For example, if MatchKind::LeftmostFirst match semantics are used, and the patterns Sam and Samwise were used to build the automaton (in that order), then Samwise can never participate in a match because Sam will always take priority.
Examples
Basic usage:
use aho_corasick::AhoCorasick;

let ac = AhoCorasick::new(&["foo", "bar", "baz"]).unwrap();
assert_eq!(3, ac.patterns_len());
fn memory_usage(self: &Self) -> usize
Returns the approximate total amount of heap used by this automaton, in units of bytes.
Examples
This example shows the difference in heap usage between a few configurations:
# if !cfg! use ; let ac = builder .kind // default .build .unwrap; assert_eq!; let ac = builder .kind // default .ascii_case_insensitive .build .unwrap; assert_eq!; let ac = builder .kind .ascii_case_insensitive .build .unwrap; assert_eq!; let ac = builder .kind .ascii_case_insensitive .build .unwrap; assert_eq!; let ac = builder .kind .ascii_case_insensitive .build .unwrap; // While this shows the DFA being the biggest here by a small margin, // don't let the difference fool you. With such a small number of // patterns, the difference is small, but a bigger number of patterns // will reveal that the rate of growth of the DFA is far bigger than // the NFAs above. For a large number of patterns, it is easy for the // DFA to take an order of magnitude more heap space (or more!). assert_eq!;
impl AhoCorasick
fn try_find<'h, I: Into<Input<'h>>>(self: &Self, input: I) -> Result<Option<Match>, MatchError>
Returns the location of the first match according to the match semantics that this automaton was constructed with, and according to the given Input configuration.
This is the fallible version of AhoCorasick::find.
Errors
This returns an error when this Aho-Corasick searcher does not support the given Input configuration.
For example, if the Aho-Corasick searcher only supports anchored searches or only supports unanchored searches, then providing an Input that requests an anchored (or unanchored) search when it isn't supported would result in an error.
Example: leftmost-first searching
Basic usage with leftmost-first semantics:
use ; let patterns = &; let haystack = "foo abcd"; let ac = builder .match_kind .build .unwrap; let mat = ac.try_find?.expect; assert_eq!; # Ok::Example: anchored leftmost-first searching
This shows how to anchor the search, so that even if the haystack contains a match somewhere, a match won't be reported unless one can be found that starts at the beginning of the search:
use ; let patterns = &; let haystack = "foo abcd"; let ac = builder .match_kind .start_kind .build .unwrap; let input = new.anchored; assert_eq!; # Ok::If the beginning of the search is changed to where a match begins, then it will be found:
use ; let patterns = &; let haystack = "foo abcd"; let ac = builder .match_kind .start_kind .build .unwrap; let input = new.range.anchored; let mat = ac.try_find?.expect; assert_eq!; # Ok::Example: earliest leftmost-first searching
This shows how to run an "earliest" search even when the Aho-Corasick searcher was compiled with leftmost-first match semantics. In this case, the search is stopped as soon as it is known that a match has occurred, even if it doesn't correspond to the leftmost-first match.
use ; let patterns = &; let haystack = "foo abcd"; let ac = builder .match_kind .build .unwrap; let input = new.earliest; let mat = ac.try_find?.expect; assert_eq!; # Ok::fn try_find_overlapping<'h, I: Into<Input<'h>>>(self: &Self, input: I, state: &mut OverlappingState) -> Result<(), MatchError>Returns the location of the first overlapping match in the given input with respect to the current state of the underlying searcher.
Overlapping searches do not report matches in their return value. Instead, matches can be accessed via OverlappingState::get_match after a search call.
This is the fallible version of AhoCorasick::find_overlapping.
Errors
This returns an error when this Aho-Corasick searcher does not support the given Input configuration or if overlapping search is not supported.
One example is that only Aho-Corasick searchers built with MatchKind::Standard semantics support overlapping searches. Using any other match semantics will result in this returning an error.
Example: basic usage
This shows how we can repeatedly call an overlapping search without ever needing to explicitly re-slice the haystack. Overlapping search works this way because searches depend on state saved during the previous search.
use ; let patterns = &; let haystack = "append the app to the appendage"; let ac = new.unwrap; let mut state = start; ac.try_find_overlapping?; assert_eq!; ac.try_find_overlapping?; assert_eq!; ac.try_find_overlapping?; assert_eq!; ac.try_find_overlapping?; assert_eq!; ac.try_find_overlapping?; assert_eq!; ac.try_find_overlapping?; assert_eq!; // No more match matches to be found. ac.try_find_overlapping?; assert_eq!; # Ok::Example: implementing your own overlapping iteration
The previous example can be easily adapted to implement your own iteration by repeatedly calling
try_find_overlappinguntil either an error occurs or no more matches are reported.This is effectively equivalent to the iterator returned by
AhoCorasick::try_find_overlapping_iter, with the only difference being that the iterator checks for errors before construction and absolves the caller of needing to check for errors on every search call. (Indeed, if the firsttry_find_overlappingcall succeeds and the sameInputis given to subsequent calls, then all subsequent calls are guaranteed to succeed.)use ; let patterns = &; let haystack = "append the app to the appendage"; let ac = new.unwrap; let mut state = start; let mut matches = vec!; loop let expected = vec!; assert_eq!; # Ok::Example: anchored iteration
The previous example can also be adapted to implement iteration over all anchored matches. In particular,
AhoCorasick::try_find_overlapping_iterdoes not support this because it isn't totally clear what the match semantics ought to be.In this example, we will find all overlapping matches that start at the beginning of our search.
use ; let patterns = &; let haystack = "append the app to the appendage"; let ac = builder .start_kind .build .unwrap; let input = new.anchored; let mut state = start; let mut matches = vec!; loop let expected = vec!; assert_eq!; # Ok::fn try_find_iter<'a, 'h, I: Into<Input<'h>>>(self: &'a Self, input: I) -> Result<FindIter<'a, 'h>, MatchError>Returns an iterator of non-overlapping matches, using the match semantics that this automaton was constructed with.
This is the fallible version of AhoCorasick::find_iter.
Note that the error returned by this method occurs during construction of the iterator. The iterator itself yields Match values. That is, once the iterator is constructed, the iteration itself will never report an error.
Errors
This returns an error when this Aho-Corasick searcher does not support the given Input configuration.
For example, if the Aho-Corasick searcher only supports anchored searches or only supports unanchored searches, then providing an Input that requests an anchored (or unanchored) search when it isn't supported would result in an error.
Example: leftmost-first searching
Basic usage with leftmost-first semantics:
use ; let patterns = &; let haystack = "append the app to the appendage"; let ac = builder .match_kind .build .unwrap; let matches: = ac .try_find_iter? .map .collect; assert_eq!; # Ok::Example: anchored leftmost-first searching
This shows how to anchor the search, such that all matches must begin at the starting location of the search. For an iterator, an anchored search implies that all matches are adjacent.
use ; let patterns = &; let haystack = "fooquuxbar foo"; let ac = builder .match_kind .start_kind .build .unwrap; let matches: = ac .try_find_iter? .map .collect; assert_eq!; # Ok::fn try_find_overlapping_iter<'a, 'h, I: Into<Input<'h>>>(self: &'a Self, input: I) -> Result<FindOverlappingIter<'a, 'h>, MatchError>Returns an iterator of overlapping matches.
This is the fallible version of AhoCorasick::find_overlapping_iter.
Note that the error returned by this method occurs during construction of the iterator. The iterator itself yields Match values. That is, once the iterator is constructed, the iteration itself will never report an error.
Errors
This returns an error when this Aho-Corasick searcher does not support the given Input configuration or does not support overlapping searches.
One example is that only Aho-Corasick searchers built with MatchKind::Standard semantics support overlapping searches. Using any other match semantics will result in this returning an error.
Example: basic usage
use ; let patterns = &; let haystack = "append the app to the appendage"; let ac = new.unwrap; let matches: = ac .try_find_overlapping_iter? .map .collect; assert_eq!; # Ok::Example: anchored overlapping search returns an error
It isn't clear what the match semantics for anchored overlapping iterators ought to be, so currently an error is returned. Callers may use
AhoCorasick::try_find_overlappingto implement their own semantics if desired.use ; let patterns = &; let haystack = "appendappendage app"; let ac = builder .start_kind .build .unwrap; let input = new.anchored; assert!; # Ok::fn try_replace_all<B>(self: &Self, haystack: &str, replace_with: &[B]) -> Result<String, MatchError> where B: AsRef<str>Replace all matches with a corresponding value in the
replace_with slice given. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.
Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].
Panics
This panics when replace_with.len() does not equal AhoCorasick::patterns_len.
Errors
This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this replacement routine always does an unanchored search.
Example: basic usage
use ; let patterns = &; let haystack = "append the app to the appendage"; let ac = builder .match_kind .build .unwrap; let result = ac.try_replace_all?; assert_eq!; # Ok::fn try_replace_all_bytes<B>(self: &Self, haystack: &[u8], replace_with: &[B]) -> Result<Vec<u8>, MatchError> where B: AsRef<[u8]>Replace all matches using raw bytes with a corresponding value in the
replace_with slice given. Matches correspond to the same matches as reported by AhoCorasick::try_find_iter.
Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].
This is the fallible version of AhoCorasick::replace_all_bytes.
Panics
This panics when replace_with.len() does not equal AhoCorasick::patterns_len.
Errors
This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this replacement routine always does an unanchored search.
Example: basic usage
use ; let patterns = &; let haystack = b"append the app to the appendage"; let ac = builder .match_kind .build .unwrap; let result = ac.try_replace_all_bytes?; assert_eq!; # Ok::fn try_replace_all_with<F>(self: &Self, haystack: &str, dst: &mut String, replace_with: F) -> Result<(), MatchError> where F: FnMut(&Match, &str, &mut String) -> boolReplace all matches using a closure called on each match. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter.
The closure accepts three parameters: the match found, the text of the match and a string buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.
Note that any matches with boundaries that don't fall on a valid UTF-8 boundary are silently skipped.
This is the fallible version of AhoCorasick::replace_all_with.
Errors
This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this replacement routine always does an unanchored search.
Examples
Basic usage:
use ; let patterns = &; let haystack = "append the app to the appendage"; let ac = builder .match_kind .build .unwrap; let mut result = Stringnew; ac.try_replace_all_with?; assert_eq!; # Ok::Stopping the replacement by returning
false(continued from the example above):# use ; # let patterns = &; # let haystack = "append the app to the appendage"; # let ac = builder # .match_kind # .build # .unwrap; let mut result = Stringnew; ac.try_replace_all_with?; assert_eq!; # Ok::fn try_replace_all_with_bytes<F>(self: &Self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F) -> Result<(), MatchError> where F: FnMut(&Match, &[u8], &mut Vec<u8>) -> boolReplace all matches using raw bytes with a closure called on each match. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter.
The closure accepts three parameters: the match found, the text of the match and a byte buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.
This is the fallible version of AhoCorasick::replace_all_with_bytes.
Errors
This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this replacement routine always does an unanchored search.
Examples
Basic usage:
use ; let patterns = &; let haystack = b"append the app to the appendage"; let ac = builder .match_kind .build .unwrap; let mut result = vec!; ac.try_replace_all_with_bytes?; assert_eq!; # Ok::Stopping the replacement by returning
false(continued from the example above):# use ; # let patterns = &; # let haystack = b"append the app to the appendage"; # let ac = builder # .match_kind # .build # .unwrap; let mut result = vec!; ac.try_replace_all_with_bytes?; assert_eq!; # Ok::fn try_stream_find_iter<'a, R: std::io::Read>(self: &'a Self, rdr: R) -> Result<StreamFindIter<'a, R>, MatchError>Returns an iterator of non-overlapping matches in the given stream. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter.
The matches yielded by this iterator use absolute position offsets in the stream given, where the first byte has index 0. Matches are yielded until the stream is exhausted.
Each item yielded by the iterator is a Result<Match, std::io::Error>, where an error is yielded if there was a problem reading from the reader given.
When searching a stream, an internal buffer is used. Therefore, callers should avoid providing a buffered reader, if possible.
This is the fallible version of AhoCorasick::stream_find_iter. Note that both methods return iterators that produce Result values. The difference is that this routine returns an error if construction of the iterator failed. The Result values yielded by the iterator come from whether the given reader returns an error or not during the search.
Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
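The reason the buffer must hold at least the longest possible match can be seen in a naive, std-only sketch of chunked searching. This is not how the crate implements stream search: `stream_find_all` and its `chunk_size` parameter are hypothetical, the scan is quadratic, and it reports every (overlapping) occurrence as `(pattern index, start, end)`. It only shows that carrying over the last `max_len - 1` bytes between reads is enough to catch matches that straddle a chunk boundary while keeping memory bounded.

```rust
use std::io::{self, Read};

/// Naive chunked search over a reader (hypothetical helper, illustration only).
/// Memory use is bounded by `chunk_size + max_len - 1` bytes of haystack.
pub fn stream_find_all<R: Read>(
    mut rdr: R,
    patterns: &[&[u8]],
    chunk_size: usize,
) -> io::Result<Vec<(usize, usize, usize)>> {
    let max_len = patterns.iter().map(|p| p.len()).max().unwrap_or(0);
    // Bytes carried over between reads so boundary-straddling matches survive.
    let keep = max_len.saturating_sub(1);
    let mut matches = Vec::new();
    let mut buf: Vec<u8> = Vec::new();
    let mut buf_start = 0usize; // absolute stream offset of buf[0]
    let mut chunk = vec![0u8; chunk_size.max(1)];
    loop {
        let n = rdr.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        let old_len = buf.len();
        buf.extend_from_slice(&chunk[..n]);
        for start in 0..buf.len() {
            for (id, pat) in patterns.iter().enumerate() {
                let end = start + pat.len();
                // Only report matches that end inside the newly read bytes;
                // anything ending earlier was reported on a previous read.
                if pat.is_empty() || end <= old_len || end > buf.len() {
                    continue;
                }
                if &buf[start..end] == *pat {
                    matches.push((id, buf_start + start, buf_start + end));
                }
            }
        }
        // Discard everything except the carry-over window.
        if buf.len() > keep {
            let excess = buf.len() - keep;
            buf.drain(..excess);
            buf_start += excess;
        }
    }
    Ok(matches)
}
```

With a carry-over smaller than `max_len - 1`, a match beginning near the end of one read and finishing in the next could be missed, which is why the longest pattern sets the lower bound on the buffer size.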
Errors
This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this stream searching routine always does an unanchored search.
This also returns an error if the searcher does not support stream searches. Only searchers built with MatchKind::Standard semantics support stream searches.
Example: basic usage
use ; let patterns = &; let haystack = "append the app to the appendage"; let ac = new.unwrap; let mut matches = vec!; for result in ac.try_stream_find_iter? assert_eq!; # Ok::fn try_stream_replace_all<R, W, B>(self: &Self, rdr: R, wtr: W, replace_with: &[B]) -> Result<(), Error> where R: Read, W: Write, B: AsRef<[u8]>Search for and replace all matches of this automaton in the given reader, and write the replacements to the given writer. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter.
Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].
After all matches are replaced, the writer is not flushed.
If there was a problem reading from the given reader or writing to the given writer, then the corresponding io::Error is returned and all replacement is stopped.
When searching a stream, an internal buffer is used. Therefore, callers should avoid providing a buffered reader, if possible. However, callers may want to provide a buffered writer.
Note that there is currently no infallible version of this routine.
Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
Panics
This panics when replace_with.len() does not equal AhoCorasick::patterns_len.
Errors
This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this stream searching routine always does an unanchored search.
This also returns an error if the searcher does not support stream searches. Only searchers built with MatchKind::Standard semantics support stream searches.
Example: basic usage
use AhoCorasick; let patterns = &; let haystack = "The quick brown fox."; let replace_with = &; let ac = new.unwrap; let mut result = vec!; ac.try_stream_replace_all?; assert_eq!; # Ok::fn try_stream_replace_all_with<R, W, F>(self: &Self, rdr: R, wtr: W, replace_with: F) -> Result<(), Error> where R: Read, W: Write, F: FnMut(&Match, &[u8], &mut W) -> Result<(), Error>Search the given reader and replace all matches of this automaton using the given closure. The result is written to the given writer. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter.
The closure accepts three parameters: the match found, the text of the match and the writer with which to write the replaced text (if any).
After all matches are replaced, the writer is not flushed.
If there was a problem reading from the given reader or writing to the given writer, then the corresponding io::Error is returned and all replacement is stopped.
When searching a stream, an internal buffer is used. Therefore, callers should avoid providing a buffered reader, if possible. However, callers may want to provide a buffered writer.
Note that there is currently no infallible version of this routine.
Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
Errors
This returns an error when this Aho-Corasick searcher does not support the default Input configuration. More specifically, this occurs only when the Aho-Corasick searcher does not support unanchored searches since this stream searching routine always does an unanchored search.
This also returns an error if the searcher does not support stream searches. Only searchers built with MatchKind::Standard semantics support stream searches.
Example: basic usage
use Write; use AhoCorasick; let patterns = &; let haystack = "The quick brown fox."; let ac = new.unwrap; let mut result = vec!; ac.try_stream_replace_all_with?; assert_eq!; # Ok::
impl AhoCorasick
fn is_match<'h, I: Into<Input<'h>>>(self: &Self, input: I) -> boolReturns true if and only if this automaton matches the haystack at any position.
input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].
Aside from convenience, when AhoCorasick was built with leftmost-first or leftmost-longest semantics, this might result in a search that visits less of the haystack than AhoCorasick::find would otherwise. (For standard semantics, matches are always immediately returned once they are seen, so there is no way for this to do less work in that case.)
Note that there is no corresponding fallible routine for this method. If you need a fallible version of this, then AhoCorasick::try_find can be used with Input::earliest enabled.
Examples
Basic usage:
use AhoCorasick; let ac = new.unwrap; assert!; assert!;fn find<'h, I: Into<Input<'h>>>(self: &Self, input: I) -> Option<Match>Returns the location of the first match according to the match semantics that this automaton was constructed with.
input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].
This is the infallible version of AhoCorasick::try_find.
Panics
This panics when AhoCorasick::try_find would return an error.
Examples
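Basic usage, with standard semantics:
use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::Standard) // default, not necessary
    .build(patterns)
    .unwrap();
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("b", &haystack[mat.start()..mat.end()]);
Now with leftmost-first semantics:
use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostFirst)
    .build(patterns)
    .unwrap();
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("abc", &haystack[mat.start()..mat.end()]);
And finally, leftmost-longest semantics:
use aho_corasick::{AhoCorasick, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostLongest)
    .build(patterns)
    .unwrap();
let mat = ac.find(haystack).expect("should have a match");
Example: configuring a search
Because this method accepts anything that can be turned into an Input, it's possible to provide an Input directly in order to configure the search. In this example, we show how to use the earliest option to force the search to return as soon as it knows a match has occurred.
use aho_corasick::{AhoCorasick, Input, MatchKind};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let ac = AhoCorasick::builder()
    .match_kind(MatchKind::LeftmostLongest)
    .build(patterns)
    .unwrap();
let mat = ac.find(Input::new(haystack).earliest(true))
    .expect("should have a match");
// The correct leftmost-longest match here is 'abcd', but since we
// told the search to quit as soon as it knows a match has occurred,
// we get a different match back.
assert_eq!("b", &haystack[mat.start()..mat.end()]);
fn find_overlapping<'h, I: Into<Input<'h>>>(self: &Self, input: I, state: &mut OverlappingState)
Returns the location of the first overlapping match in the given input with respect to the current state of the underlying searcher.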
input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].
Overlapping searches do not report matches in their return value. Instead, matches can be accessed via OverlappingState::get_match after a search call.
This is the infallible version of AhoCorasick::try_find_overlapping.
Panics
This panics when AhoCorasick::try_find_overlapping would return an error. For example, when the Aho-Corasick searcher doesn't support overlapping searches. (Only searchers built with MatchKind::Standard semantics support overlapping searches.)
Example
This shows how we can repeatedly call an overlapping search without ever needing to explicitly re-slice the haystack. Overlapping search works this way because searches depend on state saved during the previous search.
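    use aho_corasick::{automaton::OverlappingState, AhoCorasick, Match};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::new(patterns).unwrap();
    let mut state = OverlappingState::start();

    ac.find_overlapping(haystack, &mut state);
    assert_eq!(Some(Match::must(2, 0..3)), state.get_match());

    ac.find_overlapping(haystack, &mut state);
    assert_eq!(Some(Match::must(0, 0..6)), state.get_match());

    ac.find_overlapping(haystack, &mut state);
    assert_eq!(Some(Match::must(2, 11..14)), state.get_match());

    ac.find_overlapping(haystack, &mut state);
    assert_eq!(Some(Match::must(2, 22..25)), state.get_match());

    ac.find_overlapping(haystack, &mut state);
    assert_eq!(Some(Match::must(0, 22..28)), state.get_match());

    ac.find_overlapping(haystack, &mut state);
    assert_eq!(Some(Match::must(1, 22..31)), state.get_match());

    // No more matches to be found.
    ac.find_overlapping(haystack, &mut state);
    assert_eq!(None, state.get_match());

fn find_iter<'a, 'h, I: Into<Input<'h>>>(self: &'a Self, input: I) -> FindIter<'a, 'h>
Returns an iterator of non-overlapping matches, using the match semantics that this automaton was constructed with.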
input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].
This is the infallible version of AhoCorasick::try_find_iter.
Panics
This panics when AhoCorasick::try_find_iter would return an error.
Examples
Basic usage, with standard semantics:
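    use aho_corasick::{AhoCorasick, MatchKind, PatternID};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::Standard) // default, not necessary
        .build(patterns)
        .unwrap();
    let matches: Vec<PatternID> = ac
        .find_iter(haystack)
        .map(|mat| mat.pattern())
        .collect();
    assert_eq!(vec![
        PatternID::must(2),
        PatternID::must(2),
        PatternID::must(2),
    ], matches);

Now with leftmost-first semantics: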
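    use aho_corasick::{AhoCorasick, MatchKind, PatternID};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    let matches: Vec<PatternID> = ac
        .find_iter(haystack)
        .map(|mat| mat.pattern())
        .collect();
    assert_eq!(vec![
        PatternID::must(0),
        PatternID::must(2),
        PatternID::must(0),
    ], matches);

And finally, leftmost-longest semantics: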
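    use aho_corasick::{AhoCorasick, MatchKind, PatternID};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostLongest)
        .build(patterns)
        .unwrap();
    let matches: Vec<PatternID> = ac
        .find_iter(haystack)
        .map(|mat| mat.pattern())
        .collect();
    assert_eq!(vec![
        PatternID::must(0),
        PatternID::must(2),
        PatternID::must(1),
    ], matches);

fn find_overlapping_iter<'a, 'h, I: Into<Input<'h>>>(self: &'a Self, input: I) -> FindOverlappingIter<'a, 'h>
Returns an iterator of overlapping matches. Stated differently, this returns an iterator of all possible matches at every position.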
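To make "all possible matches at every position" concrete, the following is a naive, std-only sketch that enumerates every occurrence of every pattern by brute force. It is not how the crate computes overlapping matches, and naive_overlapping is a hypothetical helper introduced only for illustration. The sketch orders results by end offset and, for equal end offsets, by pattern index; that tie-breaking rule is an assumption of the sketch.

```rust
/// Hypothetical helper (not part of aho-corasick): returns every
/// (pattern index, start, end) occurrence in `haystack`, ordered by end
/// offset. Empty patterns are ignored to keep the sketch simple.
fn naive_overlapping(patterns: &[&str], haystack: &str) -> Vec<(usize, usize, usize)> {
    let hay = haystack.as_bytes();
    let mut out = Vec::new();
    // Visit each possible end offset and ask which patterns end there.
    for end in 1..=hay.len() {
        for (index, pattern) in patterns.iter().enumerate() {
            let pat = pattern.as_bytes();
            if !pat.is_empty() && hay[..end].ends_with(pat) {
                out.push((index, end - pat.len(), end));
            }
        }
    }
    out
}
```

Calling naive_overlapping(&["append", "appendage", "app"], "append the app to the appendage") yields six occurrences: both "app" and "append" are reported at offset 0, and "app", "append" and "appendage" are all reported at offset 22.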
input may be any type that is cheaply convertible to an Input. This includes, but is not limited to, &str and &[u8].
This is the infallible version of AhoCorasick::try_find_overlapping_iter.
Panics
This panics when AhoCorasick::try_find_overlapping_iter would return an error. For example, when the Aho-Corasick searcher is built with either leftmost-first or leftmost-longest match semantics. Stated differently, overlapping searches require one to build the searcher with MatchKind::Standard (it is the default).
Example: basic usage
    use aho_corasick::{AhoCorasick, PatternID};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::new(patterns).unwrap();
    let matches: Vec<PatternID> = ac
        .find_overlapping_iter(haystack)
        .map(|mat| mat.pattern())
        .collect();
    assert_eq!(vec![
        PatternID::must(2),
        PatternID::must(0),
        PatternID::must(2),
        PatternID::must(2),
        PatternID::must(0),
        PatternID::must(1),
    ], matches);

fn replace_all<B>(self: &Self, haystack: &str, replace_with: &[B]) -> String where B: AsRef<str>
Replace all matches with a corresponding value in the replace_with slice given. Matches correspond to the same matches as reported by AhoCorasick::find_iter.
Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].
This is the infallible version of AhoCorasick::try_replace_all.
Panics
This panics when AhoCorasick::try_replace_all would return an error.
This also panics when replace_with.len() does not equal AhoCorasick::patterns_len.
Example: basic usage
    use aho_corasick::{AhoCorasick, MatchKind};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    // One replacement per pattern, in pattern order.
    let result = ac.replace_all(haystack, &["x", "y", "z"]);
    assert_eq!("x the z to the xage", result);

fn replace_all_bytes<B>(self: &Self, haystack: &[u8], replace_with: &[B]) -> Vec<u8> where B: AsRef<[u8]>
Replace all matches using raw bytes with a corresponding value in the replace_with slice given. Matches correspond to the same matches as reported by AhoCorasick::find_iter.
Replacements are determined by the index of the matching pattern. For example, if the pattern with index 2 is found, then it is replaced by replace_with[2].
This is the infallible version of AhoCorasick::try_replace_all_bytes.
Panics
This panics when AhoCorasick::try_replace_all_bytes would return an error.
This also panics when replace_with.len() does not equal AhoCorasick::patterns_len.
Example: basic usage
    use aho_corasick::{AhoCorasick, MatchKind};

    let patterns = &["append", "appendage", "app"];
    let haystack = b"append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    // One replacement per pattern, in pattern order.
    let result = ac.replace_all_bytes(haystack, &["x", "y", "z"]);
    assert_eq!(b"x the z to the xage".to_vec(), result);

fn replace_all_with<F>(self: &Self, haystack: &str, dst: &mut String, replace_with: F) where F: FnMut(&Match, &str, &mut String) -> bool
Replace all matches using a closure called on each match. Matches correspond to the same matches as reported by AhoCorasick::find_iter.
The closure accepts three parameters: the match found, the text of the match and a string buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.
Note that any matches with boundaries that don't fall on a valid UTF-8 boundary are silently skipped.
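The closure contract described above can be sketched without the crate. The following std-only code is an illustration under stated assumptions, not the crate's implementation: NaiveMatch, naive_find and naive_replace_all_with are hypothetical names, the search is a brute-force leftmost-first scan, and empty patterns are ignored. It shows the three closure arguments, how unmatched text is copied through, and that text after a false return is appended unchanged.

```rust
/// Hypothetical stand-in for a match: pattern index plus byte span.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NaiveMatch {
    pattern: usize,
    start: usize,
    end: usize,
}

/// Brute-force leftmost-first search starting at byte offset `from`:
/// at the first position where any pattern matches, the pattern listed
/// earliest wins. Empty patterns are skipped.
fn naive_find(patterns: &[&str], haystack: &str, from: usize) -> Option<NaiveMatch> {
    let hay = haystack.as_bytes();
    for start in from..hay.len() {
        for (pattern, p) in patterns.iter().enumerate() {
            if !p.is_empty() && hay[start..].starts_with(p.as_bytes()) {
                return Some(NaiveMatch { pattern, start, end: start + p.len() });
            }
        }
    }
    None
}

/// Sketch of closure-driven replacement. Because the patterns here are
/// valid UTF-8 strings, every match falls on a char boundary, so the
/// boundary check mentioned in the documentation is not needed.
fn naive_replace_all_with<F>(patterns: &[&str], haystack: &str, dst: &mut String, mut f: F)
where
    F: FnMut(&NaiveMatch, &str, &mut String) -> bool,
{
    let mut last = 0; // end of the previous match
    while let Some(m) = naive_find(patterns, haystack, last) {
        // Copy the text between the previous match and this one.
        dst.push_str(&haystack[last..m.start]);
        last = m.end;
        // The closure writes the replacement and decides whether to go on.
        if !f(&m, &haystack[m.start..m.end], dst) {
            break;
        }
    }
    // Whatever was not searched or not matched is copied unchanged.
    dst.push_str(&haystack[last..]);
}
```

With patterns ["append", "appendage", "app"] and a closure that writes the pattern index, the haystack "append the app to the appendage" becomes "0 the 2 to the 0age". If the closure instead returns false after seeing pattern 2, the result is "0 the 2 to the appendage".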
This is the infallible version of AhoCorasick::try_replace_all_with.
Panics
This panics when AhoCorasick::try_replace_all_with would return an error.
Examples
Basic usage:
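    use aho_corasick::{AhoCorasick, MatchKind};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    let mut result = String::new();
    // Replace each match with the index of the pattern that matched.
    ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
        dst.push_str(&mat.pattern().as_usize().to_string());
        true
    });
    assert_eq!("0 the 2 to the 0age", result);

Stopping the replacement by returning false (continued from the example above):

    use aho_corasick::{AhoCorasick, MatchKind};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    let mut result = String::new();
    // Stop after the first match of pattern 2; the rest is left as-is.
    ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
        dst.push_str(&mat.pattern().as_usize().to_string());
        mat.pattern().as_usize() != 2
    });
    assert_eq!("0 the 2 to the appendage", result);

fn replace_all_with_bytes<F>(self: &Self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F) where F: FnMut(&Match, &[u8], &mut Vec<u8>) -> bool
Replace all matches using raw bytes with a closure called on each match. Matches correspond to the same matches as reported by AhoCorasick::find_iter.
The closure accepts three parameters: the match found, the text of the match and a byte buffer with which to write the replaced text (if any). If the closure returns true, then it continues to the next match. If the closure returns false, then searching is stopped.
This is the infallible version of AhoCorasick::try_replace_all_with_bytes.
Panics
This panics when AhoCorasick::try_replace_all_with_bytes would return an error.
Examples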
Basic usage:
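    use aho_corasick::{AhoCorasick, MatchKind};

    let patterns = &["append", "appendage", "app"];
    let haystack = b"append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    let mut result = vec![];
    // Replace each match with the index of the pattern that matched.
    ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
        dst.extend(mat.pattern().as_usize().to_string().bytes());
        true
    });
    assert_eq!(b"0 the 2 to the 0age".to_vec(), result);

Stopping the replacement by returning false (continued from the example above):

    use aho_corasick::{AhoCorasick, MatchKind};

    let patterns = &["append", "appendage", "app"];
    let haystack = b"append the app to the appendage";

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(patterns)
        .unwrap();
    let mut result = vec![];
    // Stop after the first match of pattern 2; the rest is left as-is.
    ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
        dst.extend(mat.pattern().as_usize().to_string().bytes());
        mat.pattern().as_usize() != 2
    });
    assert_eq!(b"0 the 2 to the appendage".to_vec(), result);

fn stream_find_iter<'a, R: std::io::Read>(self: &'a Self, rdr: R) -> StreamFindIter<'a, R>
Returns an iterator of non-overlapping matches in the given stream. Matches correspond to the same matches as reported by AhoCorasick::find_iter.
The matches yielded by this iterator use absolute position offsets in the stream given, where the first byte has index 0. Matches are yielded until the stream is exhausted.
Each item yielded by the iterator is a Result<Match, std::io::Error>, where an error is yielded if there was a problem reading from the reader given.
When searching a stream, an internal buffer is used. Therefore, callers should avoid providing a buffered reader, if possible.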
This is the infallible version of AhoCorasick::try_stream_find_iter. Note that both methods return iterators that produce Result values. The difference is that this routine panics if construction of the iterator failed. The Result values yielded by the iterator come from whether the given reader returns an error or not during the search.
Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
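Why the buffer must hold the longest possible match can be seen in a small std-only sketch of a fixed-size streaming search. This is an illustration under stated assumptions, not the crate's implementation: stream_find_all and buf_size are hypothetical names, only a single needle is searched, and read errors (including interruptions) are simply propagated. Bytes that cannot yet be ruled out as the start of a match are carried over to the next read, so matches that straddle two reads are still found while memory stays bounded by buf_size.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of aho-corasick): absolute start offsets
/// of non-overlapping occurrences of `needle` in `rdr`, using a buffer of
/// at most `buf_size` bytes.
fn stream_find_all<R: Read>(mut rdr: R, needle: &[u8], buf_size: usize) -> io::Result<Vec<usize>> {
    assert!(!needle.is_empty(), "needle must be non-empty");
    // The requirement from the text: the buffer must fit the longest match.
    assert!(buf_size >= needle.len(), "buffer must hold the longest match");

    let mut buf: Vec<u8> = Vec::with_capacity(buf_size);
    let mut chunk = vec![0u8; buf_size];
    let mut base = 0; // absolute stream offset of buf[0]
    let mut found = Vec::new();

    loop {
        // Fill the free space. The carried-over tail is always shorter
        // than the needle, so there is always room for at least one byte.
        let room = buf_size - buf.len();
        let n = rdr.read(&mut chunk[..room])?;
        buf.extend_from_slice(&chunk[..n]);

        // Check every position that has a full window available.
        let mut pos = 0;
        while pos + needle.len() <= buf.len() {
            if &buf[pos..pos + needle.len()] == needle {
                found.push(base + pos);
                pos += needle.len(); // non-overlapping: skip past the match
            } else {
                pos += 1;
            }
        }

        if n == 0 {
            return Ok(found); // end of stream
        }
        // Discard what has been fully checked; keep the undecided tail.
        buf.drain(..pos);
        base += pos;
    }
}
```

For instance, searching the bytes of "append the app to the appendage" for "app" with buf_size set to 4 reports offsets 0, 11 and 22, the same offsets a search over the whole haystack in memory would report.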
Panics
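This panics when AhoCorasick::try_stream_find_iter would return an error. For example, when the Aho-Corasick searcher doesn't support stream searches. (Only searchers built with MatchKind::Standard semantics support stream searches.)
Example: basic usage

    use aho_corasick::{AhoCorasick, PatternID};

    let patterns = &["append", "appendage", "app"];
    let haystack = "append the app to the appendage";

    let ac = AhoCorasick::new(patterns).unwrap();
    let mut matches = vec![];
    // Each item is a Result; a read error from the stream surfaces here.
    for result in ac.stream_find_iter(haystack.as_bytes()) {
        let mat = result?;
        matches.push(mat.pattern());
    }
    assert_eq!(vec![
        PatternID::must(2),
        PatternID::must(2),
        PatternID::must(2),
    ], matches);

    # Ok::<(), Box<dyn std::error::Error>>(())
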
impl Clone for AhoCorasick
fn clone(self: &Self) -> AhoCorasick
impl Debug for AhoCorasick
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Freeze for AhoCorasick
impl RefUnwindSafe for AhoCorasick
impl Send for AhoCorasick
impl Sync for AhoCorasick
impl Unpin for AhoCorasick
impl UnsafeUnpin for AhoCorasick
impl UnwindSafe for AhoCorasick
impl<T> Any for AhoCorasick
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for AhoCorasick
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for AhoCorasick
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for AhoCorasick
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for AhoCorasick
fn from(t: T) -> T
Returns the argument unchanged.
impl<T> ToOwned for AhoCorasick
fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for AhoCorasick
fn into(self: Self) -> U
Calls U::from(self).
That is, this conversion is whatever the implementation of From<T> for U chooses to do.
impl<T, U> TryFrom for AhoCorasick
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for AhoCorasick
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>