Struct Searcher
struct Searcher<'h> { ... }
A searcher for creating iterators and performing lower level iteration.
This searcher encapsulates the logic required for finding all successive non-overlapping matches in a haystack. In theory, iteration would look something like this:
1. Set the start position to 0.
2. Execute a regex search. If no match, end iteration.
3. Report the match and set the start position to the end of the match.
4. Go back to (2).
And if this were indeed the case, it's likely that Searcher wouldn't
exist. Unfortunately, because a regex may match the empty string, the above
logic won't work for all possible regexes. Namely, if an empty match is
found, then step (3) would set the start position of the search to the
position it was at. Thus, iteration would never end.
Instead, a Searcher knows how to detect these cases and forcefully
advance iteration in the case of an empty match that overlaps with a
previous match.
If you know that your regex cannot match any empty string, then the simple algorithm described above will work correctly.
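The forced advance described above can be sketched with a toy model. This is a simplified illustration that uses only the standard library, not this crate's implementation: Span, find_a_star and find_all are hypothetical names, the "engine" is a hand-written matcher for the pattern a*, and positions are plain byte offsets.

```rust
/// A match as a half-open byte range (start, end). Hypothetical stand-in
/// for a regex engine's match type.
type Span = (usize, usize);

/// Toy engine for the pattern `a*`: the leftmost match at or after `at`
/// always begins at `at` and covers the run of `a` bytes found there,
/// which may be empty.
fn find_a_star(haystack: &[u8], at: usize) -> Option<Span> {
    if at > haystack.len() {
        return None;
    }
    let run = haystack[at..].iter().take_while(|&&b| b == b'a').count();
    Some((at, at + run))
}

/// Collect all successive non-overlapping matches. Without the check in
/// the loop body, an empty match would leave `start` unchanged and the
/// loop would never terminate.
fn find_all<F>(haystack: &[u8], mut finder: F) -> Vec<Span>
where
    F: FnMut(&[u8], usize) -> Option<Span>,
{
    let mut matches = Vec::new();
    let mut start = 0;
    let mut last_end: Option<usize> = None;
    while let Some(mut m) = finder(haystack, start) {
        // An empty match that ends where the previous match ended
        // overlaps it: skip one byte and search again.
        if m.0 == m.1 && Some(m.1) == last_end {
            match finder(haystack, start + 1) {
                Some(next) => m = next,
                None => break,
            }
        }
        start = m.1;
        last_end = Some(m.1);
        matches.push(m);
    }
    matches
}
```

Calling find_all(b"baab", find_a_star) in this sketch yields (0, 0), (1, 3) and (4, 4). The empty match at offset 3 is skipped because it touches the end of the previous match.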
When possible, prefer the iterators defined on the regex engine you're using. This type tries to abstract over the regex engine and is thus a bit more unwieldy to use.
In particular, a Searcher is not itself an iterator. Instead, it provides
advance routines that permit moving the search along explicitly. It also
provides various routines, like Searcher::into_matches_iter, that
accept a closure (representing how a regex engine executes a search) and
return a conventional iterator.
The lifetime parameter comes from the Input type passed to
Searcher::new:
- 'h is the lifetime of the underlying haystack.
Searcher vs Iterator
Why does a search type with "advance" APIs exist at all when we also have iterators? Unfortunately, the reasoning behind this split is a complex combination of the following things:
- While many of the regex engines expose their own iterators, it is also nice to expose this lower level iteration helper because it permits callers to provide their own Input configuration. Moreover, a Searcher can work with any regex engine instead of only the ones defined in this crate. This way, everyone benefits from a shared iteration implementation.
- There are many different regex engines that, while they have the same match semantics, they have slightly different APIs. Iteration is just complex enough to want to share code, and so we need a way of abstracting over those different regex engines. While we could define a new trait that describes any regex engine search API, it would wind up looking very close to a closure. While there may still be reasons for the more generic trait to exist, for now and for the purposes of iteration, we use a closure. Closures also provide a lot of easy flexibility at the call site, in that they permit the caller to borrow any kind of state they want for use during each search call.
- As a result of using closures, and because closures are anonymous types that cannot be named, it is difficult to encapsulate them without both costs to speed and added complexity to the public API. For example, in defining an iterator type like dfa::regex::FindMatches, if we use a closure internally, it's not possible to name this type in the return type of the iterator constructor. Thus, the only way around it is to erase the type by boxing it and turning it into a Box<dyn FnMut ...>. This boxed closure is unlikely to be inlined and it infects the public API in subtle ways. Namely, unless you declare the closure as implementing Send and Sync, then the resulting iterator type won't implement it either. But there are practical issues with requiring the closure to implement Send and Sync that result in other API complexities that are beyond the scope of this already long exposition.
- Some regex engines expose more complex match information than just "which pattern matched" and "at what offsets." For example, the PikeVM exposes match spans for each capturing group that participated in the match. In such cases, it can be quite beneficial to reuse the capturing group allocation on subsequent searches. A proper iterator doesn't permit this API due to its interface, so it's useful to have something a bit lower level that permits callers to amortize allocations while also reusing a shared implementation of iteration. (See the documentation for Searcher::advance for an example of using the "advance" API with the PikeVM.)
What this boils down to is that there are "advance" APIs, which require handing them a closure on every call, and there are also APIs that create iterators from a closure. The former are useful for implementing iterators or when you need more flexibility, while the latter are useful for conveniently writing custom iterators on-the-fly.
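The two API styles can be modeled in a few lines. The following is a minimal sketch using only the standard library, not this crate's types: ToySearcher, ToyMatchesIter and find_foo are hypothetical names, and the sketch assumes a finder that never reports an empty match so that the closure plumbing stays in focus.

```rust
/// A match as a half-open byte range (start, end). Hypothetical.
type Span = (usize, usize);

/// Hypothetical searcher that only tracks where the next search starts.
struct ToySearcher<'h> {
    haystack: &'h str,
    start: usize,
}

impl<'h> ToySearcher<'h> {
    fn new(haystack: &'h str) -> ToySearcher<'h> {
        ToySearcher { haystack, start: 0 }
    }

    /// "advance" style: the caller hands in a closure on every call, so
    /// the closure may borrow whatever state it likes for that call only.
    fn advance<F>(&mut self, mut finder: F) -> Option<Span>
    where
        F: FnMut(&str, usize) -> Option<Span>,
    {
        let m = finder(self.haystack, self.start)?;
        self.start = m.1;
        Some(m)
    }

    /// Iterator style: the closure is stored once. Its anonymous type
    /// becomes the generic parameter F, so no boxing is needed.
    fn into_matches_iter<F>(self, finder: F) -> ToyMatchesIter<'h, F>
    where
        F: FnMut(&str, usize) -> Option<Span>,
    {
        ToyMatchesIter { searcher: self, finder }
    }
}

struct ToyMatchesIter<'h, F> {
    searcher: ToySearcher<'h>,
    finder: F,
}

impl<'h, F> Iterator for ToyMatchesIter<'h, F>
where
    F: FnMut(&str, usize) -> Option<Span>,
{
    type Item = Span;

    fn next(&mut self) -> Option<Span> {
        // A mutable reference to an FnMut closure is itself FnMut.
        self.searcher.advance(&mut self.finder)
    }
}

/// Toy engine: next occurrence of the literal "foo" at or after `at`.
fn find_foo(haystack: &str, at: usize) -> Option<Span> {
    let i = haystack.get(at..)?.find("foo")?;
    Some((at + i, at + i + 3))
}
```

With this sketch, ToySearcher::new(haystack).into_matches_iter(find_foo) is an ordinary iterator, while repeated calls to advance let each call's closure borrow local state, such as a counter or a scratch buffer.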
Example: iterating with captures
Several regex engines in this crate offer convenient iterator APIs over
Captures values. Doing so requires allocating a new Captures
value for each iteration step. This can perhaps be more costly than you
might want. Instead of implementing your own iterator to avoid that
cost (which can be a little subtle if you want to handle empty matches
correctly), you can use this Searcher to do it for you:
use ;
let re = new?;
let haystack = "foo1 foo12 foo123";
let mut caps = re.create_captures;
let mut cache = re.create_cache;
let mut matches = vec!;
let mut searcher = new;
while let Some = searcher.advance
assert_eq!;
# Ok::
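The allocation reuse that motivates this example can be shown in isolation. The sketch below uses only the standard library and does not call this crate: search_foo_digits is a hypothetical hand-written matcher standing in for a pattern like foo followed by one or more digits, the Vec of group spans is a stand-in for a Captures value, and the hand-written loop stands in for the searcher (safe here only because this toy pattern cannot match the empty string).

```rust
/// A match as a half-open byte range (start, end). Hypothetical.
type Span = (usize, usize);

/// Toy engine: finds "foo" followed by at least one ASCII digit at or
/// after `at`. On success it overwrites `groups` with the whole match
/// (index 0) and the digits (index 1), reusing the Vec's allocation.
fn search_foo_digits(haystack: &str, at: usize, groups: &mut Vec<Option<Span>>) -> bool {
    groups.clear();
    let bytes = haystack.as_bytes();
    let mut from = at;
    while let Some(i) = haystack.get(from..).and_then(|rest| rest.find("foo")) {
        let digits_start = from + i + 3;
        let n = bytes[digits_start..]
            .iter()
            .take_while(|b| b.is_ascii_digit())
            .count();
        if n > 0 {
            groups.push(Some((from + i, digits_start + n)));
            groups.push(Some((digits_start, digits_start + n)));
            return true;
        }
        // "foo" without digits: resume just past its first byte.
        from = from + i + 1;
    }
    false
}

/// Collect the span of the digits group for every match, allocating the
/// group buffer once instead of once per match.
fn numbers(haystack: &str) -> Vec<Span> {
    let mut groups: Vec<Option<Span>> = Vec::with_capacity(2);
    let mut found = Vec::new();
    let mut start = 0;
    while search_foo_digits(haystack, start, &mut groups) {
        // Both groups are set whenever the toy pattern matches.
        let whole = groups[0].unwrap();
        found.push(groups[1].unwrap());
        start = whole.1;
    }
    found
}
```

For the haystack "foo1 foo12 foo123" used above, numbers returns the spans (3, 4), (8, 10) and (14, 17).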
Implementations
impl<'h> Searcher<'h>
fn new(input: Input<'h>) -> Searcher<'h>
Create a new fallible non-overlapping matches iterator.
The given input provides the parameters (including the haystack), while the finder, which is handed to the advance and iterator routines below, represents a closure that calls the underlying regex engine. The closure may borrow any additional state that is needed, such as a prefilter scanner.
fn input<'s>(self: &'s Self) -> &'s Input<'h>
Returns the current Input used by this searcher.
The Input returned is generally equivalent to the one given to Searcher::new, but its start position may be different to reflect the start of the next search to be executed.
fn advance_half<F>(self: &mut Self, finder: F) -> Option<HalfMatch> where F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>
Return the next half match for an infallible search if one exists, and advance to the next position.
This is like try_advance_half, except errors are converted into panics.
Panics
If the given closure returns an error, then this panics. This is useful when you know your underlying regex engine has been configured to not return an error.
Example
This example shows how to use a Searcher to iterate over all matches when using a DFA, which only provides "half" matches.
use ;
let re = DFAnew?;
let mut cache = re.create_cache;
let input = new;
let mut it = new;
let expected = Some;
let got = it.advance_half;
assert_eq!;
let expected = Some;
let got = it.advance_half;
assert_eq!;
let expected = Some;
let got = it.advance_half;
assert_eq!;
let expected = None;
let got = it.advance_half;
assert_eq!;
# Ok::
This correctly moves iteration forward even when an empty match occurs:
use ;
let re = DFAnew?;
let mut cache = re.create_cache;
let input = new;
let mut it = new;
let expected = Some;
let got = it.advance_half;
assert_eq!;
let expected = Some;
let got = it.advance_half;
assert_eq!;
let expected = Some;
let got = it.advance_half;
assert_eq!;
let expected = None;
let got = it.advance_half;
assert_eq!;
# Ok::
fn advance<F>(self: &mut Self, finder: F) -> Option<Match> where F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>
Return the next match for an infallible search if one exists, and advance to the next position.
The search is advanced even in the presence of empty matches by forbidding empty matches from overlapping with any other match.
This is like try_advance, except errors are converted into panics.
Panics
If the given closure returns an error, then this panics. This is useful when you know your underlying regex engine has been configured to not return an error.
Example
This example shows how to use a Searcher to iterate over all matches when using a regex based on lazy DFAs:
use ;
let re = new?;
let mut cache = re.create_cache;
let input = new;
let mut it = new;
let expected = Some;
let got = it.advance;
assert_eq!;
let expected = Some;
let got = it.advance;
assert_eq!;
let expected = Some;
let got = it.advance;
assert_eq!;
let expected = None;
let got = it.advance;
assert_eq!;
# Ok::
This example shows the same as above, but with the PikeVM. This example is useful because it shows how to use this API even when the regex engine doesn't directly return a Match.
use ;
let re = new?;
let = ;
let input = new;
let mut it = new;
let expected = Some;
let got = it.advance;
// Note that if we wanted to extract capturing group spans, we could
// do that here with 'caps'.
assert_eq!;
let expected = Some;
let got = it.advance;
assert_eq!;
let expected = Some;
let got = it.advance;
assert_eq!;
let expected = None;
let got = it.advance;
assert_eq!;
# Ok::
fn try_advance_half<F>(self: &mut Self, finder: F) -> Result<Option<HalfMatch>, MatchError> where F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>
Return the next half match for a fallible search if one exists, and advance to the next position.
This is like advance_half, except it permits callers to handle errors during iteration.
fn try_advance<F>(self: &mut Self, finder: F) -> Result<Option<Match>, MatchError> where F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>
Return the next match for a fallible search if one exists, and advance to the next position.
This is like advance, except it permits callers to handle errors during iteration.
fn into_half_matches_iter<F>(self: Self, finder: F) -> TryHalfMatchesIter<'h, F> where F: FnMut(&Input<'_>) -> Result<Option<HalfMatch>, MatchError>
Given a closure that executes a single search, return an iterator over all successive non-overlapping half matches.
The iterator returned yields result values. If the underlying regex engine is configured to never return an error, consider calling TryHalfMatchesIter::infallible to convert errors into panics.
Example
This example shows how to use a Searcher to create a proper iterator over half matches.
use ;
let re = DFAnew?;
let mut cache = re.create_cache;
let input = new;
let mut it = new.into_half_matches_iter;
let expected = Some;
assert_eq!;
let expected = Some;
assert_eq!;
let expected = Some;
assert_eq!;
let expected = None;
assert_eq!;
# Ok::
fn into_matches_iter<F>(self: Self, finder: F) -> TryMatchesIter<'h, F> where F: FnMut(&Input<'_>) -> Result<Option<Match>, MatchError>
Given a closure that executes a single search, return an iterator over all successive non-overlapping matches.
The iterator returned yields result values. If the underlying regex engine is configured to never return an error, consider calling TryMatchesIter::infallible to convert errors into panics.
Example
This example shows how to use a Searcher to create a proper iterator over matches.
use ;
let re = new?;
let mut cache = re.create_cache;
let input = new;
let mut it = new.into_matches_iter;
let expected = Some;
assert_eq!;
let expected = Some;
assert_eq!;
let expected = Some;
assert_eq!;
let expected = None;
assert_eq!;
# Ok::
fn into_captures_iter<F>(self: Self, caps: Captures, finder: F) -> TryCapturesIter<'h, F> where F: FnMut(&Input<'_>, &mut Captures) -> Result<(), MatchError>
Given a closure that executes a single search, return an iterator over all successive non-overlapping Captures values.
The iterator returned yields result values. If the underlying regex engine is configured to never return an error, consider calling TryCapturesIter::infallible to convert errors into panics.
Unlike the other iterator constructors, this accepts an initial Captures value. This Captures value is reused for each search, and the iterator implementation clones it before returning it. The caller must provide this value because the iterator is purposely ignorant of the underlying regex engine and thus doesn't know how to create one itself. More to the point, a Captures value itself has a few different constructors, which change which kind of information is available to query in exchange for search performance.
Example
This example shows how to use a Searcher to create a proper iterator over Captures values, which provides access to all capturing group spans for each match.
use ;
let re = new?;
let = ;
let haystack = "2010-03-14 2016-10-08 2020-10-22";
let input = new;
let mut it = new.into_captures_iter;
let got = it.next.expect?;
let year = got.get_group_by_name.expect;
assert_eq!;
let got = it.next.expect?;
let month = got.get_group_by_name.expect;
assert_eq!;
let got = it.next.expect?;
let day = got.get_group_by_name.expect;
assert_eq!;
assert!;
# Ok::
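The relationship between the fallible "try" routines and their panicking counterparts can be sketched as follows. This is a toy model using only the standard library, not this crate's code: ToyError, find_digit, and the free functions try_advance and advance are hypothetical, and the sketch assumes a finder that never reports an empty match.

```rust
/// A match as a half-open byte range (start, end). Hypothetical.
type Span = (usize, usize);

/// Hypothetical error type standing in for a regex engine's search error.
#[derive(Debug, PartialEq)]
struct ToyError(String);

/// Fallible step: a finder error is returned to the caller and the
/// start position is left untouched.
fn try_advance<F>(start: &mut usize, mut finder: F) -> Result<Option<Span>, ToyError>
where
    F: FnMut(usize) -> Result<Option<Span>, ToyError>,
{
    let found = finder(*start)?;
    if let Some((_, end)) = found {
        *start = end;
    }
    Ok(found)
}

/// Infallible step: same as `try_advance`, except errors become panics.
fn advance<F>(start: &mut usize, finder: F) -> Option<Span>
where
    F: FnMut(usize) -> Result<Option<Span>, ToyError>,
{
    match try_advance(start, finder) {
        Ok(found) => found,
        Err(err) => panic!("unexpected toy search error: {:?}", err),
    }
}

/// Toy engine: finds the next ASCII digit at or after `at`, but gives up
/// with an error if it sees a `!` byte first.
fn find_digit(haystack: &str, at: usize) -> Result<Option<Span>, ToyError> {
    for (i, &b) in haystack.as_bytes().iter().enumerate().skip(at) {
        if b == b'!' {
            return Err(ToyError(format!("gave up at offset {}", i)));
        }
        if b.is_ascii_digit() {
            return Ok(Some((i, i + 1)));
        }
    }
    Ok(None)
}
```

In this model, the panicking form is only appropriate when the finder is known never to fail. Otherwise the "try" form lets the caller decide what to do with the error.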
impl<'h> Clone for Searcher<'h>
fn clone(self: &Self) -> Searcher<'h>
impl<'h> Debug for Searcher<'h>
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl<'h> Freeze for Searcher<'h>
impl<'h> RefUnwindSafe for Searcher<'h>
impl<'h> Send for Searcher<'h>
impl<'h> Sync for Searcher<'h>
impl<'h> Unpin for Searcher<'h>
impl<'h> UnsafeUnpin for Searcher<'h>
impl<'h> UnwindSafe for Searcher<'h>
impl<T> Any for Searcher<'h>
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for Searcher<'h>
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for Searcher<'h>
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for Searcher<'h>
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for Searcher<'h>
fn from(t: T) -> T
Returns the argument unchanged.
impl<T> ToOwned for Searcher<'h>
fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for Searcher<'h>
fn into(self: Self) -> U
Calls U::from(self).
That is, this conversion is whatever the implementation of From<T> for U chooses to do.
impl<T, U> TryFrom for Searcher<'h>
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for Searcher<'h>
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>