Struct GroupInfo
struct GroupInfo(_)
Represents information about capturing groups in a compiled regex.
The information encapsulated by this type consists of the following. For each pattern:
- A map from every capture group name to its corresponding capture group index.
- A map from every capture group index to its corresponding capture group name.
- A map from capture group index to its corresponding slot index. A slot refers to one half of a capturing group. That is, a capture slot is either the start or end of a capturing group. A slot is usually the mechanism by which a regex engine records offsets for each capturing group during a search.
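To make the three maps concrete, here is a minimal sketch of how one pattern's entries could be stored. The PatternGroups type and its from_names helper are hypothetical and are not regex-automata's internal representation. The slot column assumes the single-pattern layout (group i starts at slot i * 2), purely for illustration:

```rust
use std::collections::HashMap;

/// Hypothetical per-pattern capture metadata mirroring the three maps
/// described above. Not regex-automata's actual internal layout.
#[derive(Debug, Default)]
pub struct PatternGroups {
    /// Capture group name -> capture group index.
    pub name_to_index: HashMap<String, usize>,
    /// Capture group index -> capture group name (None if unnamed).
    pub index_to_name: Vec<Option<String>>,
    /// Capture group index -> starting slot (the ending slot is start + 1).
    pub index_to_slot: Vec<usize>,
}

impl PatternGroups {
    /// Builds all three maps from group names given in index order.
    /// Assumes the single-pattern slot layout (group i starts at i * 2).
    pub fn from_names(names: &[Option<&str>]) -> PatternGroups {
        let mut groups = PatternGroups::default();
        for (index, &name) in names.iter().enumerate() {
            if let Some(n) = name {
                groups.name_to_index.insert(n.to_string(), index);
            }
            groups.index_to_name.push(name.map(|n| n.to_string()));
            groups.index_to_slot.push(index * 2);
        }
        groups
    }
}
```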
A GroupInfo uses reference counting internally and is thus cheap to
clone.
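Cheap cloning comes from sharing one immutable allocation. Below is a minimal sketch of that general technique, using a hypothetical SharedInfo handle around std::sync::Arc. It is an assumption about the pattern involved and does not describe GroupInfo's private fields:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for shared, immutable capture metadata.
#[derive(Debug)]
pub struct Inner {
    pub group_names: Vec<Vec<Option<String>>>,
}

/// Cloning this handle copies a pointer and bumps a reference count.
/// The metadata itself is never duplicated.
#[derive(Clone, Debug)]
pub struct SharedInfo(Arc<Inner>);

impl SharedInfo {
    pub fn new(group_names: Vec<Vec<Option<String>>>) -> SharedInfo {
        SharedInfo(Arc::new(Inner { group_names }))
    }

    /// Number of live handles sharing the same metadata.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.0)
    }

    /// True if both handles point at the same allocation.
    pub fn shares_with(&self, other: &SharedInfo) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}
```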
Mapping from capture groups to slots
One of the main responsibilities of a GroupInfo is to build a mapping
from (PatternID, u32) (where the u32 is a capture index) to something
called a "slot." As mentioned above, a slot refers to one half of a
capturing group. Both combined provide the start and end offsets of
a capturing group that participated in a match.
The mapping between group indices and slots is an API guarantee. That is, the mapping won't change within a semver compatible release.
Slots exist primarily because this is a convenient mechanism by which
regex engines report group offsets at search time. For example, the
nfa::thompson::State::Capture
NFA state includes the slot index. When a regex engine transitions through
this state, it will likely use the slot index to write the current haystack
offset to some region of memory. When a match is found, those slots are
then reported to the caller, typically via a convenient abstraction like a
Captures value.
Because this crate provides first class support for multi-pattern regexes,
and because of some performance related reasons, the mapping between
capturing groups and slots is a little complex. However, in the case of a
single pattern, the mapping can be described very simply: for all capture
group indices i, its corresponding slots are at i * 2 and i * 2 + 1.
Notice that the pattern ID isn't involved at all here: since this case
only applies to a single-pattern regex, the pattern ID is always 0.
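The single-pattern formula can be exercised directly. The sketch below is a hypothetical illustration and is not part of regex-automata's API. slot_pair applies the i * 2 and i * 2 + 1 rule. group_span reads a group's offsets back out of a flat slot array of the kind a search might fill in, where None marks a slot that was never written:

```rust
/// Slots for capture group `index` in a single-pattern regex:
/// the start offset lives at `index * 2`, the end at `index * 2 + 1`.
pub fn slot_pair(index: usize) -> (usize, usize) {
    (index * 2, index * 2 + 1)
}

/// Reads back the (start, end) span of group `index` from a slot array
/// that a (hypothetical) search has filled in. Returns None if the group
/// index is out of range or the group did not participate in the match.
pub fn group_span(slots: &[Option<usize>], index: usize) -> Option<(usize, usize)> {
    let (start_slot, end_slot) = slot_pair(index);
    let start = (*slots.get(start_slot)?)?;
    let end = (*slots.get(end_slot)?)?;
    Some((start, end))
}
```

In this sketch, storing Option<usize> per slot lets a group that did not participate in the match report None instead of a meaningless offset.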
In the multi-pattern case, the mapping is a bit more complicated. To talk about it, we must define what we mean by "implicit" vs "explicit" capturing groups:
- An implicit capturing group refers to the capturing group that is present for every pattern automatically, and corresponds to the overall match of a pattern. Every pattern has precisely one implicit capturing group. It is always unnamed and it always corresponds to the capture group index 0.
- An explicit capturing group refers to any capturing group that appears in the concrete syntax of the pattern. (Or, if an NFA was hand built without any concrete syntax, it refers to any capturing group with an index greater than 0.)
Some examples:
- \w+ has one implicit capturing group and zero explicit capturing groups.
- (\w+) has one implicit group and one explicit group.
- foo(\d+)(?:\pL+)(\d+) has one implicit group and two explicit groups.
Turning back to the slot mapping, we can now state it as follows:
- Given a pattern ID pid, the slots for its implicit group are always at pid * 2 and pid * 2 + 1.
- Given a pattern ID 0, the slots for its explicit groups start at group_info.pattern_len() * 2.
- Given a pattern ID pid > 0, the slots for its explicit groups start immediately following where the slots for the explicit groups of pid - 1 end.
In particular, while there is a concrete formula one can use to determine where the slots for the implicit group of any pattern are, there is no general formula for determining where the slots for explicit capturing groups are. This is because each pattern can contain a different number of groups.
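The explicit slots depend on how many groups the earlier patterns have. Computing them therefore takes a running sum rather than a closed-form expression. The following sketch uses a hypothetical start_slot helper that takes only the per-pattern group counts (implicit plus explicit) and applies the three rules above:

```rust
/// Computes the starting slot for capture group `index` of pattern `pid`.
/// `group_lens[p]` is the total number of groups (implicit + explicit)
/// in pattern `p`. The ending slot is always the returned value + 1.
/// Returns None for an invalid pattern ID or group index.
pub fn start_slot(group_lens: &[usize], pid: usize, index: usize) -> Option<usize> {
    let group_len = *group_lens.get(pid)?;
    if index >= group_len {
        return None;
    }
    // Rule 1: the implicit group of pattern `pid` is at `pid * 2`.
    if index == 0 {
        return Some(pid * 2);
    }
    // Rule 2: explicit slots begin after all implicit slots (2 per pattern).
    let mut start = group_lens.len() * 2;
    // Rule 3: skip the explicit slots of every earlier pattern.
    for &len in &group_lens[..pid] {
        start += len.saturating_sub(1) * 2;
    }
    // Group `index` is the `(index - 1)`-th explicit group of this pattern.
    Some(start + (index - 1) * 2)
}
```

For group counts [2, 1, 5, 3], group 1 of pattern 2 starts at slot 10. That is 8 implicit slots, plus 2 explicit slots for pattern 0, plus none for pattern 1.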
The intended way of getting the slots for a particular capturing group
(whether implicit or explicit) is via the GroupInfo::slot or
GroupInfo::slots method.
See below for a concrete example of how capturing groups get mapped to slots.
Example
This example shows how to build a new GroupInfo and query it for
information.
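use regex_automata::util::{captures::GroupInfo, primitives::PatternID};
let info = GroupInfo::new(vec![
    vec![None, Some("foo")],
    vec![None],
    vec![None, None, None, Some("bar"), None],
    vec![None, None, Some("foo")],
])?;
// The number of patterns being tracked.
assert_eq!(4, info.pattern_len());
// We can query the number of groups for any pattern.
assert_eq!(2, info.group_len(PatternID::must(0)));
assert_eq!(1, info.group_len(PatternID::must(1)));
assert_eq!(5, info.group_len(PatternID::must(2)));
assert_eq!(3, info.group_len(PatternID::must(3)));
// An invalid pattern always has zero groups.
assert_eq!(0, info.group_len(PatternID::must(999)));
// 2 slots per group
assert_eq!(22, info.slot_len());
// We can map a group index for a particular pattern to its name, if
// one exists.
assert_eq!(Some("foo"), info.to_name(PatternID::must(3), 2));
assert_eq!(None, info.to_name(PatternID::must(2), 4));
// Or map a name to its group index.
assert_eq!(Some(1), info.to_index(PatternID::must(0), "foo"));
assert_eq!(Some(2), info.to_index(PatternID::must(3), "foo"));
# Ok::<(), Box<dyn std::error::Error>>(())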
Example: mapping from capture groups to slots
This example shows the specific mapping from capture group indices for each pattern to their corresponding slots. The slot values shown in this example are considered an API guarantee.
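use regex_automata::util::{captures::GroupInfo, primitives::PatternID};
let info = GroupInfo::new(vec![
    vec![None, Some("foo")],
    vec![None],
    vec![None, None, None, Some("bar"), None],
    vec![None, None, Some("foo")],
])?;
// We first show the slots for each pattern's implicit group.
assert_eq!(Some((0, 1)), info.slots(PatternID::must(0), 0));
assert_eq!(Some((2, 3)), info.slots(PatternID::must(1), 0));
assert_eq!(Some((4, 5)), info.slots(PatternID::must(2), 0));
assert_eq!(Some((6, 7)), info.slots(PatternID::must(3), 0));
// And now we show the slots for each pattern's explicit group.
assert_eq!(Some((8, 9)), info.slots(PatternID::must(0), 1));
assert_eq!(Some((10, 11)), info.slots(PatternID::must(2), 1));
assert_eq!(Some((12, 13)), info.slots(PatternID::must(2), 2));
assert_eq!(Some((14, 15)), info.slots(PatternID::must(2), 3));
assert_eq!(Some((16, 17)), info.slots(PatternID::must(2), 4));
assert_eq!(Some((18, 19)), info.slots(PatternID::must(3), 1));
assert_eq!(Some((20, 21)), info.slots(PatternID::must(3), 2));
// Asking for the slots for an invalid pattern ID or even for an invalid
// group index for a specific pattern will return None. So for example,
// you're guaranteed to not get the slots for a different pattern than the
// one requested.
assert_eq!(None, info.slots(PatternID::must(5), 0));
assert_eq!(None, info.slots(PatternID::must(1), 1));
# Ok::<(), Box<dyn std::error::Error>>(())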
Implementations
impl GroupInfo
fn new<P, G, N>(pattern_groups: P) -> Result<GroupInfo, GroupInfoError> where P: IntoIterator<Item = G>, G: IntoIterator<Item = Option<N>>, N: AsRef<str>
Creates a new group info from a sequence of patterns, where each pattern yields a sequence of possible group names. The index of each pattern in the sequence corresponds to its PatternID, and the index of each group in each pattern's sequence corresponds to its group index.
While this constructor is very generic and therefore perhaps hard to chew on, an example of a valid concrete type that can be passed to this constructor is Vec<Vec<Option<String>>>. The outer Vec corresponds to the patterns, i.e., one Vec<Option<String>> per pattern. The inner Vec corresponds to the capturing groups for each pattern. The Option<String> corresponds to the name of the capturing group, if present.
It is legal to pass an empty iterator to this constructor. It will return an empty group info with zero slots. An empty group info is useful for cases where you have no patterns or for cases where slots aren't being used at all (e.g., for most DFAs in this crate).
Errors
This constructor returns an error if the given capturing groups are invalid in some way. Those reasons include, but are not necessarily limited to:
- Too many patterns (i.e., PatternID would overflow).
- Too many capturing groups (e.g., u32 would overflow).
- A pattern is given that has no capturing groups. (All patterns must have at least an implicit capturing group at index 0.)
- The capturing group at index 0 has a name. It must be unnamed.
- There are duplicate capturing group names within the same pattern. (Multiple capturing groups with the same name may exist, but they must be in different patterns.)
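The structural rules in this list can be checked with a short pass over a Vec<Vec<Option<String>>> input. Below is a minimal sketch that uses a hypothetical validate function and ValidationError type. It does not reproduce the crate's GroupInfoError, and it omits the overflow checks:

```rust
use std::collections::HashSet;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    MissingGroups { pattern: usize },
    NamedImplicitGroup { pattern: usize },
    DuplicateName { pattern: usize, name: String },
}

/// Checks the non-overflow rules listed above. An empty input is valid.
pub fn validate(patterns: &[Vec<Option<String>>]) -> Result<(), ValidationError> {
    for (pattern, groups) in patterns.iter().enumerate() {
        // Every pattern needs at least the implicit group at index 0.
        let Some(first) = groups.first() else {
            return Err(ValidationError::MissingGroups { pattern });
        };
        // The implicit group must be unnamed.
        if first.is_some() {
            return Err(ValidationError::NamedImplicitGroup { pattern });
        }
        // Names must be unique within a pattern. A fresh set per pattern
        // means the same name may still appear in different patterns.
        let mut seen = HashSet::new();
        for name in groups.iter().flatten() {
            if !seen.insert(name.as_str()) {
                return Err(ValidationError::DuplicateName { pattern, name: name.clone() });
            }
        }
    }
    Ok(())
}
```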
An example below shows how to trigger some of the above error conditions.
Example
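This example shows how to build a new GroupInfo and query it for information.
use regex_automata::util::captures::GroupInfo;
let info = GroupInfo::new(vec![
    vec![None, Some("foo")],
    vec![None],
    vec![None, None, None, Some("bar"), None],
    vec![None, None, Some("foo")],
])?;
// The number of patterns being tracked.
assert_eq!(4, info.pattern_len());
// 2 slots per group
assert_eq!(22, info.slot_len());
# Ok::<(), Box<dyn std::error::Error>>(())
Example: empty GroupInfo
This example shows how to build a new empty GroupInfo and query it for information.
use regex_automata::util::captures::GroupInfo;
let info = GroupInfo::empty();
// Everything is zero.
assert_eq!(0, info.pattern_len());
assert_eq!(0, info.slot_len());
# Ok::<(), Box<dyn std::error::Error>>(())
Example: error conditions
This example shows how to provoke some of the ways in which building a GroupInfo can fail.
use regex_automata::util::captures::GroupInfo;
// Either the group info is empty, or all patterns must have at least
// one capturing group.
assert!(GroupInfo::new(vec![
    vec![None, Some("a")], // ok
    vec![None], // ok
    vec![], // not ok
]).is_err());
// Note that building an empty group info is OK.
assert!(GroupInfo::new(Vec::<Vec<Option<String>>>::new()).is_ok());
// The first group in each pattern must correspond to an implicit
// anonymous group, i.e., one that is not named. By convention, this
// group corresponds to the overall match of a regex. Every other group
// in a pattern is explicit and optional.
assert!(GroupInfo::new(vec![vec![Some("foo")]]).is_err());
// There must not be duplicate group names within the same pattern.
assert!(GroupInfo::new(vec![
    vec![None, Some("foo"), Some("foo")],
]).is_err());
// But duplicate names across distinct patterns is OK.
assert!(GroupInfo::new(vec![
    vec![None, Some("foo")],
    vec![None, Some("foo")],
]).is_ok());
# Ok::<(), Box<dyn std::error::Error>>(())
There are other ways for building a GroupInfo to fail that are difficult to show. For example, construction fails if the number of patterns given would overflow PatternID.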
fn empty() -> GroupInfo
This creates an empty GroupInfo.
This is a convenience routine for calling GroupInfo::new with an iterator that yields no elements.
Example
This example shows how to build a new empty GroupInfo and query it for information.
use regex_automata::util::captures::GroupInfo;
let info = GroupInfo::empty();
// Everything is zero.
assert_eq!(0, info.pattern_len());
assert_eq!(0, info.all_group_len());
assert_eq!(0, info.slot_len());
# Ok::<(), Box<dyn std::error::Error>>(())
fn to_index(self: &Self, pid: PatternID, name: &str) -> Option<usize>
Return the capture group index corresponding to the given name in the given pattern. If no such capture group name exists in the given pattern, then this returns None.
If the given pattern ID is invalid, then this returns None.
This also returns None for all inputs if this GroupInfo is empty (e.g., one built with GroupInfo::empty). To check whether captures are present for a specific pattern, use GroupInfo::group_len.
Example
This example shows how to find the capture index for the given pattern and group name.
Remember that capture indices are relative to the pattern, such that the same capture index value may refer to different capturing groups for distinct patterns.
# if cfg!(miri) { return Ok(()); } // miri takes too long
use regex_automata::{nfa::thompson::NFA, PatternID};
let (pid0, pid1) = (PatternID::must(0), PatternID::must(1));
let nfa = NFA::new_many(&[
    r"a(?P<quux>\w+)z(?P<foo>\s+)",
    r"a(?P<foo>\d+)z",
])?;
let groups = nfa.group_info();
assert_eq!(Some(2), groups.to_index(pid0, "foo"));
// Recall that capture index 0 is always unnamed and refers to the
// entire pattern. So the first capturing group present in the pattern
// itself always starts at index 1.
assert_eq!(Some(1), groups.to_index(pid1, "foo"));
// And if a name does not exist for a particular pattern, None is
// returned.
assert!(groups.to_index(pid0, "quux").is_some());
assert!(groups.to_index(pid1, "quux").is_none());
# Ok::<(), Box<dyn std::error::Error>>(())
fn to_name(self: &Self, pid: PatternID, group_index: usize) -> Option<&str>
Return the capture name for the given index and given pattern. If the corresponding group does not have a name, then this returns None.
If the pattern ID is invalid, then this returns None.
If the group index is invalid for the given pattern, then this returns None. A group index is valid for a pattern pid if and only if it is less than GroupInfo::group_len for pid.
This also returns None for all inputs if this GroupInfo is empty (e.g., one built with GroupInfo::empty). To check whether captures are present for a specific pattern, use GroupInfo::group_len.
Example
This example shows how to find the capture group name for the given pattern and group index.
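# if cfg!(miri) { return Ok(()); } // miri takes too long
use regex_automata::{nfa::thompson::NFA, PatternID};
let (pid0, pid1) = (PatternID::must(0), PatternID::must(1));
let nfa = NFA::new_many(&[
    r"a(?P<foo>\w+)z(\s+)x(\d+)",
    r"a(\d+)z(?P<foo>\s+)",
])?;
let groups = nfa.group_info();
assert_eq!(None, groups.to_name(pid0, 0));
assert_eq!(Some("foo"), groups.to_name(pid0, 1));
assert_eq!(None, groups.to_name(pid0, 2));
assert_eq!(None, groups.to_name(pid0, 3));
assert_eq!(None, groups.to_name(pid1, 0));
assert_eq!(None, groups.to_name(pid1, 1));
assert_eq!(Some("foo"), groups.to_name(pid1, 2));
// '3' is not a valid capture index for the second pattern.
assert_eq!(None, groups.to_name(pid1, 3));
# Ok::<(), Box<dyn std::error::Error>>(())
fn pattern_names(self: &Self, pid: PatternID) -> GroupInfoPatternNames<'_>
Return an iterator of all capture groups and their names (if present) for a particular pattern.
If the given pattern ID is invalid or if this GroupInfo is empty, then the iterator yields no elements.
The number of elements yielded by this iterator is always equal to the result of calling GroupInfo::group_len with the same PatternID.
Example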
This example shows how to get a list of all capture group names for a particular pattern.
use regex_automata::{nfa::thompson::NFA, PatternID};
let nfa = NFA::new(r"(a)(?P<foo>b)(c)(d)(?P<bar>e)")?;
// The first is the implicit group that is always unnamed. The next
// 5 groups are the explicit groups found in the concrete syntax above.
let expected = vec![None, None, Some("foo"), None, None, Some("bar")];
let got: Vec<Option<&str>> =
    nfa.group_info().pattern_names(PatternID::ZERO).collect();
assert_eq!(expected, got);
// Using an invalid pattern ID will result in nothing yielded.
let got = nfa.group_info().pattern_names(PatternID::must(999)).count();
assert_eq!(0, got);
# Ok::<(), Box<dyn std::error::Error>>(())
fn all_names(self: &Self) -> GroupInfoAllNames<'_>
Return an iterator of all capture groups for all patterns supported by this GroupInfo. Each item yielded is a triple of the group's pattern ID, index in the pattern and the group's name, if present.
Example
This example shows how to get a list of all capture groups found in one NFA, potentially spanning multiple patterns.
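use regex_automata::{nfa::thompson::NFA, PatternID};
let nfa = NFA::new_many(&[
    r"(?P<foo>a)",
    r"a",
    r"(a)",
])?;
let expected = vec![
    (PatternID::must(0), 0, None),
    (PatternID::must(0), 1, Some("foo")),
    (PatternID::must(1), 0, None),
    (PatternID::must(2), 0, None),
    (PatternID::must(2), 1, None),
];
let got: Vec<(PatternID, usize, Option<&str>)> =
    nfa.group_info().all_names().collect();
assert_eq!(expected, got);
# Ok::<(), Box<dyn std::error::Error>>(())
Unlike other capturing group related routines, this routine doesn't panic even if captures aren't enabled on this NFA: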
use regex_automata::nfa::thompson::{NFA, WhichCaptures};
let nfa = NFA::compiler()
    .configure(NFA::config().which_captures(WhichCaptures::None))
    .build_many(&[
        r"(?P<foo>a)",
        r"a",
        r"(a)",
    ])?;
// When captures aren't enabled, there's nothing to return.
assert_eq!(0, nfa.group_info().all_names().count());
# Ok::<(), Box<dyn std::error::Error>>(())
fn slots(self: &Self, pid: PatternID, group_index: usize) -> Option<(usize, usize)>
Returns the starting and ending slot corresponding to the given capturing group for the given pattern. The ending slot is always one more than the starting slot returned.
Note that this is like GroupInfo::slot, except that it also returns the ending slot value for convenience.
If either the pattern ID or the capture index is invalid, then this returns None.
Example
This example shows that the starting slots for the first capturing group of each pattern are distinct.
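use regex_automata::{nfa::thompson::NFA, PatternID};
let nfa = NFA::new_many(&[r"\w+", r"\W+"])?;
assert_ne!(
    nfa.group_info().slots(PatternID::must(0), 0),
    nfa.group_info().slots(PatternID::must(1), 0),
);
// Also, the start and end slot values are never equivalent.
let (start, end) = nfa.group_info().slots(PatternID::ZERO, 0).unwrap();
assert_ne!(start, end);
# Ok::<(), Box<dyn std::error::Error>>(())
fn slot(self: &Self, pid: PatternID, group_index: usize) -> Option<usize>
Returns the starting slot corresponding to the given capturing group for the given pattern. The ending slot is always one more than the value returned.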
If either the pattern ID or the capture index is invalid, then this returns None.
Example
This example shows that the starting slots for the first capturing group of each pattern are distinct.
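use regex_automata::{nfa::thompson::NFA, PatternID};
let nfa = NFA::new_many(&[r"\w+", r"\W+"])?;
assert_ne!(
    nfa.group_info().slot(PatternID::must(0), 0),
    nfa.group_info().slot(PatternID::must(1), 0),
);
# Ok::<(), Box<dyn std::error::Error>>(())
fn pattern_len(self: &Self) -> usize
Returns the total number of patterns in this GroupInfo.
This may return zero if the GroupInfo was constructed with no patterns.
This is guaranteed to be no bigger than PatternID::LIMIT because GroupInfo construction will fail if too many patterns are added.
Example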
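use regex_automata::nfa::thompson::NFA;
let nfa = NFA::new_many(&["[0-9]+", "[a-z]+", "[A-Z]+"])?;
assert_eq!(3, nfa.group_info().pattern_len());
let nfa = NFA::never_match();
assert_eq!(0, nfa.group_info().pattern_len());
let nfa = NFA::always_match();
assert_eq!(1, nfa.group_info().pattern_len());
# Ok::<(), Box<dyn std::error::Error>>(())
fn group_len(self: &Self, pid: PatternID) -> usize
Return the number of capture groups in a pattern.
If the pattern ID is invalid, then this returns 0.
Example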
This example shows how the values returned by this routine may vary for different patterns and NFA configurations.
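use regex_automata::{nfa::thompson::{NFA, WhichCaptures}, PatternID};
let nfa = NFA::new(r"(a)(b)(c)")?;
// There are 3 explicit groups in the pattern's concrete syntax and
// 1 unnamed and implicit group spanning the entire pattern.
assert_eq!(4, nfa.group_info().group_len(PatternID::ZERO));
let nfa = NFA::new(r"abc")?;
// There is just the unnamed implicit group.
assert_eq!(1, nfa.group_info().group_len(PatternID::ZERO));
let nfa = NFA::compiler()
    .configure(NFA::config().which_captures(WhichCaptures::None))
    .build(r"abc")?;
// We disabled capturing groups, so there are none.
assert_eq!(0, nfa.group_info().group_len(PatternID::ZERO));
let nfa = NFA::compiler()
    .configure(NFA::config().which_captures(WhichCaptures::None))
    .build(r"(a)(b)(c)")?;
// We disabled capturing groups, so there are none, even if there are
// explicit groups in the concrete syntax.
assert_eq!(0, nfa.group_info().group_len(PatternID::ZERO));
# Ok::<(), Box<dyn std::error::Error>>(())
fn all_group_len(self: &Self) -> usize
Return the total number of capture groups across all patterns.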
This includes implicit groups that represent the entire match of a pattern.
Example
This example shows how the values returned by this routine may vary for different patterns and NFA configurations.
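use regex_automata::nfa::thompson::{NFA, WhichCaptures};
let nfa = NFA::new(r"(a)(b)(c)")?;
// There are 3 explicit groups in the pattern's concrete syntax and
// 1 unnamed and implicit group spanning the entire pattern.
assert_eq!(4, nfa.group_info().all_group_len());
let nfa = NFA::new(r"abc")?;
// There is just the unnamed implicit group.
assert_eq!(1, nfa.group_info().all_group_len());
let nfa = NFA::new_many(&["(a)", "b", "(c)"])?;
// Each pattern has one implicit group, and two
// patterns have one explicit group each.
assert_eq!(5, nfa.group_info().all_group_len());
let nfa = NFA::compiler()
    .configure(NFA::config().which_captures(WhichCaptures::None))
    .build(r"abc")?;
// We disabled capturing groups, so there are none.
assert_eq!(0, nfa.group_info().all_group_len());
let nfa = NFA::compiler()
    .configure(NFA::config().which_captures(WhichCaptures::None))
    .build(r"(a)(b)(c)")?;
// We disabled capturing groups, so there are none, even if there are
// explicit groups in the concrete syntax.
assert_eq!(0, nfa.group_info().all_group_len());
# Ok::<(), Box<dyn std::error::Error>>(())
fn slot_len(self: &Self) -> usize
Returns the total number of slots in this GroupInfo across all patterns.
The total number of slots is always twice the total number of capturing groups, including both implicit and explicit groups.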
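The counting identities behind all_group_len, slot_len, implicit_slot_len and explicit_slot_len can be checked with a small sketch. The names slot_counts and SlotCounts are hypothetical, and the only input is assumed to be the number of groups per pattern:

```rust
/// Slot and group totals derived from per-pattern group counts.
#[derive(Debug, PartialEq)]
pub struct SlotCounts {
    pub all_group_len: usize,
    pub slot_len: usize,
    pub implicit_slot_len: usize,
    pub explicit_slot_len: usize,
}

/// `group_lens[p]` is the total number of groups (implicit + explicit) in
/// pattern `p`. Assumes every pattern has at least its implicit group, as
/// the constructor requires. Otherwise the subtraction below could underflow.
pub fn slot_counts(group_lens: &[usize]) -> SlotCounts {
    // Every group, implicit or explicit, contributes a start and end slot.
    let all_group_len: usize = group_lens.iter().sum();
    let slot_len = all_group_len * 2;
    // Exactly one implicit group, and so two slots, per pattern.
    let implicit_slot_len = group_lens.len() * 2;
    // Whatever remains belongs to explicit groups.
    let explicit_slot_len = slot_len - implicit_slot_len;
    SlotCounts { all_group_len, slot_len, implicit_slot_len, explicit_slot_len }
}
```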
Example
This example shows the relationship between the number of capturing groups and slots.
use regex_automata::util::captures::GroupInfo;
// There are 11 total groups here.
let info = GroupInfo::new(vec![
    vec![None, Some("foo")],
    vec![None],
    vec![None, None, None, Some("bar"), None],
    vec![None, None, Some("foo")],
])?;
// 2 slots per group gives us 11*2=22 slots.
assert_eq!(22, info.slot_len());
# Ok::<(), Box<dyn std::error::Error>>(())
fn implicit_slot_len(self: &Self) -> usize
Returns the total number of slots for implicit capturing groups.
This is like GroupInfo::slot_len, except it doesn't include the explicit slots for each pattern. Since there are always exactly 2 implicit slots for each pattern, the number of implicit slots is always equal to twice the number of patterns.
Example
This example shows the relationship between the number of capturing groups, implicit slots and explicit slots.
use regex_automata::util::captures::GroupInfo;
// There are 3 total groups here: 1 implicit and 2 explicit.
let info = GroupInfo::new(vec![vec![None, Some("foo"), Some("bar")]])?;
// 2 slots per group gives us 3*2=6 slots.
assert_eq!(6, info.slot_len());
// 2 implicit slots per pattern gives us 2 implicit slots since there
// is 1 pattern.
assert_eq!(2, info.implicit_slot_len());
// 2 explicit capturing groups gives us 2*2=4 explicit slots.
assert_eq!(4, info.explicit_slot_len());
# Ok::<(), Box<dyn std::error::Error>>(())
fn explicit_slot_len(self: &Self) -> usize
Returns the total number of slots for explicit capturing groups.
This is like GroupInfo::slot_len, except it doesn't include the implicit slots for each pattern. (There are always 2 implicit slots for each pattern.)
For a non-empty GroupInfo, it is always the case that slot_len is strictly greater than explicit_slot_len. For an empty GroupInfo, both the total number of slots and the number of explicit slots are 0.
Example
This example shows the relationship between the number of capturing groups, implicit slots and explicit slots.
use regex_automata::util::captures::GroupInfo;
// There are 3 total groups here: 1 implicit and 2 explicit.
let info = GroupInfo::new(vec![vec![None, Some("foo"), Some("bar")]])?;
// 2 slots per group gives us 3*2=6 slots.
assert_eq!(6, info.slot_len());
// 2 implicit slots per pattern gives us 2 implicit slots since there
// is 1 pattern.
assert_eq!(2, info.implicit_slot_len());
// 2 explicit capturing groups gives us 2*2=4 explicit slots.
assert_eq!(4, info.explicit_slot_len());
# Ok::<(), Box<dyn std::error::Error>>(())
fn memory_usage(self: &Self) -> usize
Returns the memory usage, in bytes, of this GroupInfo.
This does not include the stack size used up by this GroupInfo. To compute that, use std::mem::size_of::<GroupInfo>().
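The paragraph above separates heap usage from stack size. The sketch below illustrates that split with a hypothetical Handle type that keeps all of its data behind an Arc. Its heap_bytes method is only a rough estimate, since it ignores allocator overhead and the Arc control block. It is not how GroupInfo::memory_usage is computed:

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical handle: all variable-sized data lives on the heap.
pub struct Handle(pub Arc<Vec<Vec<Option<String>>>>);

impl Handle {
    /// Approximate heap bytes for the outer and inner Vec buffers plus the
    /// name strings. Allocator overhead and the Arc header are ignored.
    pub fn heap_bytes(&self) -> usize {
        let outer = self.0.capacity() * size_of::<Vec<Option<String>>>();
        let inner: usize = self
            .0
            .iter()
            .map(|groups| {
                let names: usize = groups.iter().flatten().map(|s| s.capacity()).sum();
                groups.capacity() * size_of::<Option<String>>() + names
            })
            .sum();
        outer + inner
    }
}
```

The handle is a single Arc, so std::mem::size_of reports one pointer's worth of bytes no matter how many patterns or names it holds.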
impl Clone for GroupInfo
fn clone(self: &Self) -> GroupInfo
impl Debug for GroupInfo
fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result
impl Default for GroupInfo
fn default() -> GroupInfo
impl Freeze for GroupInfo
impl RefUnwindSafe for GroupInfo
impl Send for GroupInfo
impl Sync for GroupInfo
impl Unpin for GroupInfo
impl UnsafeUnpin for GroupInfo
impl UnwindSafe for GroupInfo
impl<T> Any for GroupInfo
fn type_id(self: &Self) -> TypeId
impl<T> Borrow for GroupInfo
fn borrow(self: &Self) -> &T
impl<T> BorrowMut for GroupInfo
fn borrow_mut(self: &mut Self) -> &mut T
impl<T> CloneToUninit for GroupInfo
unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)
impl<T> From for GroupInfo
fn from(t: T) -> T
Returns the argument unchanged.
impl<T> ToOwned for GroupInfo
fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)
impl<T, U> Into for GroupInfo
fn into(self: Self) -> U
Calls U::from(self).
That is, this conversion is whatever the implementation of From<T> for U chooses to do.
impl<T, U> TryFrom for GroupInfo
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T, U> TryInto for GroupInfo
fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>