Struct Builder

struct Builder { ... }

A builder for configuring and constructing a Regex.

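The builder permits configuring two different aspects of a Regex: Builder::configure sets high-level options as described by a Config, and Builder::syntax sets syntax-level options as described by a util::syntax::Config. The syntax options only apply when building a Regex from pattern strings.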

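Once configured, the builder can construct a Regex from one of four different inputs: a single pattern string (Builder::build), many pattern strings (Builder::build_many), a single Hir expression (Builder::build_from_hir), or many Hir expressions (Builder::build_many_from_hir).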

The latter two methods in particular provide a way to construct a fully featured regular expression matcher directly from an Hir expression without first converting it to a string. (This is in contrast to the top-level regex crate, which intentionally provides no such API in order to avoid making regex-syntax a public dependency.)

As a convenience, this builder may be created via Regex::builder, which may help avoid an extra import.
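
The examples below use the builder in two styles: chained calls on a temporary, and separate statements on a named builder that is then reused for two calls to build. A toy builder with the same method shape makes the design point explicit. SketchBuilder and SketchRegex are hypothetical stand-ins, not regex_automata types: setters take &mut self and return &mut Self, while build only borrows the builder.

```rust
/// Hypothetical stand-in for a built regex; it only records its settings.
#[derive(Debug, PartialEq)]
pub struct SketchRegex {
    pub pattern: String,
    pub line_terminator: u8,
    pub case_insensitive: bool,
}

/// Hypothetical toy builder with the same method shape as `Builder`.
#[derive(Clone, Debug, Default)]
pub struct SketchBuilder {
    // Stand-in for a meta-level `Config` option.
    line_terminator: Option<u8>,
    // Stand-in for a syntax-level `util::syntax::Config` option.
    case_insensitive: bool,
}

impl SketchBuilder {
    pub fn new() -> SketchBuilder {
        SketchBuilder::default()
    }

    /// Setters take `&mut self` and return `&mut Self`, so they can be
    /// chained or called as separate statements.
    pub fn line_terminator(&mut self, byte: u8) -> &mut SketchBuilder {
        self.line_terminator = Some(byte);
        self
    }

    pub fn case_insensitive(&mut self, yes: bool) -> &mut SketchBuilder {
        self.case_insensitive = yes;
        self
    }

    /// `build` only borrows the builder, so it can be called repeatedly.
    pub fn build(&self, pattern: &str) -> SketchRegex {
        SketchRegex {
            pattern: pattern.to_string(),
            // Assumed default of `\n` when no terminator was set.
            line_terminator: self.line_terminator.unwrap_or(b'\n'),
            case_insensitive: self.case_insensitive,
        }
    }
}
```

Because build takes &self, the same configured builder can produce any number of regexes, which is what the UTF-8 example below relies on.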

Example: change the line terminator

This example shows how to enable multi-line mode by default and change the line terminator to the NUL byte:

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().multi_line(true))
    .configure(Regex::config().line_terminator(b'\x00'))
    .build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())
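
The line-terminator example can be modeled with plain byte slices. The sketch below, whose function names are hypothetical, splits a haystack on a single terminator byte and looks for a line equal to a literal, which is roughly what (?m)^foo$ asks for when the line terminator is NUL. It is an illustration of the semantics only, not how the meta regex engine searches.

```rust
use std::ops::Range;

/// Byte spans of every "line" in `hay`, where lines are separated by the
/// single byte `terminator`. The last line runs to the end (possibly empty).
pub fn line_spans(hay: &[u8], terminator: u8) -> Vec<Range<usize>> {
    let mut spans = Vec::new();
    let mut start = 0;
    for (i, &b) in hay.iter().enumerate() {
        if b == terminator {
            spans.push(start..i);
            start = i + 1;
        }
    }
    spans.push(start..hay.len());
    spans
}

/// First line whose contents are exactly `needle`: a simplified model of
/// multi-line `^needle$` for a literal `needle`.
pub fn find_whole_line(hay: &[u8], terminator: u8, needle: &[u8]) -> Option<Range<usize>> {
    line_spans(hay, terminator)
        .into_iter()
        .find(|span| &hay[span.clone()] == needle)
}
```

With b'\x00' as the terminator, find_whole_line(b"\x00foo\x00", b'\x00', b"foo") returns Some(1..4), the same span as the example. With b'\n', the haystack is a single line that contains NUL bytes, so nothing is found.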

Example: disable UTF-8 requirement

By default, regex patterns are required to match UTF-8. This includes regex patterns that can produce matches of length zero. In the case of an empty match, by default, matches will not appear between the code units of a UTF-8 encoded codepoint.

However, it can be useful to disable this requirement, particularly if you're searching things like &[u8] that are not known to be valid UTF-8.

use regex_automata::{meta::Regex, util::syntax, Match};

let mut builder = Regex::builder();
// Disables the requirement that non-empty matches match UTF-8.
builder.syntax(syntax::Config::new().utf8(false));
// Disables the requirement that empty matches match UTF-8 boundaries.
builder.configure(Regex::config().utf8_empty(false));

// We can match raw bytes via \xZZ syntax, but we need to disable
// Unicode mode to do that. We could disable it everywhere, or just
// selectively, as shown here.
let re = builder.build(r"(?-u:\xFF)foo(?-u:\xFF)")?;
let hay = b"\xFFfoo\xFF";
assert_eq!(Some(Match::must(0, 0..5)), re.find(hay));

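// We can also match between code units. The snowman (U+2603) is
// encoded as three bytes in UTF-8, so an empty match is reported at
// every byte offset from 0 through 3.
let re = builder.build(r"")?;
let hay = "☃";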
assert_eq!(re.find_iter(hay).collect::<Vec<Match>>(), vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);

Ok::<(), Box<dyn std::error::Error>>(())
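
The difference between the two empty-match behaviors comes down to which byte offsets are allowed. Assuming the only rule is whether an offset may split a codepoint's UTF-8 encoding, the offsets at which an empty pattern matches can be sketched with str::is_char_boundary. The helper empty_match_offsets is hypothetical.

```rust
/// Byte offsets at which an empty pattern reports a match in `hay`.
/// Hypothetical model: when `utf8_empty` is true, offsets that fall inside
/// a codepoint's UTF-8 encoding are skipped; otherwise every offset counts.
pub fn empty_match_offsets(hay: &str, utf8_empty: bool) -> Vec<usize> {
    (0..=hay.len())
        // `is_char_boundary` is true at 0, at `hay.len()`, and at the first
        // byte of each encoded codepoint.
        .filter(|&i| !utf8_empty || hay.is_char_boundary(i))
        .collect()
}
```

For "☃", this gives [0, 1, 2, 3] when utf8_empty is false and [0, 3] when it is true.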

Implementations

impl Builder

fn new() -> Builder

Creates a new builder for configuring and constructing a Regex.

fn build(self: &Self, pattern: &str) -> Result<Regex, BuildError>

Builds a Regex from a single pattern string.

If there was a problem parsing the pattern or a problem turning it into a regex matcher, then an error is returned.

Example

This example shows how to configure syntax options.

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().crlf(true).multi_line(true))
    .build(r"^foo$")?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())

fn build_many<P: AsRef<str>>(self: &Self, patterns: &[P]) -> Result<Regex, BuildError>

Builds a Regex from many pattern strings.

If there was a problem parsing any of the patterns or a problem turning them into a regex matcher, then an error is returned.

Example: finding the pattern that caused an error

When a syntax error occurs, it is possible to ask which pattern caused the syntax error.

use regex_automata::{meta::Regex, PatternID};

let err = Regex::builder()
    .build_many(&["a", "b", r"\p{Foo}", "c"])
    .unwrap_err();
assert_eq!(Some(PatternID::must(2)), err.pattern());

Example: zero patterns is valid

Building a regex with zero patterns results in a regex that never matches anything. Because this routine is generic, passing an empty slice usually requires a turbo-fish (or something else to help type inference).

use regex_automata::meta::Regex;

let re = Regex::builder()
    .build_many::<&str>(&[])?;
assert_eq!(None, re.find(""));

Ok::<(), Box<dyn std::error::Error>>(())

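
The turbofish is needed for the same reason any generic Rust function needs it: an empty slice literal gives the compiler nothing to infer P from. The hypothetical function below has the same P: AsRef<str> bound as build_many. Calling it as collect_patterns(&[]) is rejected with a "type annotations needed" error, while a turbofish or a typed binding resolves P.

```rust
/// Hypothetical helper with the same generic bound as the `patterns`
/// parameter of `build_many`. It copies the patterns into owned strings.
pub fn collect_patterns<P: AsRef<str>>(patterns: &[P]) -> Vec<String> {
    patterns.iter().map(|p| p.as_ref().to_string()).collect()
}
```

Either collect_patterns::<&str>(&[]) or passing a binding declared as &[String] compiles, and both return an empty Vec.
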
fn build_from_hir(self: &Self, hir: &Hir) -> Result<Regex, BuildError>

Builds a Regex directly from an Hir expression.

This is useful if you need to parse a pattern string into an Hir for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expression instead of first converting the Hir back to a pattern string.

When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn't relevant here.

If there was a problem building the underlying regex matcher for the given Hir, then an error is returned.

Example

This example shows how one can hand-construct an Hir expression and build a regex from it without doing any parsing at all.

use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};

// (?Rm)^foo$
let hir = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_from_hir(&hir)?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())

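
The hand-built Hir above corresponds to (?Rm)^foo$, where Look::StartCRLF and Look::EndCRLF play the roles of ^ and $. The predicates below are our reading of their semantics, written as hypothetical standalone functions rather than the library's own implementation: ^ holds after \n, or after a \r not followed by \n; $ holds before \r, or before a \n not preceded by \r. In particular, neither holds between the \r and \n of a CRLF pair.

```rust
use std::ops::Range;

/// CRLF-aware `^`: start of haystack, after `\n`, or after a `\r` that is
/// not immediately followed by `\n`.
pub fn is_start_crlf(hay: &[u8], at: usize) -> bool {
    if at == 0 || hay[at - 1] == b'\n' {
        return true;
    }
    hay[at - 1] == b'\r' && (at >= hay.len() || hay[at] != b'\n')
}

/// CRLF-aware `$`: end of haystack, before `\r`, or before a `\n` that is
/// not immediately preceded by `\r`.
pub fn is_end_crlf(hay: &[u8], at: usize) -> bool {
    if at == hay.len() || hay[at] == b'\r' {
        return true;
    }
    hay[at] == b'\n' && (at == 0 || hay[at - 1] != b'\r')
}

/// Leftmost span where the literal `needle` is surrounded by both
/// assertions: a brute-force model of `(?Rm)^needle$`.
pub fn find_crlf_line(hay: &[u8], needle: &[u8]) -> Option<Range<usize>> {
    if needle.len() > hay.len() {
        return None;
    }
    (0..=hay.len() - needle.len())
        .map(|s| s..s + needle.len())
        .find(|r| {
            is_start_crlf(hay, r.start)
                && &hay[r.clone()] == needle
                && is_end_crlf(hay, r.end)
        })
}
```

find_crlf_line(b"\r\nfoo\r\n", b"foo") returns Some(2..5), the same span the example reports.
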
fn build_many_from_hir<H: Borrow<Hir>>(self: &Self, hirs: &[H]) -> Result<Regex, BuildError>

Builds a Regex directly from many Hir expressions.

This is useful if you need to parse pattern strings into Hir expressions for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expressions instead of first converting the Hir expressions back to pattern strings.

When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn't relevant here.

If there was a problem building the underlying regex matcher for the given Hir expressions, then an error is returned.

Note that unlike Builder::build_many, this can only fail as a result of building the underlying matcher. In that case, there is no single Hir expression that can be isolated as a reason for the failure. So if this routine fails, it's not possible to determine which Hir expression caused the failure.

Example

This example shows how one can hand-construct multiple Hir expressions and build a single regex from them without doing any parsing at all.

use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};

// (?Rm)^foo$
let hir1 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
// (?Rm)^bar$
let hir2 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("bar".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_many_from_hir(&[&hir1, &hir2])?;
let hay = "\r\nfoo\r\nbar";
let got: Vec<Match> = re.find_iter(hay).collect();
let expected = vec![
    Match::must(0, 2..5),
    Match::must(1, 7..10),
];
assert_eq!(expected, got);

Ok::<(), Box<dyn std::error::Error>>(())

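
The H: Borrow<Hir> bound is what lets this method accept both a slice of references (as in &[&hir1, &hir2] above) and a slice of owned Hir values. The sketch below uses the same kind of bound with a hypothetical Expr type standing in for Hir.

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for `Hir`; not the regex-syntax type.
#[derive(Debug)]
pub struct Expr(pub String);

/// Same shape of bound as `build_many_from_hir` (`H: Borrow<Expr>` in place
/// of `H: Borrow<Hir>`): works for slices of owned values or of references.
pub fn join_exprs<H: Borrow<Expr>>(exprs: &[H]) -> String {
    let parts: Vec<&str> = exprs
        .iter()
        // Fully qualified call avoids ambiguity with the blanket `Borrow<T> for T`.
        .map(|h| Borrow::<Expr>::borrow(h).0.as_str())
        .collect();
    parts.join("|")
}
```

Both join_exprs(&[&a, &b]) and join_exprs(owned.as_slice()) compile, because the standard library provides Borrow<T> for both T and &T.
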
fn configure(self: &mut Self, config: Config) -> &mut Builder

Configure the behavior of a Regex.

This configuration controls non-syntax options related to the behavior of a Regex. This includes things like whether empty matches can split a codepoint, prefilters, line terminators and a long list of options for configuring which regex engines the meta regex engine will be able to use internally.

Example

This example shows how to disable UTF-8 empty mode. This permits empty matches to occur between the bytes of a codepoint's UTF-8 encoding.

use regex_automata::{meta::Regex, Match};

let re = Regex::new("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches only occur at the beginning and end of the snowman.
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 3..3),
]);

let re = Regex::builder()
    .configure(Regex::config().utf8_empty(false))
    .build("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches now occur at every position!
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);

Ok::<(), Box<dyn std::error::Error>>(())

fn syntax(self: &mut Self, config: Config) -> &mut Builder

Configure the syntax options when parsing a pattern string while building a Regex.

These options only apply when Builder::build or Builder::build_many are used. The other build methods accept Hir values, which have already been parsed.

Example

This example shows how to enable case insensitive mode.

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().case_insensitive(true))
    .build(r"δ")?;
assert_eq!(Some(Match::must(0, 0..2)), re.find(r"Δ"));

Ok::<(), Box<dyn std::error::Error>>(())
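
A case-insensitive match can span a different number of bytes than the ASCII-only intuition suggests. For example, Greek capital delta (U+0394) and small delta (U+03B4) are each two bytes in UTF-8. The sketch below is a brute-force, case-insensitive literal search built on Rust's char::to_lowercase. It is a hypothetical helper, not the regex engines' Unicode simple case folding, and it assumes every character lowercases to a single character.

```rust
use std::ops::Range;

/// Simple lowercase mapping of one character. Keeps only the first char of
/// `to_lowercase`, which is exact for Greek delta but not for all of Unicode.
fn fold(c: char) -> char {
    c.to_lowercase().next().unwrap_or(c)
}

/// Leftmost case-insensitive occurrence of the literal `needle` in `hay`,
/// returned as a byte range.
pub fn find_ci(hay: &str, needle: &str) -> Option<Range<usize>> {
    let needle: Vec<char> = needle.chars().map(fold).collect();
    // Candidate starts: every char boundary, including the end of `hay`.
    let starts = hay
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(hay.len()));
    'outer: for start in starts {
        let mut chars = hay[start..].chars();
        let mut end = start;
        for &want in &needle {
            match chars.next() {
                Some(c) if fold(c) == want => end += c.len_utf8(),
                _ => continue 'outer,
            }
        }
        return Some(start..end);
    }
    None
}
```

find_ci("Δ", "δ") returns Some(0..2): one character, two bytes.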

impl Clone for Builder

fn clone(self: &Self) -> Builder

impl Debug for Builder

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

impl Freeze for Builder

impl RefUnwindSafe for Builder

impl Send for Builder

impl Sync for Builder

impl Unpin for Builder

impl UnsafeUnpin for Builder

impl UnwindSafe for Builder

impl<T> Any for Builder

fn type_id(self: &Self) -> TypeId

impl<T> Borrow for Builder

fn borrow(self: &Self) -> &T

impl<T> BorrowMut for Builder

fn borrow_mut(self: &mut Self) -> &mut T

impl<T> CloneToUninit for Builder

unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)

impl<T> From for Builder

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> ToOwned for Builder

fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)

impl<T, U> Into for Builder

fn into(self: Self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom for Builder

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto for Builder

fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>