Trait UnicodeSegmentation
trait UnicodeSegmentation
Methods for segmenting strings according to Unicode Standard Annex #29.
Required Methods
fn graphemes(self: &Self, is_extended: bool) -> Graphemes<'_>
Returns an iterator over the grapheme clusters of self.
If is_extended is true, the iterator is over the extended grapheme clusters; otherwise, the iterator is over the legacy grapheme clusters. UAX#29 recommends extended grapheme cluster boundaries for general processing.
Examples
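    use unicode_segmentation::UnicodeSegmentation;

    // Each base letter and its combining marks form one grapheme cluster.
    let gr1 = UnicodeSegmentation::graphemes("a\u{310}e\u{301}o\u{308}\u{332}", true)
        .collect::<Vec<&str>>();
    let b: &[_] = &["a\u{310}", "e\u{301}", "o\u{308}\u{332}"];
    assert_eq!(&gr1[..], b);

    // CRLF is a single cluster, and regional indicators pair up into flags.
    let gr2 = UnicodeSegmentation::graphemes("a\r\nb🇷🇺🇸🇹", true).collect::<Vec<&str>>();
    let b: &[_] = &["a", "\r\n", "b", "🇷🇺", "🇸🇹"];
    assert_eq!(&gr2[..], b);

fn grapheme_indices(self: &Self, is_extended: bool) -> GraphemeIndices<'_>
Returns an iterator over the grapheme clusters of self and their byte offsets. See graphemes() for more information.
Examples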
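    use unicode_segmentation::UnicodeSegmentation;

    // Offsets are byte positions: each combining mark here takes two bytes.
    let gr_inds = UnicodeSegmentation::grapheme_indices("a\u{310}e\u{301}o\u{308}\u{332}\r\n", true)
        .collect::<Vec<(usize, &str)>>();
    let b: &[_] = &[(0, "a\u{310}"), (3, "e\u{301}"), (6, "o\u{308}\u{332}"), (11, "\r\n")];
    assert_eq!(&gr_inds[..], b);

fn unicode_words(self: &Self) -> UnicodeWords<'_>
Returns an iterator over the words of self, separated on UAX#29 word boundaries.
Here, "words" are just those substrings which, after splitting on UAX#29 word boundaries, contain any alphanumeric characters. That is, the substring must contain at least one character with the Alphabetic property, or with General_Category=Number.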
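The filter described above can be sketched with the standard library alone. This is only an illustration, not the crate's implementation: it assumes that std's char::is_alphanumeric (Alphabetic property, or general category Nd, Nl, or No) is an adequate stand-in for the test, and it takes the word-boundary segments as input rather than computing them. The names has_alphanumeric and keep_words are hypothetical.

```rust
/// Returns true if `segment` contains at least one alphanumeric character.
/// Assumption: `char::is_alphanumeric` approximates the "Alphabetic property
/// or General_Category=Number" test; the crate may use its own tables.
fn has_alphanumeric(segment: &str) -> bool {
    segment.chars().any(char::is_alphanumeric)
}

/// Keeps only the segments that count as "words" under that test.
/// `segments` is assumed to be already split on UAX#29 word boundaries.
fn keep_words<'a>(segments: &[&'a str]) -> Vec<&'a str> {
    segments
        .iter()
        .copied()
        .filter(|s| has_alphanumeric(s))
        .collect()
}
```

Given the segments "The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", keep_words would retain only "The", "quick", and "brown"; the whitespace and punctuation segments are dropped because they contain no alphanumeric character.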
Example
    use unicode_segmentation::UnicodeSegmentation;

    let uws = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    let uw1 = uws.unicode_words().collect::<Vec<&str>>();
    // Punctuation and whitespace segments are dropped; "can't" and "32.3" stay whole.
    let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
    assert_eq!(&uw1[..], b);

fn unicode_word_indices(self: &Self) -> UnicodeWordIndices<'_>
Returns an iterator over the words of self, separated on UAX#29 word boundaries, and their offsets.
Here, "words" are just those substrings which, after splitting on UAX#29 word boundaries, contain any alphanumeric characters. That is, the substring must contain at least one character with the Alphabetic property, or with General_Category=Number.
Example
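    use unicode_segmentation::UnicodeSegmentation;

    let uwis = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    let uwi1 = uwis.unicode_word_indices().collect::<Vec<(usize, &str)>>();
    // Each pair is (byte offset of the word in `uwis`, the word itself).
    let b: &[_] = &[(0, "The"), (4, "quick"), (12, "brown"), (20, "fox"), (24, "can't"),
                    (30, "jump"), (35, "32.3"), (40, "feet"), (46, "right")];
    assert_eq!(&uwi1[..], b);

fn split_word_bounds(self: &Self) -> UWordBounds<'_>
Returns an iterator over substrings of self separated on UAX#29 word boundaries.
The concatenation of the substrings returned by this function is just the original string.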
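The round-trip property can be checked in a few lines. The sketch below is standard-library only and takes the segments as input instead of calling split_word_bounds; the helper name reassembles is hypothetical.

```rust
/// Returns true if concatenating `pieces` in order reproduces `original`.
/// This is the property the splitting iterator is documented to satisfy:
/// no character is dropped, duplicated, or reordered.
fn reassembles(original: &str, pieces: &[&str]) -> bool {
    pieces.concat() == original
}
```

With the crate in scope, the pieces would come from s.split_word_bounds().collect::<Vec<&str>>(). The same check does not hold for unicode_words(), which discards segments that contain no alphanumeric character.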
Example
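    use unicode_segmentation::UnicodeSegmentation;

    let swu1 = "The quick (\"brown\") fox".split_word_bounds().collect::<Vec<&str>>();
    // Every segment is kept, including whitespace and punctuation.
    let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", " ", "fox"];
    assert_eq!(&swu1[..], b);

fn split_word_bound_indices(self: &Self) -> UWordBoundIndices<'_>
Returns an iterator over substrings of self, split on UAX#29 word boundaries, and their offsets. See split_word_bounds() for more information.
Example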
    use unicode_segmentation::UnicodeSegmentation;

    let swi1 = "Brr, it's 29.3°F!".split_word_bound_indices().collect::<Vec<(usize, &str)>>();
    // "°" occupies two bytes in UTF-8, so "F" starts at byte 16.
    let b: &[_] = &[(0, "Brr"), (3, ","), (4, " "), (5, "it's"), (9, " "), (10, "29.3"),
                    (14, "°"), (16, "F"), (17, "!")];
    assert_eq!(&swi1[..], b);

fn unicode_sentences(self: &Self) -> UnicodeSentences<'_>
Returns an iterator over substrings of self separated on UAX#29 sentence boundaries.
Here, "sentences" are just those substrings which, after splitting on UAX#29 sentence boundaries, contain any alphanumeric characters. That is, the substring must contain at least one character with the Alphabetic property, or with General_Category=Number.
Example
    use unicode_segmentation::UnicodeSegmentation;

    let uss = "Mr. Fox jumped. [...] The dog was too lazy.";
    let us1 = uss.unicode_sentences().collect::<Vec<&str>>();
    // "[...] " contains no alphanumeric character, so it is not yielded.
    let b: &[_] = &["Mr. ", "Fox jumped. ", "The dog was too lazy."];
    assert_eq!(&us1[..], b);

fn split_sentence_bounds(self: &Self) -> USentenceBounds<'_>
Returns an iterator over substrings of self separated on UAX#29 sentence boundaries.
The concatenation of the substrings returned by this function is just the original string.
Example
    use unicode_segmentation::UnicodeSegmentation;

    let ssbs = "Mr. Fox jumped. [...] The dog was too lazy.";
    let ssb1 = ssbs.split_sentence_bounds().collect::<Vec<&str>>();
    // Unlike unicode_sentences(), the "[...] " segment is kept.
    let b: &[_] = &["Mr. ", "Fox jumped. ", "[...] ", "The dog was too lazy."];
    assert_eq!(&ssb1[..], b);

fn split_sentence_bound_indices(self: &Self) -> USentenceBoundIndices<'_>
Returns an iterator over substrings of self, split on UAX#29 sentence boundaries, and their offsets. See split_sentence_bounds() for more information.
Example
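    use unicode_segmentation::UnicodeSegmentation;

    let ssis = "Mr. Fox jumped. [...] The dog was too lazy.";
    let ssi1 = ssis.split_sentence_bound_indices().collect::<Vec<(usize, &str)>>();
    // Each pair is (byte offset of the segment in `ssis`, the segment itself).
    let b: &[_] = &[(0, "Mr. "), (4, "Fox jumped. "), (16, "[...] "),
                    (22, "The dog was too lazy.")];
    assert_eq!(&ssi1[..], b);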
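One way to read the (offset, substring) pairs produced by the index-returning methods: each offset is a byte position in the original string, so slicing the string at that offset recovers the substring. The sketch below uses only the standard library; it takes the pairs as input rather than calling split_sentence_bound_indices, and the helper name offsets_match is hypothetical.

```rust
/// Returns true if every `(offset, piece)` pair points at `piece` inside `s`.
/// `str::get` is used instead of indexing so that an out-of-range offset or
/// one that falls inside a multi-byte character yields `None`, not a panic.
fn offsets_match(s: &str, pairs: &[(usize, &str)]) -> bool {
    pairs
        .iter()
        .all(|&(offset, piece)| s.get(offset..offset + piece.len()) == Some(piece))
}
```

Because the offsets count bytes rather than characters, they can be passed straight to string slicing, but they will run ahead of the character count whenever the text contains non-ASCII characters such as "°".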
Implementors
impl UnicodeSegmentation for str