Module `props`

This module defines all available properties.

Properties are either empty marker types that implement BinaryProperty, or enumerations¹ that implement EnumeratedProperty.

Types implementing BinaryProperty are queried through CodePointSetData, while types implementing EnumeratedProperty are queried through CodePointMapData.
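The split between set-style and map-style queries can be modeled with the standard library alone. The following is a minimal sketch, not the crate's actual definitions: every `Toy*` name, the trait shapes, and the hard-coded ranges are assumptions made for illustration.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a binary-property trait (not the crate's API).
pub trait ToyBinaryProperty {
    /// Inclusive code point ranges that have the property (toy data).
    const RANGES: &'static [(u32, u32)];
}

/// Empty marker type: it holds no data and only selects a property.
pub struct ToyAsciiHexDigit;

impl ToyBinaryProperty for ToyAsciiHexDigit {
    // 0-9, A-F, a-f
    const RANGES: &'static [(u32, u32)] = &[(0x30, 0x39), (0x41, 0x46), (0x61, 0x66)];
}

/// Set-style query object: answers "does this character have the property?".
pub struct ToySetData<P: ToyBinaryProperty>(PhantomData<P>);

impl<P: ToyBinaryProperty> ToySetData<P> {
    pub fn new() -> Self {
        ToySetData(PhantomData)
    }

    pub fn contains(&self, c: char) -> bool {
        let cp = c as u32;
        P::RANGES.iter().any(|&(lo, hi)| lo <= cp && cp <= hi)
    }
}

/// Hypothetical stand-in for an enumerated-property trait (not the crate's API).
pub trait ToyEnumeratedProperty: Copy + PartialEq + 'static {
    /// Value returned for code points not covered by any range.
    const DEFAULT: Self;
    /// Inclusive code point ranges and the value assigned to them (toy data).
    const RANGES: &'static [(u32, u32, Self)];
}

/// A toy enumerated property with two values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToyWidth {
    Narrow,
    Wide,
}

impl ToyEnumeratedProperty for ToyWidth {
    const DEFAULT: Self = ToyWidth::Narrow;
    // Only the CJK Unified Ideographs block is marked wide in this toy table.
    const RANGES: &'static [(u32, u32, Self)] = &[(0x4E00, 0x9FFF, ToyWidth::Wide)];
}

/// Map-style query object: answers "which value does this character have?".
pub struct ToyMapData<P: ToyEnumeratedProperty>(PhantomData<P>);

impl<P: ToyEnumeratedProperty> ToyMapData<P> {
    pub fn new() -> Self {
        ToyMapData(PhantomData)
    }

    pub fn get(&self, c: char) -> P {
        let cp = c as u32;
        P::RANGES
            .iter()
            .find(|&&(lo, hi, _)| lo <= cp && cp <= hi)
            .map(|&(_, _, value)| value)
            .unwrap_or(P::DEFAULT)
    }
}
```

In this model, `ToySetData::<ToyAsciiHexDigit>::new().contains('f')` is a yes/no question, while `ToyMapData::<ToyWidth>::new().get('木')` returns a value of the property's own type. The marker type appears only as a type parameter, which is why it can be empty.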

In addition, some types implementing EnumeratedProperty also implement ParseableEnumeratedProperty or NamedEnumeratedProperty. For these properties, PropertyParser, PropertyNamesLong, and PropertyNamesShort can be constructed.
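As a rough illustration of what parsing and naming property values involves, here is a standard-library-only sketch. `ToyHangulType` and its methods are hypothetical and do not reflect the signatures of PropertyParser, PropertyNamesLong, or PropertyNamesShort; the short and long names are the Unicode aliases for three Hangul_Syllable_Type values.

```rust
/// Toy subset of an enumerated property's values (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToyHangulType {
    NotApplicable,
    LeadingJamo,
    VowelJamo,
}

impl ToyHangulType {
    /// Every value in this toy subset, used to drive parsing.
    const ALL: [ToyHangulType; 3] = [
        ToyHangulType::NotApplicable,
        ToyHangulType::LeadingJamo,
        ToyHangulType::VowelJamo,
    ];

    /// Short alias of the value.
    pub fn short_name(self) -> &'static str {
        match self {
            ToyHangulType::NotApplicable => "NA",
            ToyHangulType::LeadingJamo => "L",
            ToyHangulType::VowelJamo => "V",
        }
    }

    /// Long alias of the value.
    pub fn long_name(self) -> &'static str {
        match self {
            ToyHangulType::NotApplicable => "Not_Applicable",
            ToyHangulType::LeadingJamo => "Leading_Jamo",
            ToyHangulType::VowelJamo => "Vowel_Jamo",
        }
    }

    /// Strict parse: accepts an exact short or long name, nothing else.
    pub fn parse(name: &str) -> Option<Self> {
        Self::ALL
            .iter()
            .copied()
            .find(|v| v.short_name() == name || v.long_name() == name)
    }
}
```

Naming maps a value to a string and parsing maps a string back to a value, so the two directions can be offered independently. This sketch matches names exactly and makes no attempt at loose matching.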

¹ Either Rust enums, or Rust structs with associated constants (open enums).
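The difference between a Rust enum and an "open enum" can be sketched as follows. The type names and numeric values are hypothetical and chosen only to show the pattern; they are not taken from the crate.

```rust
/// Closed enum: the set of values is fixed at compile time.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToyBracketType {
    Open,
    Close,
    Neither,
}

/// Open enum: a newtype over an integer with associated constants.
/// Values without a named constant are still representable.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyScript(pub u16);

impl ToyScript {
    // Hypothetical numeric values, for illustration only.
    pub const COMMON: ToyScript = ToyScript(0);
    pub const LATIN: ToyScript = ToyScript(1);
}

/// Matching on an open enum always needs a wildcard arm.
pub fn describe(script: ToyScript) -> &'static str {
    match script {
        ToyScript::COMMON => "common",
        ToyScript::LATIN => "latin",
        _ => "unknown",
    }
}
```

An open enum can carry values that the code has no named constant for, which is why the `match` in `describe` cannot be exhaustive without the wildcard arm.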

Structs

Alnum Characters with the Alphabetic or Decimal_Number property.
Alphabetic Alphabetic characters.
AsciiHexDigit ASCII characters commonly used for the representation of hexadecimal numbers.
BasicEmoji Characters and character sequences intended for general-purpose, independent, direct input.
BidiClass Enumerated property Bidi_Class.
BidiControl Format control characters which have specific functions in the Unicode Bidirectional Algorithm.
BidiMirrored Characters that are mirrored in bidirectional text.
BidiMirroringGlyph This is a bitpacked combination of the Bidi_Mirroring_Glyph, Bidi_Mirrored, and Bidi_Paired_Bracket_Type properties.
Blank Horizontal whitespace characters.
CanonicalCombiningClass Property Canonical_Combining_Class. See UAX #15: https://www.unicode.org/reports/tr15/.
CaseIgnorable Characters which are ignored for casing purposes.
CaseSensitive Characters that are either the source of a case mapping or in the target of a case mapping.
Cased Uppercase, lowercase, and titlecase characters.
ChangesWhenCasefolded Characters whose normalized forms are not stable under case folding.
ChangesWhenCasemapped Characters which may change when they undergo case mapping.
ChangesWhenLowercased Characters whose normalized forms are not stable under a toLowercase mapping.
ChangesWhenNfkcCasefolded Characters which are not identical to their NFKC_Casefold mapping.
ChangesWhenTitlecased Characters whose normalized forms are not stable under a toTitlecase mapping.
ChangesWhenUppercased Characters whose normalized forms are not stable under a toUppercase mapping.
Dash Punctuation characters explicitly called out as dashes in the Unicode Standard, plus their compatibility equivalents.
DefaultIgnorableCodePoint For programmatic determination of default ignorable code points.
Deprecated Deprecated characters.
Diacritic Characters that linguistically modify the meaning of another character to which they apply.
EastAsianWidth Enumerated property East_Asian_Width.
Emoji Characters that are emoji.
EmojiComponent Characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as base characters for emoji keycaps.
EmojiModifier Characters that are emoji modifiers.
EmojiModifierBase Characters that can serve as a base for emoji modifiers.
EmojiPresentation Characters that have emoji presentation by default.
ExtendedPictographic Pictographic symbols, as well as reserved ranges in blocks largely associated with emoji characters.
Extender Characters whose principal function is to extend the value of a preceding alphabetic character or to extend the shape of adjacent characters.
FullCompositionExclusion Characters that are excluded from composition.
GeneralCategoryGroup Groupings of multiple General_Category property values.
GeneralCategoryOutOfBoundsError Error value for impl TryFrom<u8> for GeneralCategory.
Graph Visible characters.
GraphemeBase Property used together with the definition of Standard Korean Syllable Block to define "Grapheme base".
GraphemeClusterBreak Enumerated property Grapheme_Cluster_Break.
GraphemeExtend Property used to define "Grapheme extender".
GraphemeLink Deprecated property.
HangulSyllableType Enumerated property Hangul_Syllable_Type.
HexDigit Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents.
Hyphen Deprecated property.
IdContinue Characters that can come after the first character in an identifier.
IdStart Characters that can begin an identifier.
Ideographic Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs, or related siniform ideographs.
IdsBinaryOperator Characters used in Ideographic Description Sequences.
IdsTrinaryOperator Characters used in Ideographic Description Sequences.
IndicSyllabicCategory Property Indic_Syllabic_Category. See UAX #44: https://www.unicode.org/reports/tr44/#Indic_Syllabic_Category.
JoinControl Format control characters which have specific functions for control of cursive joining and ligation.
JoiningType Enumerated property Joining_Type.
LineBreak Enumerated property Line_Break.
LogicalOrderException A small number of spacing vowel letters occurring in certain Southeast Asian scripts such as Thai and Lao.
Lowercase Lowercase characters.
Math Characters used in mathematical notation.
NfcInert Characters that are inert under NFC, i.e., they do not interact with adjacent characters.
NfdInert Characters that are inert under NFD, i.e., they do not interact with adjacent characters.
NfkcInert Characters that are inert under NFKC, i.e., they do not interact with adjacent characters.
NfkdInert Characters that are inert under NFKD, i.e., they do not interact with adjacent characters.
NoncharacterCodePoint Code points permanently reserved for internal use.
PatternSyntax Characters used as syntax in patterns (such as regular expressions).
PatternWhiteSpace Characters used as whitespace in patterns (such as regular expressions).
PrependedConcatenationMark A small class of visible format controls, which precede and then span a sequence of other characters, usually digits.
Print Printable characters (visible characters and whitespace).
QuotationMark Punctuation characters that function as quotation marks.
Radical Characters used in the definition of Ideographic Description Sequences.
RegionalIndicator Regional indicator characters, U+1F1E6..U+1F1FF.
Script Enumerated property Script.
SegmentStarter Characters that are starters in terms of Unicode normalization and combining character sequences.
SentenceBreak Enumerated property Sentence_Break.
SentenceTerminal Punctuation characters that generally mark the end of sentences.
SoftDotted Characters with a "soft dot", like i or j.
TerminalPunctuation Punctuation characters that generally mark the end of textual units.
UnifiedIdeograph A property which specifies the exact set of Unified CJK Ideographs in the standard.
Uppercase Uppercase characters.
VariationSelector Characters that are Variation Selectors.
VerticalOrientation Property Vertical_Orientation.
WhiteSpace Spaces, separator characters and other control characters which should be treated by programming languages as "white space" for the purpose of parsing elements.
WordBreak Enumerated property Word_Break.
Xdigit Hexadecimal digits.
XidContinue Characters that can come after the first character in an identifier.
XidStart Characters that can begin an identifier.
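A few of the binary properties listed above are small enough to check with fixed rules, which makes them useful as a mental model of what a set query answers. The sketch below uses only the standard library and does not touch the crate's data. The function names are hypothetical, and the noncharacter rule (U+FDD0..U+FDEF plus the last two code points of each plane) comes from the Unicode Standard rather than from this page.

```rust
/// Regional indicator characters, U+1F1E6..U+1F1FF.
pub fn is_regional_indicator(c: char) -> bool {
    ('\u{1F1E6}'..='\u{1F1FF}').contains(&c)
}

/// ASCII characters used for hexadecimal numbers: 0-9, A-F, a-f.
pub fn is_ascii_hex_digit(c: char) -> bool {
    c.is_ascii_hexdigit()
}

/// Noncharacter code points: U+FDD0..U+FDEF and the last two code points
/// of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
pub fn is_noncharacter(cp: u32) -> bool {
    (0xFDD0..=0xFDEF).contains(&cp) || (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE)
}
```

Most properties have no such compact rule and need generated data tables.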

Enums

BidiPairedBracketType The enum represents Bidi_Paired_Bracket_Type.
GeneralCategory Enumerated property General_Category.
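The struct list above includes GeneralCategoryOutOfBoundsError, the error value for `impl TryFrom<u8> for GeneralCategory`. The general shape of such a fallible conversion is sketched below on a hypothetical three-variant enum. `ToyCategory`, its discriminants, and `ToyCategoryOutOfBoundsError` are assumptions for illustration and are not the crate's definitions.

```rust
use std::convert::TryFrom;

/// Toy closed enum with explicit `u8` discriminants (hypothetical subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum ToyCategory {
    Unassigned = 0,
    UppercaseLetter = 1,
    LowercaseLetter = 2,
}

/// Error returned when a `u8` does not correspond to any variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyCategoryOutOfBoundsError;

impl TryFrom<u8> for ToyCategory {
    type Error = ToyCategoryOutOfBoundsError;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(ToyCategory::Unassigned),
            1 => Ok(ToyCategory::UppercaseLetter),
            2 => Ok(ToyCategory::LowercaseLetter),
            // Any other byte is outside the enum's range.
            _ => Err(ToyCategoryOutOfBoundsError),
        }
    }
}
```

Converting from a variant to its integer is infallible (`ToyCategory::LowercaseLetter as u8`). The reverse direction has to be fallible because a closed enum cannot represent arbitrary bytes.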

Traits

BinaryProperty A binary Unicode character property.
EmojiSet An Emoji set as defined by Unicode Technical Standard #51.
EnumeratedProperty A Unicode character property that assigns a value to each code point.
NamedEnumeratedProperty A property whose value names can be represented as strings.
ParseableEnumeratedProperty A property whose value names can be parsed from strings.