Module props
This module defines all available properties.
Properties are either empty marker types that implement BinaryProperty, or enumerations[1]
that implement EnumeratedProperty.
BinaryProperty types are queried through CodePointSetData,
while EnumeratedProperty types are queried through CodePointMapData.
In addition, some EnumeratedProperty types also implement ParseableEnumeratedProperty or
NamedEnumeratedProperty. For these properties, PropertyParser,
PropertyNamesLong, and PropertyNamesShort
can be constructed.
[1] Either Rust enums, or Rust structs with associated constants (open enums).
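The split between marker types and enumerations can be illustrated with a minimal, std-only sketch. Every name here (ToyBinaryProperty, ToyAsciiHexDigit, ToySet, ToyWidth) is a hypothetical stand-in invented for illustration and is not part of this module's API; the numeric values of the open enum are likewise made up.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a binary-property trait.
/// It is implemented by empty marker types that carry no data.
pub trait ToyBinaryProperty {
    const NAME: &'static str;
    fn holds_for(c: char) -> bool;
}

/// Empty marker type: it only selects a property at the type level.
pub struct ToyAsciiHexDigit;

impl ToyBinaryProperty for ToyAsciiHexDigit {
    const NAME: &'static str = "ASCII_Hex_Digit";
    fn holds_for(c: char) -> bool {
        c.is_ascii_hexdigit()
    }
}

/// Hypothetical set type, parameterised by the marker type it answers for.
pub struct ToySet<P: ToyBinaryProperty>(PhantomData<P>);

impl<P: ToyBinaryProperty> ToySet<P> {
    pub fn new() -> Self {
        ToySet(PhantomData)
    }
    pub fn contains(&self, c: char) -> bool {
        P::holds_for(c)
    }
    pub fn name(&self) -> &'static str {
        P::NAME
    }
}

/// Open enum: a struct wrapping an integer, with associated constants
/// for the known values. Values without a named constant stay representable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ToyWidth(pub u8);

impl ToyWidth {
    // Illustrative values only, not real property value encodings.
    pub const NEUTRAL: ToyWidth = ToyWidth(0);
    pub const WIDE: ToyWidth = ToyWidth(1);
}
```

With this shape, `ToySet::<ToyAsciiHexDigit>::new().contains('f')` picks the property purely through the type parameter, and `ToyWidth(200)` is a valid value even though no constant names it, which is the usual reason for preferring an open enum when new values may appear in later data.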
Structs
- Alnum Characters with the Alphabetic or Decimal_Number property.
- Alphabetic Alphabetic characters.
- AsciiHexDigit ASCII characters commonly used for the representation of hexadecimal numbers.
- BasicEmoji Characters and character sequences intended for general-purpose, independent, direct input.
- BidiClass Enumerated property Bidi_Class.
- BidiControl Format control characters which have specific functions in the Unicode Bidirectional Algorithm.
- BidiMirrored Characters that are mirrored in bidirectional text.
- BidiMirroringGlyph This is a bitpacked combination of the Bidi_Mirroring_Glyph, Bidi_Mirrored, and Bidi_Paired_Bracket_Type properties.
- Blank Horizontal whitespace characters.
- CanonicalCombiningClass Property Canonical_Combining_Class. See UAX #15: https://www.unicode.org/reports/tr15/.
- CaseIgnorable Characters which are ignored for casing purposes.
- CaseSensitive Characters that are either the source of a case mapping or in the target of a case mapping.
- Cased Uppercase, lowercase, and titlecase characters.
- ChangesWhenCasefolded Characters whose normalized forms are not stable under case folding.
- ChangesWhenCasemapped Characters which may change when they undergo case mapping.
- ChangesWhenLowercased Characters whose normalized forms are not stable under a toLowercase mapping.
- ChangesWhenNfkcCasefolded Characters which are not identical to their NFKC_Casefold mapping.
- ChangesWhenTitlecased Characters whose normalized forms are not stable under a toTitlecase mapping.
- ChangesWhenUppercased Characters whose normalized forms are not stable under a toUppercase mapping.
- Dash Punctuation characters explicitly called out as dashes in the Unicode Standard, plus their compatibility equivalents.
- DefaultIgnorableCodePoint For programmatic determination of default ignorable code points.
- Deprecated Deprecated characters.
- Diacritic Characters that linguistically modify the meaning of another character to which they apply.
- EastAsianWidth Enumerated property East_Asian_Width.
- Emoji Characters that are emoji.
- EmojiComponent Characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as base characters for emoji keycaps.
- EmojiModifier Characters that are emoji modifiers.
- EmojiModifierBase Characters that can serve as a base for emoji modifiers.
- EmojiPresentation Characters that have emoji presentation by default.
- ExtendedPictographic Pictographic symbols, as well as reserved ranges in blocks largely associated with emoji characters.
- Extender Characters whose principal function is to extend the value of a preceding alphabetic character or to extend the shape of adjacent characters.
- FullCompositionExclusion Characters that are excluded from composition.
- GeneralCategoryGroup Groupings of multiple General_Category property values.
- GeneralCategoryOutOfBoundsError Error value for impl TryFrom<u8> for GeneralCategory.
- Graph Visible characters.
- GraphemeBase Property used together with the definition of Standard Korean Syllable Block to define "Grapheme base".
- GraphemeClusterBreak Enumerated property Grapheme_Cluster_Break.
- GraphemeExtend Property used to define "Grapheme extender".
- GraphemeLink Deprecated property.
- HangulSyllableType Enumerated property Hangul_Syllable_Type.
- HexDigit Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents.
- Hyphen Deprecated property.
- IdContinue Characters that can come after the first character in an identifier.
- IdStart Characters that can begin an identifier.
- Ideographic Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs, or related siniform ideographs.
- IdsBinaryOperator Characters used in Ideographic Description Sequences.
- IdsTrinaryOperator Characters used in Ideographic Description Sequences.
- IndicSyllabicCategory Property Indic_Syllabic_Category. See UAX #44: https://www.unicode.org/reports/tr44/#Indic_Syllabic_Category.
- JoinControl Format control characters which have specific functions for control of cursive joining and ligation.
- JoiningType Enumerated property Joining_Type.
- LineBreak Enumerated property Line_Break.
- LogicalOrderException A small number of spacing vowel letters occurring in certain Southeast Asian scripts such as Thai and Lao.
- Lowercase Lowercase characters.
- Math Characters used in mathematical notation.
- NfcInert Characters that are inert under NFC, i.e., they do not interact with adjacent characters.
- NfdInert Characters that are inert under NFD, i.e., they do not interact with adjacent characters.
- NfkcInert Characters that are inert under NFKC, i.e., they do not interact with adjacent characters.
- NfkdInert Characters that are inert under NFKD, i.e., they do not interact with adjacent characters.
- NoncharacterCodePoint Code points permanently reserved for internal use.
- PatternSyntax Characters used as syntax in patterns (such as regular expressions).
- PatternWhiteSpace Characters used as whitespace in patterns (such as regular expressions).
- PrependedConcatenationMark A small class of visible format controls, which precede and then span a sequence of other characters, usually digits.
- Print Printable characters (visible characters and whitespace).
- QuotationMark Punctuation characters that function as quotation marks.
- Radical Characters used in the definition of Ideographic Description Sequences.
- RegionalIndicator Regional indicator characters, U+1F1E6..U+1F1FF.
- Script Enumerated property Script.
- SegmentStarter Characters that are starters in terms of Unicode normalization and combining character sequences.
- SentenceBreak Enumerated property Sentence_Break.
- SentenceTerminal Punctuation characters that generally mark the end of sentences.
- SoftDotted Characters with a "soft dot", like i or j.
- TerminalPunctuation Punctuation characters that generally mark the end of textual units.
- UnifiedIdeograph A property which specifies the exact set of Unified CJK Ideographs in the standard.
- Uppercase Uppercase characters.
- VariationSelector Characters that are Variation Selectors.
- VerticalOrientation Property Vertical_Orientation.
- WhiteSpace Spaces, separator characters and other control characters which should be treated by programming languages as "white space" for the purpose of parsing elements.
- WordBreak Enumerated property Word_Break.
- Xdigit Hexadecimal digits.
- XidContinue Characters that can come after the first character in an identifier.
- XidStart Characters that can begin an identifier.
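A few of the binary properties listed above have definitions small enough to write out directly. The sketch below uses only the Rust standard library and hypothetical helper names (is_regional_indicator, is_ascii_hex_digit, count_regional_indicators); it illustrates what the RegionalIndicator and AsciiHexDigit sets contain and is not a substitute for querying the property data.

```rust
/// RegionalIndicator: the inclusive range U+1F1E6..U+1F1FF.
/// (Hypothetical helper, not part of this module.)
pub fn is_regional_indicator(c: char) -> bool {
    ('\u{1F1E6}'..='\u{1F1FF}').contains(&c)
}

/// AsciiHexDigit: 0-9, A-F, a-f, as covered by the standard library's
/// `char::is_ascii_hexdigit`.
pub fn is_ascii_hex_digit(c: char) -> bool {
    c.is_ascii_hexdigit()
}

/// Counts how many code points in `text` are regional indicators.
pub fn count_regional_indicators(text: &str) -> usize {
    text.chars().filter(|&c| is_regional_indicator(c)).count()
}
```

A flag emoji is encoded as a pair of regional indicators, so `count_regional_indicators("\u{1F1EB}\u{1F1F7}")` counts two code points for a single displayed flag.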
Enums
- BidiPairedBracketType The enum represents Bidi_Paired_Bracket_Type.
- GeneralCategory Enumerated property General_Category.
Traits
- BinaryProperty A binary Unicode character property.
- EmojiSet An Emoji set as defined by Unicode Technical Standard #51.
- EnumeratedProperty A Unicode character property that assigns a value to each code point.
- NamedEnumeratedProperty A property whose value names can be represented as strings.
- ParseableEnumeratedProperty A property whose value names can be parsed from strings.
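The difference between a property whose value names can be represented as strings and one whose value names can be parsed from strings is the direction of the lookup: value to name versus name to value. The following is a minimal, std-only sketch of both directions. ToyCategory, its methods, and parse_strict are hypothetical names for illustration; only the short and long aliases themselves (Lu, Ll, Nd and their long forms) are taken from the Unicode General_Category value names.

```rust
/// Hypothetical three-value enumeration, standing in for an enumerated property.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyCategory {
    UppercaseLetter,
    LowercaseLetter,
    DecimalNumber,
}

impl ToyCategory {
    pub const ALL: [ToyCategory; 3] = [
        ToyCategory::UppercaseLetter,
        ToyCategory::LowercaseLetter,
        ToyCategory::DecimalNumber,
    ];

    /// Value to short name (the "named" direction).
    pub fn short_name(self) -> &'static str {
        match self {
            ToyCategory::UppercaseLetter => "Lu",
            ToyCategory::LowercaseLetter => "Ll",
            ToyCategory::DecimalNumber => "Nd",
        }
    }

    /// Value to long name (the "named" direction).
    pub fn long_name(self) -> &'static str {
        match self {
            ToyCategory::UppercaseLetter => "Uppercase_Letter",
            ToyCategory::LowercaseLetter => "Lowercase_Letter",
            ToyCategory::DecimalNumber => "Decimal_Number",
        }
    }
}

/// Name to value (the "parseable" direction).
/// Matching is exact and case-sensitive in this sketch.
pub fn parse_strict(name: &str) -> Option<ToyCategory> {
    ToyCategory::ALL
        .iter()
        .copied()
        .find(|v| v.short_name() == name || v.long_name() == name)
}
```

In this sketch both the short and long alias parse to the same value, while a name that differs only in case (for example "nd") is rejected; a looser matcher would be a separate design choice.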