Struct LanguageIdentifier

struct LanguageIdentifier { ... }

A core struct representing a Unicode BCP47 Language Identifier.

Ordering

This type deliberately does not implement Ord or PartialOrd because there are multiple possible orderings. Depending on your use case, two orderings are available:

  1. A string ordering, suitable for stable serialization: LanguageIdentifier::strict_cmp
  2. A struct ordering, suitable for use with a BTreeSet: LanguageIdentifier::total_cmp

See issue: https://github.com/unicode-org/icu4x/issues/1215

Parsing

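Unicode recognizes three levels of standard conformance for any language identifier:

  * well-formed: syntactically correct
  * valid: well-formed and uses only registered language, region, script, and variant subtags
  * canonical: valid, with no deprecated codes or structure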

At the moment, parsing normalizes a well-formed language identifier by converting _ separators to - and adjusting casing to conform to the Unicode standard.

Any syntactically invalid subtags will cause the parsing to fail with an error.

This operation normalizes syntax to be well-formed. No legacy subtag replacements are performed. For validation and canonicalization, see LocaleCanonicalizer.
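
The normalization described above can be modeled with a small standard-library sketch. This is not the ICU4X parser: normalize_langid_sketch is a hypothetical helper, and the subtag shapes it accepts are assumptions made for illustration (language of 2-3 letters, script of 4 letters, region of 2 letters or 3 digits, variants of 5-8 alphanumerics or a digit followed by 3 alphanumerics). It does not reorder or deduplicate variants and performs no validation against registered subtags.

```rust
/// Hypothetical, simplified model of the normalization described above.
/// Returns `None` for input that this sketch considers syntactically invalid.
fn normalize_langid_sketch(input: &str) -> Option<String> {
    // Accept both `-` and `_` as separators.
    let mut subtags = input.split(|c: char| c == '-' || c == '_');

    // Language: assumed here to be 2-3 ASCII letters, emitted in lowercase.
    let language = subtags.next()?;
    if !(2..=3).contains(&language.len())
        || !language.bytes().all(|b| b.is_ascii_alphabetic())
    {
        return None;
    }
    let mut out = language.to_ascii_lowercase();

    // 0: script, region, or variant may follow;
    // 1: region or variant may follow; 2: variants only.
    let mut stage = 0;
    for subtag in subtags {
        let len = subtag.len();
        let alpha = subtag.bytes().all(|b| b.is_ascii_alphabetic());
        let digits = subtag.bytes().all(|b| b.is_ascii_digit());
        let alnum = subtag.bytes().all(|b| b.is_ascii_alphanumeric());
        let starts_with_digit = subtag
            .bytes()
            .next()
            .map_or(false, |b| b.is_ascii_digit());

        let normalized = if stage == 0 && len == 4 && alpha {
            // Script: title case, e.g. "latn" -> "Latn".
            stage = 1;
            let mut script = subtag.to_ascii_lowercase();
            script[..1].make_ascii_uppercase();
            script
        } else if stage <= 1 && ((len == 2 && alpha) || (len == 3 && digits)) {
            // Region: uppercase, e.g. "us" -> "US".
            stage = 2;
            subtag.to_ascii_uppercase()
        } else if alnum && ((5..=8).contains(&len) || (len == 4 && starts_with_digit)) {
            // Variant: lowercase, e.g. "Valencia" -> "valencia".
            stage = 2;
            subtag.to_ascii_lowercase()
        } else {
            // Empty, misplaced, or malformed subtag.
            return None;
        };
        out.push('-');
        out.push_str(&normalized);
    }
    Some(out)
}
```

With this sketch, "en_us" and "eN-latn-Us-Valencia" normalize to "en-US" and "en-Latn-US-valencia", while an empty subtag ("en--US") or a misplaced script ("en-US-Latn") is rejected.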

Examples

Simple example:

use icu::locale::{
    langid,
    subtags::{language, region},
};

let li = langid!("en-US");

assert_eq!(li.language, language!("en"));
assert_eq!(li.script, None);
assert_eq!(li.region, Some(region!("US")));
assert_eq!(li.variants.len(), 0);

More complex example:

use icu::locale::{
    langid,
    subtags::{language, region, script, variant},
};

let li = langid!("eN-latn-Us-Valencia");

assert_eq!(li.language, language!("en"));
assert_eq!(li.script, Some(script!("Latn")));
assert_eq!(li.region, Some(region!("US")));
assert_eq!(li.variants.get(0), Some(&variant!("valencia")));

Fields

language: Language

Language subtag of the language identifier.

script: Option<Script>

Script subtag of the language identifier.

region: Option<Region>

Region subtag of the language identifier.

variants: Variants

Variant subtags of the language identifier.

Implementations

impl LanguageIdentifier

fn to_string(self: &Self) -> String

Converts the given value to a String.

Under the hood, this uses an efficient Writeable implementation. However, in order to avoid allocating a string, it is more efficient to use Writeable directly.
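
As a rough illustration of the allocation point, the sketch below appends the textual form of a value to an existing buffer instead of creating a new String on every call. It uses only std::fmt: append_display is a hypothetical helper, and any Display value can be passed (LanguageIdentifier implements Display, per this page). Writeable::write_to has a similar shape in that it writes into a core::fmt::Write sink.

```rust
use std::fmt::{Display, Write};

/// Hypothetical helper: appends `value` to `buf` without creating
/// an intermediate `String` for the formatted value.
fn append_display<T: Display + ?Sized>(buf: &mut String, value: &T) -> std::fmt::Result {
    // `String` implements `fmt::Write`, so `write!` appends in place.
    write!(buf, "{}", value)
}
```

A caller that formats many identifiers can clear and reuse the same buffer between calls, so its capacity is allocated once.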

impl LanguageIdentifier

const fn is_unknown(self: &Self) -> bool

Whether this LanguageIdentifier equals LanguageIdentifier::UNKNOWN.

fn strict_cmp(self: &Self, other: &[u8]) -> Ordering

Compare this LanguageIdentifier with BCP-47 bytes.

The return value is equivalent to what would happen if you first converted this LanguageIdentifier to a BCP-47 string and then performed a byte comparison.

This function is case-sensitive and results in a total order, so it is appropriate for binary search. The only argument producing Ordering::Equal is self.to_string().
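
Because the result matches a byte comparison against the BCP-47 string form, a byte-sorted table of identifiers can be binary searched. The sketch below models that property with the standard library only. Both functions are hypothetical: strict_cmp_sketch takes an already serialized string in place of a LanguageIdentifier, and find_in_sorted assumes the table is sorted in byte order.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in: `serialized` plays the role of `self.to_string()`.
fn strict_cmp_sketch(serialized: &str, other: &[u8]) -> Ordering {
    // Plain lexicographic byte comparison, so it is case-sensitive.
    serialized.as_bytes().cmp(other)
}

/// Looks up `needle` in a table that is assumed to be sorted in byte order.
fn find_in_sorted(table: &[&str], needle: &[u8]) -> Option<usize> {
    table
        .binary_search_by(|probe| strict_cmp_sketch(probe, needle))
        .ok()
}
```

Note that a lookup for b"zh-hant" in a table containing "zh-Hant" finds nothing, since the comparison does not normalize its argument.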

Examples

Sorting a list of langids with this method requires converting one operand of each comparison to a string:

use icu::locale::LanguageIdentifier;
use std::cmp::Ordering;
use writeable::Writeable;

// Random input order:
let bcp47_strings: &[&str] = &[
    "ar-Latn",
    "zh-Hant-TW",
    "zh-TW",
    "und-fonipa",
    "zh-Hant",
    "ar-SA",
];

let mut langids = bcp47_strings
    .iter()
    .map(|s| s.parse().unwrap())
    .collect::<Vec<LanguageIdentifier>>();
langids.sort_by(|a, b| {
    let b = b.write_to_string();
    a.strict_cmp(b.as_bytes())
});
let strict_cmp_strings = langids
    .iter()
    .map(|l| l.to_string())
    .collect::<Vec<String>>();

// Output ordering, sorted alphabetically
let expected_ordering: &[&str] = &[
    "ar-Latn",
    "ar-SA",
    "und-fonipa",
    "zh-Hant",
    "zh-Hant-TW",
    "zh-TW",
];

assert_eq!(expected_ordering, strict_cmp_strings);

fn total_cmp(self: &Self, other: &Self) -> Ordering

Compare this LanguageIdentifier with another LanguageIdentifier field-by-field. The result is a total ordering sufficient for use in a BTreeSet.

Unlike LanguageIdentifier::strict_cmp, the ordering may or may not be equivalent to string ordering, and it may or may not be stable across ICU4X releases.
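
To see why a field-by-field comparison can disagree with string ordering, consider the following standard-library sketch. LangIdSketch and total_cmp_sketch are hypothetical stand-ins, and the field order used here (language, then script, then region) is an assumption for illustration; the real ordering is not guaranteed and may change between releases.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the language, script, and region fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LangIdSketch {
    language: &'static str,
    script: Option<&'static str>,
    region: Option<&'static str>,
}

/// Assumed field order: language, then script, then region.
/// Under `Option`'s `Ord`, `None` sorts before `Some(_)`.
fn total_cmp_sketch(a: &LangIdSketch, b: &LangIdSketch) -> Ordering {
    a.language
        .cmp(b.language)
        .then_with(|| a.script.cmp(&b.script))
        .then_with(|| a.region.cmp(&b.region))
}
```

Under these assumptions, an identifier without a script (such as ar-SA) sorts before one with a script (such as ar-Latn), even though "ar-Latn" precedes "ar-SA" as a string.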

Examples

This method returns a nonsensical ordering derived from the fields of the struct:

use icu::locale::LanguageIdentifier;
use std::cmp::Ordering;

// Input strings, sorted alphabetically
let bcp47_strings: &[&str] = &[
    "ar-Latn",
    "ar-SA",
    "und-fonipa",
    "zh-Hant",
    "zh-Hant-TW",
    "zh-TW",
];
assert!(bcp47_strings.windows(2).all(|w| w[0] < w[1]));

let mut langids = bcp47_strings
    .iter()
    .map(|s| s.parse().unwrap())
    .collect::<Vec<LanguageIdentifier>>();
langids.sort_by(LanguageIdentifier::total_cmp);
let total_cmp_strings = langids
    .iter()
    .map(|l| l.to_string())
    .collect::<Vec<String>>();

// Output ordering, sorted arbitrarily
let expected_ordering: &[&str] = &[
    "ar-SA",
    "ar-Latn",
    "und-fonipa",
    "zh-TW",
    "zh-Hant",
    "zh-Hant-TW",
];

assert_eq!(expected_ordering, total_cmp_strings);

Use a wrapper to add a LanguageIdentifier to a BTreeSet:

use icu::locale::LanguageIdentifier;
use std::cmp::Ordering;
use std::collections::BTreeSet;

#[derive(PartialEq, Eq)]
struct LanguageIdentifierTotalOrd(LanguageIdentifier);

impl Ord for LanguageIdentifierTotalOrd {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

impl PartialOrd for LanguageIdentifierTotalOrd {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

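// Build a set and insert a wrapped identifier.
let mut set: BTreeSet<LanguageIdentifierTotalOrd> = BTreeSet::new();
set.insert(LanguageIdentifierTotalOrd("en-US".parse().unwrap()));
assert_eq!(set.len(), 1);

fn normalizing_eq(self: &Self, other: &str) -> bool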

Compare this LanguageIdentifier with a potentially unnormalized BCP-47 string.

The return value is equivalent to what would happen if you first parsed the BCP-47 string to a LanguageIdentifier and then performed a structural comparison.
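
The idea can be approximated with a standard-library sketch that compares an already normalized string against a candidate, ignoring ASCII case and treating _ as -. normalizing_eq_sketch is a hypothetical helper: unlike the real method, it does not check that the candidate is well-formed, and it does not account for any reordering a parser might apply to subtags.

```rust
/// Hypothetical model: `canonical` plays the role of `self.to_string()`.
fn normalizing_eq_sketch(canonical: &str, other: &str) -> bool {
    canonical.len() == other.len()
        && canonical.bytes().zip(other.bytes()).all(|(c, o)| {
            // Treat `_` as `-`, then compare ASCII case-insensitively.
            let o = if o == b'_' { b'-' } else { o };
            c.eq_ignore_ascii_case(&o)
        })
}
```

The comparison walks both inputs byte by byte, so no intermediate string is allocated.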

Examples

use icu::locale::LanguageIdentifier;

let bcp47_strings: &[&str] = &[
    "pl-LaTn-pL",
    "uNd",
    "UnD-adlm",
    "uNd-GB",
    "UND-FONIPA",
    "ZH",
];

for a in bcp47_strings {
    assert!(a.parse::<LanguageIdentifier>().unwrap().normalizing_eq(a));
}

impl Clone for LanguageIdentifier

fn clone(self: &Self) -> LanguageIdentifier

impl Debug for LanguageIdentifier

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

impl Display for LanguageIdentifier

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

impl Eq for LanguageIdentifier

impl Freeze for LanguageIdentifier

impl From<Locale> for LanguageIdentifier

fn from(loc: Locale) -> Self

impl From<Option<Region>> for LanguageIdentifier

fn from(region: Option<Region>) -> Self

impl From<Option<Script>> for LanguageIdentifier

fn from(script: Option<Script>) -> Self

impl From<Language> for LanguageIdentifier

fn from(language: Language) -> Self

impl From<(Language, Option<Script>, Option<Region>)> for LanguageIdentifier

fn from(lsr: (Language, Option<Script>, Option<Region>)) -> Self

impl Hash for LanguageIdentifier

fn hash<H: core::hash::Hasher>(self: &Self, state: &mut H)

impl PartialEq for LanguageIdentifier

fn eq(self: &Self, other: &LanguageIdentifier) -> bool

impl RefUnwindSafe for LanguageIdentifier

impl Send for LanguageIdentifier

impl StructuralPartialEq for LanguageIdentifier

impl Sync for LanguageIdentifier

impl Unpin for LanguageIdentifier

impl UnsafeUnpin for LanguageIdentifier

impl UnwindSafe for LanguageIdentifier

impl Writeable for LanguageIdentifier

fn write_to<W: core::fmt::Write + ?Sized>(self: &Self, sink: &mut W) -> Result
fn writeable_length_hint(self: &Self) -> LengthHint

impl<T> Any for LanguageIdentifier

fn type_id(self: &Self) -> TypeId

impl<T> Borrow for LanguageIdentifier

fn borrow(self: &Self) -> &T

impl<T> BorrowMut for LanguageIdentifier

fn borrow_mut(self: &mut Self) -> &mut T

impl<T> CloneToUninit for LanguageIdentifier

unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)

impl<T> ErasedDestructor for LanguageIdentifier

impl<T> From for LanguageIdentifier

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> ToOwned for LanguageIdentifier

fn to_owned(self: &Self) -> T
fn clone_into(self: &Self, target: &mut T)

impl<T> ToString for LanguageIdentifier

fn to_string(self: &Self) -> String

impl<T, U> Into for LanguageIdentifier

fn into(self: Self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom for LanguageIdentifier

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto for LanguageIdentifier

fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>