Struct GraphemeCursor

struct GraphemeCursor { ... }

Cursor-based segmenter for grapheme clusters.

This allows working with ropes and other data structures where the string is not contiguous or not fully known at initialization time.

Implementations

impl GraphemeCursor

fn new(offset: usize, len: usize, is_extended: bool) -> GraphemeCursor

Create a new cursor. The string's length and the initial offset are given at creation time, but the contents of the string are not. The is_extended parameter selects extended grapheme clusters when true and legacy grapheme clusters when false.

The offset parameter must be on a codepoint boundary.
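Because offset is a byte offset that must fall on a codepoint boundary, a caller holding an untrusted offset may want to validate it first. Below is a minimal sketch using the standard library's str::is_char_boundary. The helper name checked_offset is hypothetical and is not part of unicode_segmentation.

```rust
/// Hypothetical helper: returns `Some(offset)` only if `offset` is a valid
/// codepoint (UTF-8 char) boundary within `s`, the precondition that
/// `GraphemeCursor::new` places on its offset argument.
fn checked_offset(s: &str, offset: usize) -> Option<usize> {
    // `is_char_boundary` is true for 0 and s.len(), false past the end,
    // and false for offsets inside a multi-byte UTF-8 sequence.
    if s.is_char_boundary(offset) {
        Some(offset)
    } else {
        None
    }
}

fn main() {
    let s = "हिन्दी"; // six codepoints, each 3 bytes in UTF-8
    assert_eq!(s.len(), 18);
    assert_eq!(checked_offset(s, 3), Some(3)); // start of the second codepoint
    assert_eq!(checked_offset(s, 1), None); // inside the first codepoint
    assert_eq!(checked_offset(s, 18), Some(18)); // end of string is a boundary
    assert_eq!(checked_offset(s, 19), None); // past the end
    println!("ok");
}
```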

use unicode_segmentation::GraphemeCursor;
let s = "हिन्दी";
// Legacy clusters do not absorb the spacing vowel sign "ि", so the first boundary falls after "ह".
let mut legacy = GraphemeCursor::new(0, s.len(), false);
assert_eq!(legacy.next_boundary(s, 0), Ok(Some("ह".len())));
// Extended clusters keep "ि" together with the preceding consonant.
let mut extended = GraphemeCursor::new(0, s.len(), true);
assert_eq!(extended.next_boundary(s, 0), Ok(Some("हि".len())));

fn set_cursor(self: &mut Self, offset: usize)

Set the cursor to a new location in the same string.
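After moving the cursor in a rope, the caller must pass a chunk that contains at least the codepoint following the new offset. A minimal sketch of that lookup, assuming the text is held as a list of string pieces: ChunkIndex and chunk_at are hypothetical names, not part of unicode_segmentation, and the search uses the standard slice::partition_point over the chunk start offsets.

```rust
/// Hypothetical index over a string stored as separate pieces.
struct ChunkIndex<'a> {
    chunks: Vec<&'a str>,
    starts: Vec<usize>, // byte offset at which each chunk begins
}

impl<'a> ChunkIndex<'a> {
    fn new(chunks: Vec<&'a str>) -> Self {
        let mut starts = Vec::with_capacity(chunks.len());
        let mut pos = 0;
        for c in &chunks {
            starts.push(pos);
            pos += c.len();
        }
        ChunkIndex { chunks, starts }
    }

    /// Returns `(chunk, chunk_start)` for the chunk containing byte `offset`,
    /// or `None` if `offset` is at or past the end of the text.
    fn chunk_at(&self, offset: usize) -> Option<(&'a str, usize)> {
        // Number of chunks that start at or before `offset`; the last of
        // them is the only candidate (later empty chunks win ties).
        let i = self.starts.partition_point(|&s| s <= offset);
        if i == 0 {
            return None;
        }
        let (chunk, start) = (self.chunks[i - 1], self.starts[i - 1]);
        if offset < start + chunk.len() {
            Some((chunk, start))
        } else {
            None
        }
    }
}

fn main() {
    let index = ChunkIndex::new(vec!["ab", "cd", "ef"]);
    assert_eq!(index.chunk_at(0), Some(("ab", 0)));
    assert_eq!(index.chunk_at(3), Some(("cd", 2)));
    assert_eq!(index.chunk_at(4), Some(("ef", 4)));
    assert_eq!(index.chunk_at(6), None);
    println!("ok");
}
```

The returned pair maps directly onto the chunk and chunk_start arguments of is_boundary, next_boundary and prev_boundary.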

use unicode_segmentation::GraphemeCursor;
let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.cur_cursor(), 0);
cursor.set_cursor(2);
assert_eq!(cursor.cur_cursor(), 2);

fn cur_cursor(self: &Self) -> usize

The current offset of the cursor. Equal to the last value provided to new() or set_cursor(), or returned from next_boundary() or prev_boundary().

use unicode_segmentation::GraphemeCursor;
// Two flags (🇷🇸🇮🇴). Each flag is two Regional Indicator Symbol (RIS) codepoints, and each RIS is 4 bytes.
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.cur_cursor(), 4);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.cur_cursor(), 8);

fn provide_context(self: &mut Self, chunk: &str, chunk_start: usize)

Provide additional pre-context when it is needed to decide a boundary. The end of the chunk must coincide with the value given in the GraphemeIncomplete::PreContext request.
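Pre-context is needed because some rules look arbitrarily far back. For regional indicators, whether a boundary falls between two RIS codepoints depends on how many consecutive RIS precede that position (UAX #29 rules GB12 and GB13): an even count means a boundary, an odd count means the position is inside a flag. That is why, in the example that follows, the cursor at offset 8 keeps asking for context until it has seen every preceding RIS. A std-only sketch of the parity check, independent of unicode_segmentation; is_ris and ris_boundary are hypothetical helpers that assume the position sits between two RIS and model only this one rule.

```rust
/// True if `c` is a Regional Indicator Symbol (U+1F1E6..=U+1F1FF).
fn is_ris(c: char) -> bool {
    ('\u{1F1E6}'..='\u{1F1FF}').contains(&c)
}

/// Hypothetical helper: for a byte offset `pos` lying between two RIS
/// codepoints, decide whether it is a grapheme boundary by counting the run
/// of RIS immediately before `pos` (GB12/GB13). Only this rule is modeled.
fn ris_boundary(s: &str, pos: usize) -> bool {
    let preceding = s[..pos].chars().rev().take_while(|&c| is_ris(c)).count();
    preceding % 2 == 0
}

fn main() {
    // Two flags: 🇷🇸 then 🇮🇴. Each RIS is 4 bytes in UTF-8.
    let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
    let offsets: Vec<usize> = flags.char_indices().map(|(i, _)| i).collect();
    assert_eq!(offsets, vec![0, 4, 8, 12]);

    assert!(!ris_boundary(flags, 4)); // one RIS before: inside the first flag
    assert!(ris_boundary(flags, 8)); // two RIS before: between the flags
    assert!(!ris_boundary(flags, 12)); // three RIS before: inside the second flag
    println!("ok");
}
```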

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
// Not enough pre-context to decide if there's a boundary between the two flags.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(8)));
// Provide one more Regional Indicator Symbol of pre-context.
cursor.provide_context(&flags[4..8], 4);
// Still not enough context to decide.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(4)));
// Provide the additional requested context.
cursor.provide_context(&flags[0..4], 0);
// That's enough to decide (it always is when the context reaches the start of the string).
assert_eq!(cursor.is_boundary(&flags[8..], 8), Ok(true));

fn is_boundary(self: &mut Self, chunk: &str, chunk_start: usize) -> Result<bool, GraphemeIncomplete>

Determine whether the current cursor location is a grapheme cluster boundary. Only part of the string needs to be supplied. If chunk_start is nonzero, or the length of chunk differs from the len given at creation, this method may return GraphemeIncomplete::PreContext; the caller should then call provide_context with the requested chunk and retry this method.

For partial chunks, if the cursor is not at the beginning or end of the string, the chunk should contain at least the codepoint following the cursor. If the string is nonempty, the chunk must be nonempty.

All calls should have consistent chunk contents (i.e., if a chunk provides content for a given slice, all further chunks covering that slice must have the same content for it).

use unicode_segmentation::GraphemeCursor;
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
// Offset 8 lies between the two flags.
assert_eq!(cursor.is_boundary(flags, 0), Ok(true));
// Offset 12 lies inside the second flag.
cursor.set_cursor(12);
assert_eq!(cursor.is_boundary(flags, 0), Ok(false));

fn next_boundary(self: &mut Self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, GraphemeIncomplete>

Find the next boundary after the current cursor position. Only part of the string needs to be supplied. If the chunk is incomplete, this method may return GraphemeIncomplete::PreContext or GraphemeIncomplete::NextChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should supply the chunk that follows the one just given, then retry.

See is_boundary for expectations on the provided chunk.
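The retry protocol can be wrapped in a loop that feeds chunks on demand. The sketch below is std-only so that it runs on its own: Incomplete, ChunkedCursor, chunk_ending_at, all_next_boundaries and the toy CharCursor are all hypothetical names, not part of unicode_segmentation. Incomplete mirrors only the PreContext and NextChunk variants of GraphemeIncomplete. CharCursor is not grapheme-aware: it treats every codepoint as its own cluster, but, like the real cursor in the "abcd" example, it reports NextChunk before confirming a boundary at the end of a non-final chunk. The loop assumes the cursor starts at offset 0 and that chunks are non-empty.

```rust
/// Hypothetical stand-in for the `GraphemeIncomplete` variants this loop handles.
#[derive(Debug, PartialEq)]
enum Incomplete {
    PreContext(usize),
    NextChunk,
}

/// Hypothetical trait shaped like the two cursor methods the loop calls.
trait ChunkedCursor {
    fn next_boundary(&mut self, chunk: &str, chunk_start: usize)
        -> Result<Option<usize>, Incomplete>;
    fn provide_context(&mut self, chunk: &str, chunk_start: usize);
}

/// Finds the chunk whose end offset is exactly `end` (what a PreContext request asks for).
fn chunk_ending_at<'a>(chunks: &[&'a str], end: usize) -> Option<(&'a str, usize)> {
    let mut start = 0;
    for &c in chunks {
        if start + c.len() == end {
            return Some((c, start));
        }
        start += c.len();
    }
    None
}

/// Collects every boundary after a cursor starting at offset 0,
/// supplying chunks and pre-context whenever the cursor asks for them.
fn all_next_boundaries<C: ChunkedCursor>(cursor: &mut C, chunks: &[&str]) -> Vec<usize> {
    let mut out = Vec::new();
    let (mut idx, mut start) = (0, 0);
    loop {
        match cursor.next_boundary(chunks[idx], start) {
            Ok(Some(b)) => out.push(b),
            Ok(None) => return out,
            // Advance to the chunk that follows the one just given, then retry.
            Err(Incomplete::NextChunk) => {
                start += chunks[idx].len();
                idx += 1;
            }
            // Supply the chunk that ends at the requested offset, then retry.
            Err(Incomplete::PreContext(n)) => {
                let (c, s) = chunk_ending_at(chunks, n).expect("no chunk ends at requested offset");
                cursor.provide_context(c, s);
            }
        }
    }
}

/// Toy cursor (not grapheme-aware): every codepoint is its own cluster.
struct CharCursor {
    offset: usize,
    len: usize,
    pending: Option<usize>, // boundary awaiting confirmation by the next chunk
}

impl ChunkedCursor for CharCursor {
    fn next_boundary(&mut self, chunk: &str, chunk_start: usize)
        -> Result<Option<usize>, Incomplete> {
        // A boundary deferred earlier is confirmed once the next chunk arrives.
        if let Some(p) = self.pending.take() {
            self.offset = p;
            return Ok(Some(p));
        }
        if self.offset == self.len {
            return Ok(None);
        }
        let chunk_end = chunk_start + chunk.len();
        // The driver guarantees the cursor lies inside this chunk.
        let c = chunk[self.offset - chunk_start..].chars().next().unwrap();
        let next = self.offset + c.len_utf8();
        if next == chunk_end && next < self.len {
            // Defer a boundary at the end of a non-final chunk.
            self.pending = Some(next);
            return Err(Incomplete::NextChunk);
        }
        self.offset = next;
        Ok(Some(next))
    }

    fn provide_context(&mut self, _chunk: &str, _chunk_start: usize) {}
}

fn main() {
    let chunks = ["ab", "cd"];
    let mut cursor = CharCursor { offset: 0, len: 4, pending: None };
    assert_eq!(all_next_boundaries(&mut cursor, &chunks), vec![1, 2, 3, 4]);
    assert_eq!(chunk_ending_at(&chunks, 2), Some(("ab", 0)));
    println!("ok");
}
```

With the real crate, one could implement such a trait for GraphemeCursor by forwarding to its next_boundary and provide_context and mapping the remaining GraphemeIncomplete variants as well; a backward loop over prev_boundary would handle PrevChunk the same way NextChunk is handled here.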

use unicode_segmentation::GraphemeCursor;
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(16)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(None));

And an example that uses partial strings:

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};
let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.next_boundary(&s[..2], 0), Ok(Some(1)));
assert_eq!(cursor.next_boundary(&s[..2], 0), Err(GraphemeIncomplete::NextChunk));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(2)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(4)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(None));

fn prev_boundary(self: &mut Self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, GraphemeIncomplete>

Find the previous boundary before the current cursor position. Only part of the string needs to be supplied. If the chunk is incomplete, this method may return GraphemeIncomplete::PreContext or GraphemeIncomplete::PrevChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should supply the chunk that precedes the one just given, then retry.

See is_boundary for expectations on the provided chunk.

use unicode_segmentation::GraphemeCursor;
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(12, flags.len(), false);
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(None));

And an example that uses partial strings (note that the exact error returned is not guaranteed; it may be either PrevChunk or PreContext):

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};
let s = "abcd";
let mut cursor = GraphemeCursor::new(4, s.len(), false);
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Err(GraphemeIncomplete::PrevChunk));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(2)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(1)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(None));

Trait Implementations

impl Clone for GraphemeCursor

fn clone(self: &Self) -> GraphemeCursor

impl Debug for GraphemeCursor

fn fmt(self: &Self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl Freeze for GraphemeCursor

impl RefUnwindSafe for GraphemeCursor

impl Send for GraphemeCursor

impl Sync for GraphemeCursor

impl Unpin for GraphemeCursor

impl UnsafeUnpin for GraphemeCursor

impl UnwindSafe for GraphemeCursor

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized

fn type_id(self: &Self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized

fn borrow(self: &Self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized

fn borrow_mut(self: &mut Self) -> &mut T

impl<T> CloneToUninit for T where T: Clone

unsafe fn clone_to_uninit(self: &Self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>

fn into(self: Self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T where U: Into<T>

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>

fn try_into(self: Self) -> Result<U, <U as TryFrom<T>>::Error>