Tizen Native API  6.5
UChar Iterator

The UChar Iterator module provides API for code unit iteration.

Required Header

#include <utils_i18n.h>

Overview

C API for code unit iteration. It can be implemented over simple strings and other kinds of storage. The current() and next() functions check the current index only against the limit, and previous() checks it only against the start, to determine whether the iterator has already reached the end of the iteration range. All iterators assume that the index is either moved through the API, which keeps it in bounds, or modified by user code that knows enough about the iterator implementation to set valid index values. UCharIterator functions return code unit values 0..0xffff. Before calling any function that operates on a string, set the string with i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be() or i18n_uchar_iter_set_utf8().

Functions

int i18n_uchar_iter_create (i18n_uchar_iter_h *iter)
 Creates an i18n_uchar_iter_h object.
int i18n_uchar_iter_destroy (i18n_uchar_iter_h iter)
 Deletes an i18n_uchar_iter_h object.
int i18n_uchar_iter_set_string (i18n_uchar_iter_h iter, const i18n_uchar *s, int32_t length)
 Sets up an i18n_uchar_iter_h to iterate over a string.
int i18n_uchar_iter_set_utf16be (i18n_uchar_iter_h iter, const char *s, int32_t length)
 Sets up an i18n_uchar_iter_h to iterate over a UTF-16BE string (byte vector with a big-endian pair of bytes per i18n_uchar).
int i18n_uchar_iter_set_utf8 (i18n_uchar_iter_h iter, const char *s, int32_t length)
 Sets up an i18n_uchar_iter_h to iterate over a UTF-8 string.
int i18n_uchar_iter_get_index (i18n_uchar_iter_h iter, i18n_uchar_iter_origin_e origin, int32_t *index)
 Gets the current position, or the start or limit of the iteration range.
int i18n_uchar_iter_move (i18n_uchar_iter_h iter, int32_t delta, i18n_uchar_iter_origin_e origin, int32_t *new_index)
 Moves the current position relative to the start or limit of the iteration range, or relative to the current position itself. The movement is expressed in numbers of code units forward or backward by specifying a positive or negative delta. Out of bounds movement will be pinned to the start or limit.
int i18n_uchar_iter_has_next (i18n_uchar_iter_h iter, bool *has_next)
 Checks if i18n_uchar_iter_current() and i18n_uchar_iter_next() can still return another code unit.
int i18n_uchar_iter_has_previous (i18n_uchar_iter_h iter, bool *has_previous)
 Checks if i18n_uchar_iter_previous() can still return another code unit.
int i18n_uchar_iter_current (i18n_uchar_iter_h iter, i18n_uchar32 *current)
 Returns the code unit at the current position, or I18N_SENTINEL if there is none (index is at the limit).
int i18n_uchar_iter_next (i18n_uchar_iter_h iter, i18n_uchar32 *current)
 Returns the code unit at the current index and increments the index (post-increment, like s[i++]), or returns I18N_SENTINEL if there is none (index is at the limit).
int i18n_uchar_iter_previous (i18n_uchar_iter_h iter, i18n_uchar32 *previous)
 Decrements the index and returns the code unit from there (pre-decrement, like s[--i]), or returns I18N_SENTINEL if there is none (index is at the start).
int i18n_uchar_iter_get_state (const i18n_uchar_iter_h iter, uint32_t *state)
 Gets the "state" of the iterator in the form of a single 32-bit word.
int i18n_uchar_iter_set_state (i18n_uchar_iter_h iter, uint32_t state)
 Restores the "state" of the iterator using a state word from an i18n_uchar_iter_get_state() call. The iterator object need not be the same one as for which i18n_uchar_iter_get_state() was called, but it must be of the same type (set up using the same i18n_uchar_iter_set_* function) and it must iterate over the same string (binary identical regardless of memory address).

Typedefs

typedef void * i18n_uchar_iter_h
 An i18n_uchar_iter_h handle.

Defines

#define I18N_UCHAR_ITER_UNKNOWN_INDEX   -2
 Constant value that may be returned by i18n_uchar_iter_move() indicating that the final UTF-16 index is not known, but that the move succeeded.
#define I18N_UCHAR_ITER_NO_STATE   ((uint32_t) 0xffffffff)
 Constant that refers to an error or an unknown state.

Define Documentation

#define I18N_UCHAR_ITER_NO_STATE   ((uint32_t) 0xffffffff)

Constant that refers to an error or an unknown state.

Since :
4.0

#define I18N_UCHAR_ITER_UNKNOWN_INDEX   -2

Constant value that may be returned by i18n_uchar_iter_move() indicating that the final UTF-16 index is not known, but that the move succeeded.

This can occur when moving relative to limit or length, or when moving relative to the current index after an i18n_uchar_iter_set_state() call when the current UTF-16 index is not known. It would be very inefficient to have to count from the beginning of the text just to get the current/limit/length index after moving relative to it. The actual index can be determined by calling i18n_uchar_iter_get_index() with I18N_UCHAR_ITER_CURRENT, which will count the i18n_uchar characters if necessary.

Since :
4.0

Typedef Documentation

typedef void* i18n_uchar_iter_h

An i18n_uchar_iter_h handle.

Use i18n_uchar_iter_* functions to operate on i18n_uchar_iter_h objects.

Since :
4.0

Enumeration Type Documentation

enum i18n_uchar_iter_origin_e

Origin constants for i18n_uchar_iter_get_index() and i18n_uchar_iter_move().

Since :
4.0
Enumerator:
I18N_UCHAR_ITER_START 

The 'start' origin

I18N_UCHAR_ITER_CURRENT 

The 'current' origin

I18N_UCHAR_ITER_LIMIT 

The 'limit' origin

I18N_UCHAR_ITER_ZERO 

The 'zero' origin

I18N_UCHAR_ITER_LENGTH 

The 'length' origin


Function Documentation

int i18n_uchar_iter_create ( i18n_uchar_iter_h *  iter )

Creates an i18n_uchar_iter_h object.

Since :
4.0
Remarks:
To delete this object call i18n_uchar_iter_destroy() function.
Parameters:
[out] iter  The i18n_uchar_iter_h handle
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
I18N_ERROR_OUT_OF_MEMORY  Out of memory

int i18n_uchar_iter_current ( i18n_uchar_iter_h  iter,
i18n_uchar32 *  current 
)

Returns the code unit at the current position, or I18N_SENTINEL if there is none (index is at the limit).

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[out] current  The current code unit
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().

int i18n_uchar_iter_destroy ( i18n_uchar_iter_h  iter )

Deletes an i18n_uchar_iter_h object.

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
int i18n_uchar_iter_get_index ( i18n_uchar_iter_h  iter,
i18n_uchar_iter_origin_e  origin,
int32_t *  index 
)

Gets the current position, or the start or limit of the iteration range.

This function may perform slowly for I18N_UCHAR_ITER_CURRENT after i18n_uchar_iter_set_state() was called, or for I18N_UCHAR_ITER_LENGTH, because an iterator implementation may have to count UChars if the underlying storage is not UTF-16.

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[in] origin  Origin defining action to perform
[out] index  The requested index, or I18N_SENTINEL in an error condition
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
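
The origin constants can be read as selectors for one of the iterator's boundary values. The following is a minimal standalone sketch of that mapping, not the Tizen implementation: the names origin_e, range_model_t and origin_to_index() are hypothetical, and returning -1 for an unrecognized origin is an assumption that mirrors the "I18N_SENTINEL in an error condition" wording above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the i18n_uchar_iter_origin_e values. */
typedef enum {
    ORIGIN_START,
    ORIGIN_CURRENT,
    ORIGIN_LIMIT,
    ORIGIN_ZERO,
    ORIGIN_LENGTH
} origin_e;

/* Hypothetical model of an iterator over UTF-16 storage.
 * All fields are offsets counted in code units. */
typedef struct {
    int32_t start;   /* first index of the iteration range */
    int32_t index;   /* current position */
    int32_t limit;   /* one past the last index of the range */
    int32_t length;  /* length of the whole text */
} range_model_t;

/* Returns the index that the given origin refers to. */
static int32_t origin_to_index(const range_model_t *r, origin_e origin)
{
    assert(r != NULL);
    switch (origin) {
    case ORIGIN_START:   return r->start;
    case ORIGIN_CURRENT: return r->index;
    case ORIGIN_LIMIT:   return r->limit;
    case ORIGIN_ZERO:    return 0;
    case ORIGIN_LENGTH:  return r->length;
    }
    return -1; /* assumed error value for an unrecognized origin */
}
```

For UTF-16 storage every case is a constant-time lookup. The slow paths described above arise only when the storage is not UTF-16 and the requested value has to be counted.
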
int i18n_uchar_iter_get_state ( const i18n_uchar_iter_h  iter,
uint32_t *  state 
)

Gets the "state" of the iterator in the form of a single 32-bit word.

It is recommended that the state value be calculated to be as small as is feasible. For strings with limited lengths, fewer than 32 bits may be sufficient.

This is used together with i18n_uchar_iter_set_state() to save and restore the iterator position more efficiently than with i18n_uchar_iter_get_index() or i18n_uchar_iter_move().

The iterator state is defined as a uint32_t value because it is designed for use in i18n_ucol_next_sort_key_part() which provides 32 bits to store the state of the character iterator.

With some UCharIterator implementations (e.g., UTF-8), getting and setting the UTF-16 index with existing functions (i18n_uchar_iter_get_index(I18N_UCHAR_ITER_CURRENT) followed by i18n_uchar_iter_move(pos, I18N_UCHAR_ITER_ZERO)) is possible but relatively slow because the iterator has to "walk" from a known index to the requested one. This takes more time the farther it needs to go.

An opaque state value allows an iterator implementation to provide an internal index (UTF-8: the source byte array index) for fast, constant-time restoration.

After calling i18n_uchar_iter_set_state(), i18n_uchar_iter_get_index(I18N_UCHAR_ITER_CURRENT) calls may be slow because the UTF-16 index may not be restored as well, but the iterator can deliver the correct text contents and move relative to the current position without performance degradation.

Some UCharIterator implementations may not be able to return a valid state for each position, in which case they return I18N_UCHAR_ITER_NO_STATE instead.

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[out] state  The state word
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
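
For the UTF-8 iterator, the remarks for i18n_uchar_iter_set_utf8() describe the state word as the source byte index in bits 31..1 plus a mid-surrogate flag in bit 0. The sketch below only illustrates that bit layout in standalone C. The helper names are hypothetical, and application code should treat the state word as opaque rather than decode it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packs a UTF-8 byte index (bits 31..1) and a mid-surrogate flag (bit 0)
 * into one 32-bit state word. */
static uint32_t pack_utf8_state(uint32_t byte_index, bool mid_surrogate)
{
    assert(byte_index <= 0x7fffffffu); /* the index must fit in 31 bits */
    return (byte_index << 1) | (mid_surrogate ? 1u : 0u);
}

/* Extracts the UTF-8 byte index from a state word. */
static uint32_t state_byte_index(uint32_t state)
{
    return state >> 1;
}

/* Tells whether the state word points into the middle of a surrogate pair. */
static bool state_mid_surrogate(uint32_t state)
{
    return (state & 1u) != 0;
}
```

Because the word stores a byte offset rather than a UTF-16 index, restoring it is constant-time, which matches the performance notes above.
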
int i18n_uchar_iter_has_next ( i18n_uchar_iter_h  iter,
bool *  has_next 
)

Checks if i18n_uchar_iter_current() and i18n_uchar_iter_next() can still return another code unit.

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[out] has_next  true if another code unit can be returned, false otherwise
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
int i18n_uchar_iter_has_previous ( i18n_uchar_iter_h  iter,
bool *  has_previous 
)

Checks if i18n_uchar_iter_previous() can still return another code unit.

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[out] has_previous  true if another code unit can be returned, false otherwise
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
int i18n_uchar_iter_move ( i18n_uchar_iter_h  iter,
int32_t  delta,
i18n_uchar_iter_origin_e  origin,
int32_t *  new_index 
)

Moves the current position relative to the start or limit of the iteration range, or relative to the current position itself. The movement is expressed in numbers of code units forward or backward by specifying a positive or negative delta. Out of bounds movement will be pinned to the start or limit.

This function may perform slowly for moving relative to I18N_UCHAR_ITER_LENGTH because an iterator implementation may have to count the rest of the UChars if the native storage is not UTF-16. When moving relative to the limit or length, or relative to the current position after i18n_uchar_iter_set_state() was called, i18n_uchar_iter_move() may return I18N_UCHAR_ITER_UNKNOWN_INDEX to avoid an inefficient determination of the actual UTF-16 index. The actual index can be determined with i18n_uchar_iter_get_index(I18N_UCHAR_ITER_CURRENT) which will count the UChars if necessary.

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[in] delta  Movement
[in] origin  Origin defining action to perform
[out] new_index  The new index, or I18N_UCHAR_ITER_UNKNOWN_INDEX when the index is not known, or I18N_SENTINEL on an error condition
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
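
The pinning rule ("out of bounds movement will be pinned to the start or limit") can be sketched in a few lines of standalone C. This is an illustrative model for UTF-16 storage only: pinned_target() is a hypothetical helper, and base stands for the index that the chosen origin resolves to.

```c
#include <assert.h>
#include <stdint.h>

/* Computes base + delta and clamps the result to [start, limit]. */
static int32_t pinned_target(int32_t base, int32_t delta,
                             int32_t start, int32_t limit)
{
    assert(start <= limit);
    /* A 64-bit sum avoids signed overflow for extreme delta values. */
    int64_t target = (int64_t)base + delta;
    if (target < start) {
        return start;   /* moved before the range: pin to start */
    }
    if (target > limit) {
        return limit;   /* moved past the range: pin to limit */
    }
    return (int32_t)target;
}
```

A caller that wants to jump to the end of the text regardless of the current position can therefore pass a large positive delta and rely on the pinning.
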
int i18n_uchar_iter_next ( i18n_uchar_iter_h  iter,
i18n_uchar32 *  current 
)

Returns the code unit at the current index and increments the index (post-increment, like s[i++]), or returns I18N_SENTINEL if there is none (index is at the limit).

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[out] current  The current code unit
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
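
The post-increment behaviour, and the fact that only the limit is checked, can be modelled over a plain UTF-16 array. This is a simplified standalone sketch and does not call the Tizen library: fwd_model_t, model_current() and model_next() are hypothetical names, and MODEL_SENTINEL is an assumed stand-in for I18N_SENTINEL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SENTINEL (-1) /* assumed stand-in for I18N_SENTINEL */

/* Hypothetical forward-iteration model over UTF-16 code units. */
typedef struct {
    const uint16_t *s;  /* text, not copied */
    int32_t start;      /* first index of the range */
    int32_t index;      /* current position */
    int32_t limit;      /* one past the last index of the range */
} fwd_model_t;

/* Returns the code unit at the current index without moving. */
static int32_t model_current(const fwd_model_t *it)
{
    assert(it != NULL && it->s != NULL);
    return it->index < it->limit ? it->s[it->index] : MODEL_SENTINEL;
}

/* Returns the code unit at the current index, then advances (like s[i++]).
 * Only the limit is checked, as described in the overview. */
static int32_t model_next(fwd_model_t *it)
{
    assert(it != NULL && it->s != NULL);
    return it->index < it->limit ? it->s[it->index++] : MODEL_SENTINEL;
}
```

Returning a 32-bit value lets the sentinel be distinguished from every valid code unit in the range 0..0xffff.
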

int i18n_uchar_iter_previous ( i18n_uchar_iter_h  iter,
i18n_uchar32 *  previous 
)

Decrements the index and returns the code unit from there (pre-decrement, like s[--i]), or returns I18N_SENTINEL if there is none (index is at the start).

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[out] previous  The previous code unit
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
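
Backward iteration mirrors the forward case: the index is decremented first and only the start is checked. The sketch below is a standalone model under the same assumptions as any array-based illustration. back_model_t and model_previous() are hypothetical names, and BACK_SENTINEL is an assumed stand-in for I18N_SENTINEL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BACK_SENTINEL (-1) /* assumed stand-in for I18N_SENTINEL */

/* Hypothetical backward-iteration model over UTF-16 code units. */
typedef struct {
    const uint16_t *s;  /* text, not copied */
    int32_t start;      /* first index of the range */
    int32_t index;      /* current position */
    int32_t limit;      /* one past the last index of the range */
} back_model_t;

/* Decrements the index, then returns the code unit there (like s[--i]).
 * Only the start is checked, as described in the overview. */
static int32_t model_previous(back_model_t *it)
{
    assert(it != NULL && it->s != NULL);
    return it->index > it->start ? it->s[--it->index] : BACK_SENTINEL;
}
```

Starting with the index at the limit and calling the function until it yields the sentinel visits the text in reverse order.
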
int i18n_uchar_iter_set_state ( i18n_uchar_iter_h  iter,
uint32_t  state 
)

Restores the "state" of the iterator using a state word from an i18n_uchar_iter_get_state() call. The iterator object need not be the same one as for which i18n_uchar_iter_get_state() was called, but it must be of the same type (set up using the same i18n_uchar_iter_set_* function) and it must iterate over the same string (binary identical regardless of memory address).

After calling i18n_uchar_iter_set_state(), an i18n_uchar_iter_get_index(I18N_UCHAR_ITER_CURRENT) may be slow because the UTF-16 index may not be restored as well, but the iterator can deliver the correct text contents and move relative to the current position without performance degradation.

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h object
[in] state  The state word from an i18n_uchar_iter_get_state() call on a same-type, same-string iterator
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
int i18n_uchar_iter_set_string ( i18n_uchar_iter_h  iter,
const i18n_uchar *  s,
int32_t  length 
)

Sets up an i18n_uchar_iter_h to iterate over a string.

Sets the i18n_uchar_iter_h function pointers for iteration over the string s with iteration boundaries (start == index == 0) and (length == limit == string length). The "provider" may set the start, index, and limit values at any time within the range 0..length.

Since :
4.0
Remarks:
The string s will not be copied or reallocated.
Parameters:
[in] iter  The i18n_uchar_iter_h structure to be set for iteration
[in] s  String to iterate over
[in] length  Length of s, or -1 if NULL-terminated
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
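
The boundary setup described above (start == index == 0, length == limit == string length, and -1 meaning "terminated by a 0 code unit") can be sketched as follows. This is a standalone model rather than the library's code: str_model_t, utf16_strlen() and model_set_string() are hypothetical names, and uint16_t stands in for i18n_uchar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fields that the setup initializes. */
typedef struct {
    const uint16_t *s;  /* text, stored by pointer only */
    int32_t start;
    int32_t index;
    int32_t limit;
    int32_t length;
} str_model_t;

/* Counts the code units before the terminating 0 unit. */
static int32_t utf16_strlen(const uint16_t *s)
{
    int32_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Initializes the model for iteration over s. */
static void model_set_string(str_model_t *it, const uint16_t *s, int32_t length)
{
    assert(it != NULL && s != NULL && length >= -1);
    it->s = s;                     /* the text is not copied */
    it->start = 0;
    it->index = 0;
    it->length = (length >= 0) ? length : utf16_strlen(s);
    it->limit = it->length;
}
```

Since only the pointer is stored, the caller has to keep the text alive and unmodified for as long as the iterator is in use.
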
int i18n_uchar_iter_set_utf16be ( i18n_uchar_iter_h  iter,
const char *  s,
int32_t  length 
)

Sets up an i18n_uchar_iter_h to iterate over a UTF-16BE string (byte vector with a big-endian pair of bytes per i18n_uchar).

Everything works just like with a normal i18n_uchar iterator, except that i18n_uchar characters are assembled from byte pairs, and that the length argument here indicates an even number of bytes.

Since :
4.0
Parameters:
[in] iter  The i18n_uchar_iter_h structure to be set for iteration
[in] s  UTF-16BE string to iterate over
[in] length  Length of s as an even number of bytes, or -1 if NULL-terminated (here NULL means a pair of 0 bytes at an even index from s)
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
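
How code units are assembled from big-endian byte pairs, and how a terminating pair of 0 bytes at an even offset is found, can be illustrated in standalone C. The helpers utf16be_unit_at() and utf16be_byte_length() are hypothetical and only model the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assembles the code unit at the given code unit index from two bytes,
 * high byte first. */
static uint16_t utf16be_unit_at(const char *s, int32_t unit_index)
{
    assert(s != NULL && unit_index >= 0);
    const unsigned char *p = (const unsigned char *)s + 2 * (size_t)unit_index;
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns the length in bytes up to, but not including, the first
 * 0x00 0x00 pair that starts at an even offset. */
static int32_t utf16be_byte_length(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    int32_t n = 0;
    while (p[n] != 0 || p[n + 1] != 0) {
        n += 2;   /* step over one whole code unit */
    }
    return n;
}
```

Stepping two bytes at a time matters: a 0 low byte followed by a 0 high byte of the next unit is not a terminator, because that pair starts at an odd offset.
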
int i18n_uchar_iter_set_utf8 ( i18n_uchar_iter_h  iter,
const char *  s,
int32_t  length 
)

Sets up an i18n_uchar_iter_h to iterate over a UTF-8 string.

Sets the i18n_uchar_iter_h function pointers for iteration over the UTF-8 string s with UTF-8 iteration boundaries 0 and length. The implementation counts the UTF-16 index on the fly and lazily evaluates the UTF-16 length of the text.

Since :
4.0
Remarks:
The string s will not be copied or reallocated. i18n_uchar_iter_get_state() returns a state value consisting of the current UTF-8 source byte index (bits 31..1) and a flag (bit 0) that indicates whether the UChar position is in the middle of a surrogate pair (from a 4-byte UTF-8 sequence for the corresponding supplementary code point). i18n_uchar_iter_get_state() cannot also encode the UTF-16 index in the state value.
Parameters:
[in] iter  The i18n_uchar_iter_h structure to be set for iteration
[in] s  UTF-8 string to iterate over
[in] length  Length of s, or -1 if NULL-terminated
Returns:
0 on success, otherwise a negative error value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
Precondition:
The string must be set with one of: i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be(), i18n_uchar_iter_set_utf8().
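
The reason the UTF-16 length of UTF-8 text has to be counted, rather than read off directly, is that the number of code units depends on the byte sequences: 1-, 2- and 3-byte sequences each map to one UTF-16 code unit, while a 4-byte sequence maps to a surrogate pair. The standalone sketch below shows such a count. utf16_units_in_utf8() is a hypothetical helper that assumes well-formed UTF-8 and makes no claim about how the library handles malformed input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the UTF-16 code units needed to represent byte_length bytes of
 * well-formed UTF-8 text. Malformed input is not handled in this sketch. */
static int32_t utf16_units_in_utf8(const char *s, int32_t byte_length)
{
    assert(s != NULL && byte_length >= 0);
    const unsigned char *p = (const unsigned char *)s;
    int32_t units = 0;
    for (int32_t i = 0; i < byte_length; i++) {
        unsigned char b = p[i];
        if ((b & 0xC0) == 0x80) {
            continue;                   /* continuation byte: same code point */
        }
        units += (b >= 0xF0) ? 2 : 1;   /* 4-byte lead: surrogate pair */
    }
    return units;
}
```

The loop visits every byte, so the cost grows with the text length. That is the work a lazily evaluated UTF-16 length postpones until the value is actually requested.
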