Tizen Native API
4.0
The UChar Iterator module provides an API for code unit iteration.
#include <utils_i18n.h>
C API for code unit iteration. It can be implemented on top of simple strings and similar storage. The current() and next() functions only check the current index against the limit, and previous() only checks the current index against the start, to see whether the iterator has already reached the end of the iteration range. The assumption, in all iterators, is that the index is either moved through the API, which keeps it within bounds, or modified by user code that knows enough about the iterator implementation to set valid index values. UCharIterator functions return code unit values 0..0xffff. Before any function that operates on a string is called, the string must be set with i18n_uchar_iter_set_string(), i18n_uchar_iter_set_utf16be() or i18n_uchar_iter_set_utf8().
Functions | |
int | i18n_uchar_iter_create (i18n_uchar_iter_h *iter) |
Creates an i18n_uchar_iter_h object. | |
int | i18n_uchar_iter_destroy (i18n_uchar_iter_h iter) |
Deletes an i18n_uchar_iter_h object. | |
int | i18n_uchar_iter_set_string (i18n_uchar_iter_h iter, const i18n_uchar *s, int32_t length) |
Sets up an i18n_uchar_iter_h to iterate over a string. | |
int | i18n_uchar_iter_set_utf16be (i18n_uchar_iter_h iter, const char *s, int32_t length) |
Sets up an i18n_uchar_iter_h to iterate over a UTF-16BE string (byte vector with a big-endian pair of bytes per i18n_uchar). | |
int | i18n_uchar_iter_set_utf8 (i18n_uchar_iter_h iter, const char *s, int32_t length) |
Sets up an i18n_uchar_iter_h to iterate over a UTF-8 string. | |
int | i18n_uchar_iter_get_index (i18n_uchar_iter_h iter, i18n_uchar_iter_origin_e origin, int32_t *index) |
Gets the current position, or the start or limit of the iteration range. | |
int | i18n_uchar_iter_move (i18n_uchar_iter_h iter, int32_t delta, i18n_uchar_iter_origin_e origin, int32_t *new_index) |
Moves the current position relative to the start or limit of the iteration range, or relative to the current position itself. The movement is expressed in numbers of code units forward or backward by specifying a positive or negative delta. Out of bounds movement will be pinned to the start or limit. | |
int | i18n_uchar_iter_has_next (i18n_uchar_iter_h iter, bool *has_next) |
Checks if i18n_uchar_iter_current() and i18n_uchar_iter_next() can still return another code unit. | |
int | i18n_uchar_iter_has_previous (i18n_uchar_iter_h iter, bool *has_previous) |
Checks if i18n_uchar_iter_previous() can still return another code unit. | |
int | i18n_uchar_iter_current (i18n_uchar_iter_h iter, i18n_uchar32 *current) |
Returns the code unit at the current position, or I18N_SENTINEL if there is none (index is at the limit). | |
int | i18n_uchar_iter_next (i18n_uchar_iter_h iter, i18n_uchar32 *current) |
Returns the code unit at the current index and increments the index (post-increment, like s[i++]), or returns I18N_SENTINEL if there is none (index is at the limit). | |
int | i18n_uchar_iter_previous (i18n_uchar_iter_h iter, i18n_uchar32 *previous) |
Decrements the index and returns the code unit from there (pre-decrement, like s[--i]), or returns I18N_SENTINEL if there is none (index is at the start). | |
int | i18n_uchar_iter_get_state (const i18n_uchar_iter_h iter, uint32_t *state) |
Gets the "state" of the iterator in the form of a single 32-bit word. | |
int | i18n_uchar_iter_set_state (i18n_uchar_iter_h iter, uint32_t state) |
Restores the "state" of the iterator using a state word from an i18n_uchar_iter_get_state() call. The iterator object need not be the same one for which i18n_uchar_iter_get_state() was called, but it must be of the same type (set up using the same i18n_uchar_iter_set_* function) and it must iterate over the same string (binary identical regardless of memory address). | |
Typedefs | |
typedef void * | i18n_uchar_iter_h |
An i18n_uchar_iter_h handle. | |
Defines | |
#define | I18N_UCHAR_ITER_UNKNOWN_INDEX -2 |
Constant value that may be returned by i18n_uchar_iter_move() indicating that the final UTF-16 index is not known, but that the move succeeded. | |
#define | I18N_UCHAR_ITER_NO_STATE ((uint32_t) 0xffffffff) |
Constant that refers to an error or an unknown state. |
#define I18N_UCHAR_ITER_NO_STATE ((uint32_t) 0xffffffff)
Constant that refers to an error or an unknown state.
#define I18N_UCHAR_ITER_UNKNOWN_INDEX -2
Constant value that may be returned by i18n_uchar_iter_move() indicating that the final UTF-16 index is not known, but that the move succeeded.
This can occur when moving relative to limit or length, or when moving relative to the current index after an i18n_uchar_iter_set_state() call when the current UTF-16 index is not known. It would be very inefficient to have to count from the beginning of the text just to get the current/limit/length index after moving relative to it. The actual index can be determined by calling i18n_uchar_iter_get_index() with I18N_UCHAR_ITER_CURRENT, which will count the i18n_uchar characters if necessary.
typedef void* i18n_uchar_iter_h
An i18n_uchar_iter_h handle.
Use i18n_uchar_iter_* functions to operate on i18n_uchar_iter_h objects.
enum i18n_uchar_iter_origin_e
Origin constants for i18n_uchar_iter_get_index() and i18n_uchar_iter_move().
int i18n_uchar_iter_create (i18n_uchar_iter_h *iter)
Creates an i18n_uchar_iter_h object.
[out] | iter | The i18n_uchar_iter_h handle |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
I18N_ERROR_OUT_OF_MEMORY | Out of memory |
int i18n_uchar_iter_current (i18n_uchar_iter_h iter, i18n_uchar32 *current)
Returns the code unit at the current position, or I18N_SENTINEL if there is none (index is at the limit).
[in] | iter | The i18n_uchar_iter_h object |
[out] | current | The current code unit |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
int i18n_uchar_iter_destroy (i18n_uchar_iter_h iter)
Deletes an i18n_uchar_iter_h object.
[in] | iter | The i18n_uchar_iter_h object |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
int i18n_uchar_iter_get_index (i18n_uchar_iter_h iter, i18n_uchar_iter_origin_e origin, int32_t *index)
Gets the current position, or the start or limit of the iteration range.
This function may perform slowly for I18N_UCHAR_ITER_CURRENT after i18n_uchar_iter_set_state() was called, or for I18N_UCHAR_ITER_LENGTH, because an iterator implementation may have to count UChars if the underlying storage is not UTF-16.
[in] | iter | The i18n_uchar_iter_h object |
[in] | origin | Origin defining action to perform |
[out] | index | The requested index, or I18N_SENTINEL in an error condition |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
int i18n_uchar_iter_get_state (const i18n_uchar_iter_h iter, uint32_t *state)
Gets the "state" of the iterator in the form of a single 32-bit word.
It is recommended that the state value be calculated to be as small as is feasible. For strings with limited lengths, fewer than 32 bits may be sufficient.
This is used together with i18n_uchar_iter_set_state() to save and restore the iterator position more efficiently than with i18n_uchar_iter_get_index() or i18n_uchar_iter_move().
The iterator state is defined as a uint32_t value because it is designed for use in i18n_ucol_next_sort_key_part(), which provides 32 bits to store the state of the character iterator.
With some UCharIterator implementations (e.g., UTF-8), getting and setting the UTF-16 index with existing functions (i18n_uchar_iter_get_index(I18N_UCHAR_ITER_CURRENT) followed by i18n_uchar_iter_move(pos, I18N_UCHAR_ITER_ZERO)) is possible but relatively slow because the iterator has to "walk" from a known index to the requested one. This takes more time the farther it needs to go.
An opaque state value allows an iterator implementation to provide an internal index (UTF-8: the source byte array index) for fast, constant-time restoration.
After calling i18n_uchar_iter_set_state(), i18n_uchar_iter_get_index(I18N_UCHAR_ITER_CURRENT) calls may be slow because the UTF-16 index may not be restored as well, but the iterator can deliver the correct text contents and move relative to the current position without performance degradation.
Some UCharIterator implementations may not be able to return a valid state for each position, in which case they return I18N_UCHAR_ITER_NO_STATE instead.
[in] | iter | The i18n_uchar_iter_h object |
[out] | state | The state word |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
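The trade-off described above (constant-time state restore versus a slower UTF-16 index lookup) can be illustrated with a small self-contained model. This is only a sketch: it does not call the Tizen API, every model_* name is hypothetical, and it assumes well-formed UTF-8 and a state word that is simply the source byte offset. A real implementation may encode more in the state, for example a position between the two surrogates of a pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the role of I18N_UCHAR_ITER_UNKNOWN_INDEX in this model. */
#define MODEL_UNKNOWN_INDEX (-2)

/* Hypothetical UTF-8 iterator model (not the Tizen implementation). */
typedef struct {
    const char *s;        /* well-formed UTF-8 text (assumption) */
    int32_t byte_pos;     /* internal position: byte offset into s */
    int32_t utf16_index;  /* UTF-16 index, or MODEL_UNKNOWN_INDEX */
} model_utf8_iter;

/* Counts the UTF-16 code units encoded by s[0..byte_len): O(byte_len). */
int32_t model_count_utf16(const char *s, int32_t byte_len) {
    assert(s != NULL && byte_len >= 0);
    int32_t units = 0;
    for (int32_t i = 0; i < byte_len; ++i) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) == 0x80) continue;  /* continuation byte */
        units += (b >= 0xF0) ? 2 : 1;      /* 4-byte lead: surrogate pair */
    }
    return units;
}

/* Saving the state is constant time: the byte offset is the state. */
uint32_t model_get_state(const model_utf8_iter *it) {
    return (uint32_t)it->byte_pos;
}

/* Restoring is constant time too, but the UTF-16 index becomes unknown. */
void model_set_state(model_utf8_iter *it, uint32_t state) {
    it->byte_pos = (int32_t)state;
    it->utf16_index = MODEL_UNKNOWN_INDEX;
}

/* The index is recomputed lazily; this is the potentially slow step. */
int32_t model_get_index(model_utf8_iter *it) {
    if (it->utf16_index == MODEL_UNKNOWN_INDEX) {
        it->utf16_index = model_count_utf16(it->s, it->byte_pos);
    }
    return it->utf16_index;
}
```

In this model, a state taken from one iterator can be applied to a second iterator over the same bytes; reading text from the restored position needs no counting, and only model_get_index() pays for the walk.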
int i18n_uchar_iter_has_next (i18n_uchar_iter_h iter, bool *has_next)
Checks if i18n_uchar_iter_current() and i18n_uchar_iter_next() can still return another code unit.
[in] | iter | The i18n_uchar_iter_h object |
[out] | has_next | true if another code unit can be returned, false otherwise |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
int i18n_uchar_iter_has_previous (i18n_uchar_iter_h iter, bool *has_previous)
Checks if i18n_uchar_iter_previous() can still return another code unit.
[in] | iter | The i18n_uchar_iter_h object |
[out] | has_previous | true if another code unit can be returned, false otherwise |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
int i18n_uchar_iter_move (i18n_uchar_iter_h iter, int32_t delta, i18n_uchar_iter_origin_e origin, int32_t *new_index)
Moves the current position relative to the start or limit of the iteration range, or relative to the current position itself. The movement is expressed in numbers of code units forward or backward by specifying a positive or negative delta. Out of bounds movement will be pinned to the start or limit.
This function may perform slowly for moving relative to I18N_UCHAR_ITER_LENGTH because an iterator implementation may have to count the rest of the UChars if the native storage is not UTF-16. When moving relative to the limit or length, or relative to the current position after i18n_uchar_iter_set_state() was called, i18n_uchar_iter_move() may return I18N_UCHAR_ITER_UNKNOWN_INDEX to avoid an inefficient determination of the actual UTF-16 index. The actual index can be determined with i18n_uchar_iter_get_index(I18N_UCHAR_ITER_CURRENT) which will count the UChars if necessary.
[in] | iter | The i18n_uchar_iter_h object |
[in] | delta | Movement |
[in] | origin | Origin defining action to perform |
[out] | new_index | The new index or I18N_UCHAR_ITER_UNKNOWN_INDEX when the index is not known, or I18N_SENTINEL on an error condition |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
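The pinning rule for out-of-bounds movement can be sketched as follows. This is a self-contained model rather than a call into the Tizen API: the model_* names and the three-value origin enum are hypothetical, and the real i18n_uchar_iter_origin_e has additional origins that are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical origins for this model only. */
typedef enum { MODEL_START, MODEL_CURRENT, MODEL_LIMIT } model_origin;

/* Hypothetical iteration range: start <= index <= limit. */
typedef struct {
    int32_t start;
    int32_t index;
    int32_t limit;
} model_range;

/* Moves index by delta code units relative to origin and pins the
 * result to [start, limit]. Returns the new index. */
int32_t model_move(model_range *r, int32_t delta, model_origin origin) {
    assert(r != NULL && r->start <= r->limit);
    int64_t base;
    switch (origin) {
    case MODEL_START: base = r->start; break;
    case MODEL_LIMIT: base = r->limit; break;
    default:          base = r->index; break;
    }
    int64_t target = base + delta;  /* 64-bit sum cannot overflow */
    if (target < r->start) target = r->start;  /* pin to the start */
    if (target > r->limit) target = r->limit;  /* pin to the limit */
    r->index = (int32_t)target;
    return r->index;
}
```

With a range of start 0 and limit 5, moving by +10 from any origin lands on 5, and moving by -1 relative to the limit lands on 4, the last code unit.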
int i18n_uchar_iter_next (i18n_uchar_iter_h iter, i18n_uchar32 *current)
Returns the code unit at the current index and increments the index (post-increment, like s[i++]), or returns I18N_SENTINEL if there is none (index is at the limit).
[in] | iter | The i18n_uchar_iter_h object |
[out] | current | The current code unit |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
int i18n_uchar_iter_previous (i18n_uchar_iter_h iter, i18n_uchar32 *previous)
Decrements the index and returns the code unit from there (pre-decrement, like s[--i]), or returns I18N_SENTINEL if there is none (index is at the start).
[in] | iter | The i18n_uchar_iter_h object |
[out] | previous | The previous code unit |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
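The post-increment and pre-decrement behavior of the current, next and previous functions can be made concrete with a small array-backed model. It is a sketch under stated assumptions: the model_* names are hypothetical, the sentinel is modeled as -1, and results are returned directly instead of through an out parameter with an error code as the real functions do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for I18N_SENTINEL in this model (assumed to be -1). */
#define MODEL_SENTINEL (-1)

/* Hypothetical iterator over UTF-16 code units in s[start..limit). */
typedef struct {
    const uint16_t *s;
    int32_t start;
    int32_t index;
    int32_t limit;
} model_iter;

bool model_has_next(const model_iter *it) {
    return it->index < it->limit;
}

bool model_has_previous(const model_iter *it) {
    return it->index > it->start;
}

/* Code unit at the current index; the index does not move. */
int32_t model_current(const model_iter *it) {
    assert(it != NULL);
    return model_has_next(it) ? it->s[it->index] : MODEL_SENTINEL;
}

/* Post-increment, like s[i++]; only the limit is checked. */
int32_t model_next(model_iter *it) {
    assert(it != NULL);
    return model_has_next(it) ? it->s[it->index++] : MODEL_SENTINEL;
}

/* Pre-decrement, like s[--i]; only the start is checked. */
int32_t model_previous(model_iter *it) {
    assert(it != NULL);
    return model_has_previous(it) ? it->s[--it->index] : MODEL_SENTINEL;
}
```

One consequence worth noting: a model_next() call followed directly by model_previous() returns the same code unit twice, because the first call steps past the unit and the second steps back onto it.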
int i18n_uchar_iter_set_state (i18n_uchar_iter_h iter, uint32_t state)
Restores the "state" of the iterator using a state word from an i18n_uchar_iter_get_state() call. The iterator object need not be the same one for which i18n_uchar_iter_get_state() was called, but it must be of the same type (set up using the same i18n_uchar_iter_set_* function) and it must iterate over the same string (binary identical regardless of memory address).
After calling i18n_uchar_iter_set_state(), an i18n_uchar_iter_get_index(I18N_UCHAR_ITER_CURRENT) call may be slow because the UTF-16 index may not be restored as well, but the iterator can deliver the correct text contents and move relative to the current position without performance degradation.
[in] | iter | The i18n_uchar_iter_h object |
[in] | state | The state word from an i18n_uchar_iter_get_state() call on a same-type, same-string iterator |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
int i18n_uchar_iter_set_string (i18n_uchar_iter_h iter, const i18n_uchar *s, int32_t length)
Sets up an i18n_uchar_iter_h to iterate over a string.
Sets the i18n_uchar_iter_h function pointers for iteration over the string s with iteration boundaries (start == index == 0) and (length == limit == string length). The "provider" may set the start, index, and limit values at any time within the range 0..length.
[in] | iter | The i18n_uchar_iter_h structure to be set for iteration |
[in] | s | String to iterate over |
[in] | length | Length of s in code units, or -1 if NUL-terminated |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
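The boundary setup described above (start == index == 0, length == limit == string length, with -1 meaning a terminated string) might look like this in a simplified model. The model_* names are hypothetical and the struct is not the real handle layout; uint16_t stands in for i18n_uchar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator fields for this model only. */
typedef struct {
    const uint16_t *s;
    int32_t start;
    int32_t index;
    int32_t length;
    int32_t limit;
} model_string_iter;

/* Sets up iteration over s. A length of -1 means that s ends with a
 * zero code unit and its length is measured here. */
void model_set_string(model_string_iter *it, const uint16_t *s,
                      int32_t length) {
    assert(it != NULL && s != NULL && length >= -1);
    if (length < 0) {
        length = 0;
        while (s[length] != 0) {
            ++length;  /* count code units up to the terminator */
        }
    }
    it->s = s;
    it->start = 0;
    it->index = 0;
    it->length = length;
    it->limit = length;
}
```

Passing an explicit length shorter than the string restricts iteration to a prefix, since the limit is taken from the length argument.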
int i18n_uchar_iter_set_utf16be (i18n_uchar_iter_h iter, const char *s, int32_t length)
Sets up an i18n_uchar_iter_h to iterate over a UTF-16BE string (byte vector with a big-endian pair of bytes per i18n_uchar).
Everything works just like with a normal i18n_uchar iterator, except that i18n_uchar characters are assembled from byte pairs, and that the length argument here indicates an even number of bytes.
[in] | iter | i18n_uchar_iter_h structure to be set for iteration |
[in] | s | UTF-16BE string to iterate over |
[in] | length | Length of s as an even number of bytes, or -1 if NUL-terminated (NUL means a pair of 0 bytes at an even index from s) |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
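How code units are assembled from big-endian byte pairs, and why the terminator must be a pair of zero bytes at an even offset, can be sketched as below. The helper names are hypothetical and the code is a model of the idea, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assembles the code unit at code-unit index i from the byte pair at
 * byte offsets 2*i (high byte) and 2*i + 1 (low byte). */
uint16_t model_utf16be_unit(const char *s, int32_t i) {
    assert(s != NULL && i >= 0);
    const unsigned char *b = (const unsigned char *)s;
    return (uint16_t)((b[2 * i] << 8) | b[2 * i + 1]);
}

/* Byte length of a terminated UTF-16BE string. Only pairs at even
 * offsets are tested, so a low zero byte followed by a high zero byte
 * of the next unit is not mistaken for the terminator. */
int32_t model_utf16be_byte_length(const char *s) {
    assert(s != NULL);
    int32_t n = 0;
    while (s[n] != 0 || s[n + 1] != 0) {
        n += 2;
    }
    return n;
}
```

For example, the bytes 41 00 00 42 00 00 encode the two code units 0x4100 and 0x0042; the zero bytes at offsets 1 and 2 straddle a pair boundary, so the measured length is 4 bytes.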
int i18n_uchar_iter_set_utf8 (i18n_uchar_iter_h iter, const char *s, int32_t length)
Sets up an i18n_uchar_iter_h to iterate over a UTF-8 string.
Sets the i18n_uchar_iter_h function pointers for iteration over the UTF-8 string s with UTF-8 iteration boundaries 0 and length. The implementation counts the UTF-16 index on the fly and lazily evaluates the UTF-16 length of the text.
[in] | iter | i18n_uchar_iter_h structure to be set for iteration |
[in] | s | UTF-8 string to iterate over |
[in] | length | Length of s in bytes, or -1 if NUL-terminated |
0 on success, otherwise a negative error value
I18N_ERROR_NONE | Successful |
I18N_ERROR_INVALID_PARAMETER | Invalid function parameter |
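Because the iterator functions return code unit values 0..0xffff, a UTF-8 backed iterator has to deliver a four-byte sequence as two UTF-16 code units. The sketch below shows that mapping for a whole buffer at once. It is a model only: the function name is hypothetical, the input is assumed to be well-formed UTF-8, and a real iterator would decode one step at a time instead of filling an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts well-formed UTF-8 in s[0..byte_len) into UTF-16 code units.
 * Writes at most cap units to out and returns the number written. */
int32_t model_utf8_to_units(const char *s, int32_t byte_len,
                            uint16_t *out, int32_t cap) {
    assert(s != NULL && out != NULL && byte_len >= 0);
    const unsigned char *b = (const unsigned char *)s;
    int32_t n = 0;
    int32_t i = 0;
    while (i < byte_len) {
        uint32_t c = b[i++];
        int32_t trail = 0;  /* number of continuation bytes to read */
        if (c >= 0xF0)      { c &= 0x07; trail = 3; }
        else if (c >= 0xE0) { c &= 0x0F; trail = 2; }
        else if (c >= 0xC0) { c &= 0x1F; trail = 1; }
        while (trail-- > 0 && i < byte_len) {
            c = (c << 6) | (b[i++] & 0x3F);
        }
        if (c >= 0x10000) {
            /* Supplementary code point: lead and trail surrogate. */
            if (n + 2 > cap) break;
            c -= 0x10000;
            out[n++] = (uint16_t)(0xD800 + (c >> 10));
            out[n++] = (uint16_t)(0xDC00 + (c & 0x3FF));
        } else {
            if (n + 1 > cap) break;
            out[n++] = (uint16_t)c;
        }
    }
    return n;
}
```

The returned count is the UTF-16 length of the text, which differs from the byte length whenever the input contains non-ASCII characters; this is why the UTF-16 length of a UTF-8 string has to be computed rather than read off the byte count.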