Tizen Native API
Ucollator
i18n

Functions

int i18n_ucollator_create (const char *locale, i18n_ucollator_h *collator)
 Creates an i18n_ucollator_h for comparing strings.
int i18n_ucollator_destroy (i18n_ucollator_h collator)
 Closes an i18n_ucollator_h.
int i18n_ucollator_str_collator (const i18n_ucollator_h collator, const i18n_uchar *src, int32_t src_len, const i18n_uchar *target, int32_t target_len, i18n_ucollator_result_e *result)
 Compares two strings.
int i18n_ucollator_equal (const i18n_ucollator_h collator, const i18n_uchar *src, int32_t src_len, const i18n_uchar *target, int32_t target_len, i18n_ubool *equal)
 Compares two strings for equality.
int i18n_ucollator_set_strength (i18n_ucollator_h collator, i18n_ucollator_strength_e strength)
 Sets the collation strength used in a collator.
int i18n_ucollator_set_attribute (i18n_ucollator_h collator, i18n_ucollator_attribute_e attr, i18n_ucollator_attribute_value_e val)
 Sets a universal attribute.

Typedefs

typedef i18n_ucollator_attribute_value_e i18n_ucollator_strength_e
 Enumeration for collation strength. Use it to set the strength of an i18n_ucollator_h. The strength values are found in the i18n_ucollator_attribute_value_e enum. In the examples below, one, two, or three < or > signs mark a primary, secondary, or tertiary difference.
 A different base letter represents a primary difference. Set the comparison level to I18N_UCOLLATOR_PRIMARY to ignore secondary and tertiary differences. Example of a primary difference: "abc" < "abd".
 Diacritical differences on the same base letter represent a secondary difference. Set the comparison level to I18N_UCOLLATOR_SECONDARY to ignore tertiary differences. Example of a secondary difference: "ä" >> "a".
 Uppercase and lowercase versions of the same character represent a tertiary difference. Set the comparison level to I18N_UCOLLATOR_TERTIARY to include all comparison differences. Example of a tertiary difference: "abc" <<< "ABC".
 Two characters are considered identical when they have the same Unicode spelling; I18N_UCOLLATOR_IDENTICAL compares at this level. For example, "ä" == "ä".
 i18n_ucollator_strength_e is also used to determine the strength of sort keys generated from i18n_ucollator_h objects.
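
The level-by-level ordering above can be sketched as a toy model in plain C. This is not the Tizen or ICU implementation: the toy_char type, the TOY_* constants, and the toy_collate() function are hypothetical, and each character is hand-encoded with one weight per level instead of being looked up in real collation data. The sketch compares all primary weights of both strings before it looks at any secondary weight, and so on up to the requested strength.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not the Tizen/ICU implementation: every character is reduced
 * to one hand-assigned weight per comparison level. */
typedef struct {
    char base;   /* primary weight: the base letter, e.g. 'a' for both "a" and "ä" */
    int accent;  /* secondary weight: 0 = no diacritic, 1 = umlaut */
    int upper;   /* tertiary weight: 0 = lowercase, 1 = uppercase */
} toy_char;

/* Hypothetical strength constants mirroring PRIMARY, SECONDARY, TERTIARY. */
enum { TOY_PRIMARY = 1, TOY_SECONDARY = 2, TOY_TERTIARY = 3 };

/* Returns the weight of one character at the given level. */
static int toy_weight(const toy_char *c, int level)
{
    switch (level) {
    case TOY_PRIMARY:
        return (unsigned char)c->base;
    case TOY_SECONDARY:
        return c->accent;
    default:
        return c->upper;
    }
}

/* Compares two strings level by level: all primary weights first, then, if
 * the strength allows, all secondary weights, then all tertiary weights.
 * Returns a negative value, 0, or a positive value, like strcmp(). */
static int toy_collate(const toy_char *a, size_t na,
                       const toy_char *b, size_t nb, int strength)
{
    assert(strength >= TOY_PRIMARY && strength <= TOY_TERTIARY);
    for (int level = TOY_PRIMARY; level <= strength; level++) {
        size_t n = na < nb ? na : nb;
        for (size_t i = 0; i < n; i++) {
            int d = toy_weight(&a[i], level) - toy_weight(&b[i], level);
            if (d != 0)
                return d;
        }
        /* a proper prefix sorts first; this is decided at the primary level */
        if (na != nb)
            return na < nb ? -1 : 1;
    }
    return 0;
}
```

Because a later level is consulted only when all earlier levels are equal, a primary difference anywhere in the string outweighs a secondary difference earlier in the string: with this toy, "äb" sorts before "ac" at every strength.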

The Ucollator module performs locale-sensitive string comparison.

Required Header

#include <utils_i18n.h>

Overview

The Ucollator module performs locale-sensitive string comparison. It can be used to build searching and sorting routines for natural language text, and it provides correct sort orders for most supported locales.

Sample Code 1

Converts two UTF-8 byte strings to Unicode (i18n_uchar) strings and compares them to check whether they are equal.

    // requires <utils_i18n.h>, <dlog.h>, and a LOG_TAG definition for dlog_print()
    i18n_uchar uchar_src[64] = {0,};
    i18n_uchar uchar_target[64] = {0,};
    char *src = "tizen";
    char *target = "bada";
    int32_t uchar_src_len = 0;
    int32_t uchar_target_len = 0;
    i18n_ucollator_h coll = NULL;
    i18n_ubool result = 0;
    i18n_error_code_e error_code = I18N_ERROR_NONE;

    // converts the UTF-8 strings to i18n_uchar strings and stores their lengths
    i18n_ustring_from_UTF8( uchar_src, 64, &uchar_src_len, src, -1, &error_code );
    i18n_ustring_from_UTF8( uchar_target, 64, &uchar_target_len, target, -1, &error_code );

    // creates a collator
    if ( error_code == I18N_ERROR_NONE && i18n_ucollator_create( "en_US", &coll ) == I18N_ERROR_NONE ) {
        // sets strength for coll
        i18n_ucollator_set_strength( coll, I18N_UCOLLATOR_PRIMARY );

        // compares uchar_src with uchar_target
        i18n_ucollator_equal( coll, uchar_src, uchar_src_len, uchar_target, uchar_target_len, &result );
        dlog_print(DLOG_INFO, LOG_TAG, "%s %s %s\n", src, result ? "is equal to" : "is not equal to", target );    // tizen is not equal to bada

        // destroys the collator
        i18n_ucollator_destroy( coll );
    }

Sample Code 2

Sorts the given strings in ascending order using i18n_ucollator_str_collator().

    i18n_ucollator_h coll = NULL;
    char *src[3] = { "cat", "banana", "airplane" };
    char *tmp = NULL;
    i18n_uchar buf_01[16] = {0,};
    i18n_uchar buf_02[16] = {0,};
    i18n_ucollator_result_e result = I18N_UCOLLATOR_EQUAL;
    int n = sizeof( src ) / sizeof( src[0] );    // number of strings to sort
    int i = 0, j = 0;
    int ret = I18N_ERROR_NONE;

    for ( i = 0; i < n; i++ ) {
        dlog_print(DLOG_INFO, LOG_TAG, "%s\n", src[i] );
    }    // cat    banana    airplane

    // creates a collator
    ret = i18n_ucollator_create( "en_US", &coll );

    // compares and sorts in ascending order (bubble sort)
    if ( ret == I18N_ERROR_NONE ) {
        i18n_ucollator_set_strength( coll, I18N_UCOLLATOR_TERTIARY );
        for ( i = 0; i < n - 1; i++ ) {
            for ( j = 0; j < n - 1 - i; j++ ) {
                // converts the adjacent strings to i18n_uchar (at most 15 characters each)
                i18n_ustring_copy_ua( buf_01, src[j] );
                i18n_ustring_copy_ua( buf_02, src[j+1] );
                // compares buf_01 with buf_02; -1 means the strings are null-terminated
                i18n_ucollator_str_collator( coll, buf_01, -1, buf_02, -1, &result );
                // swaps the strings if they are out of order
                if ( result == I18N_UCOLLATOR_GREATER ) {
                    tmp = src[j];
                    src[j] = src[j+1];
                    src[j+1] = tmp;
                }
            }
        }
        // destroys the collator
        i18n_ucollator_destroy( coll );    // deallocate memory for collator
    }

    for ( i = 0; i < n; i++ ) {
        dlog_print(DLOG_INFO, LOG_TAG, "%s\n", src[i] );
    }    // airplane    banana    cat

Enumeration Type Documentation

enum i18n_ucollator_attribute_e

Enumeration for attributes that the collation service understands. Every attribute accepts the I18N_UCOLLATOR_DEFAULT value as well as the values specific to that attribute.

Enumerator:
I18N_UCOLLATOR_FRENCH_COLLATION 

Attribute for the direction of secondary weights, used in Canadian French. Acceptable values are I18N_UCOLLATOR_ON, which causes secondary weights to be considered backwards, and I18N_UCOLLATOR_OFF, which treats secondary weights in the order they appear.

I18N_UCOLLATOR_ALTERNATE_HANDLING 

Attribute for handling variable elements. Acceptable values are I18N_UCOLLATOR_NON_IGNORABLE (default), which treats all code points with non-ignorable primary weights in the same way, and I18N_UCOLLATOR_SHIFTED, which causes code points with primary weights that are equal to or below the variable top value to be ignored at the primary level and moved to the quaternary level.

I18N_UCOLLATOR_CASE_FIRST 

Controls the ordering of uppercase and lowercase letters. Acceptable values are I18N_UCOLLATOR_OFF (default), which orders uppercase and lowercase letters according to their tertiary weights, I18N_UCOLLATOR_UPPER_FIRST, which forces uppercase letters to sort before lowercase letters, and I18N_UCOLLATOR_LOWER_FIRST, which does the opposite.

I18N_UCOLLATOR_CASE_LEVEL 

Controls whether an extra case level (positioned before the third level) is generated. Acceptable values are I18N_UCOLLATOR_OFF (default), in which case no case level is generated, and I18N_UCOLLATOR_ON, which causes the case level to be generated. The contents of the case level are affected by the value of the I18N_UCOLLATOR_CASE_FIRST attribute. A simple way to ignore accent differences in a string while still distinguishing case is to set the strength to I18N_UCOLLATOR_PRIMARY and enable the case level.

I18N_UCOLLATOR_NORMALIZATION_MODE 

Controls whether the normalization check and necessary normalizations are performed. When set to I18N_UCOLLATOR_OFF (default), no normalization check is performed, and the correctness of the result is guaranteed only if the input data is in so-called FCD form (see the ICU User Guide for more information). When set to I18N_UCOLLATOR_ON, an incremental check is performed to see whether the input data is in FCD form. If the data is not in FCD form, incremental NFD normalization is performed.

I18N_UCOLLATOR_DECOMPOSITION_MODE 

An alias for the I18N_UCOLLATOR_NORMALIZATION_MODE attribute

I18N_UCOLLATOR_STRENGTH 

The strength attribute. Can be I18N_UCOLLATOR_PRIMARY, I18N_UCOLLATOR_SECONDARY, I18N_UCOLLATOR_TERTIARY, I18N_UCOLLATOR_QUATERNARY, or I18N_UCOLLATOR_IDENTICAL. The usual strength for most locales (except Japanese) is tertiary. Quaternary strength is useful when combined with the shifted setting of the alternate handling attribute, and for JIS X 4061 collation, where it is used to distinguish between Katakana and Hiragana. Otherwise, the quaternary level is affected only by the number of non-ignorable code points in the string. Identical strength is rarely useful, as it amounts to comparing the code points of the NFD form of the string.

I18N_UCOLLATOR_NUMERIC_COLLATION 

When turned on, this attribute makes substrings of digits sort according to their numeric values. This is a way to get '100' to sort after '2'. Note that the longest digit substring that can be treated as a single unit is 254 digits (not counting leading zeros). If a digit substring is longer than that, the digits beyond the limit are treated as a separate digit substring. A "digit" in this sense is a code point with General_Category=Nd, which does not include circled numbers, Roman numerals, and so on. Only a contiguous digit substring is considered, that is, non-negative integers without separators. There is no support for plus/minus signs, decimals, exponents, and so on.

I18N_UCOLLATOR_ATTRIBUTE_COUNT 

The number of i18n_ucollator_attribute_e constants
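
To make the effect of I18N_UCOLLATOR_NUMERIC_COLLATION concrete, the following plain-C sketch orders ASCII strings so that runs of digits compare by numeric value. It is a simplified toy, not the Tizen or ICU implementation: the function toy_numeric_compare() is hypothetical, only ASCII digits are recognized, non-digit characters are compared by byte value rather than by collation weights, and the 254-digit limit is not modeled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy model of numeric collation (hypothetical helper, not the Tizen/ICU
 * implementation): runs of ASCII digits compare by numeric value, all other
 * characters compare by byte value. Returns <0, 0 or >0 like strcmp(). */
static int toy_numeric_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            /* leading zeros do not change the numeric value */
            while (*a == '0')
                a++;
            while (*b == '0')
                b++;
            /* count the significant digits of each run */
            size_t la = 0, lb = 0;
            while (isdigit((unsigned char)a[la]))
                la++;
            while (isdigit((unsigned char)b[lb]))
                lb++;
            /* more significant digits means a larger number */
            if (la != lb)
                return la < lb ? -1 : 1;
            /* equal length: the first differing digit decides */
            for (size_t i = 0; i < la; i++) {
                if (a[i] != b[i])
                    return a[i] < b[i] ? -1 : 1;
            }
            a += la;
            b += lb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    /* the string that ends first sorts first */
    return (*a != '\0') - (*b != '\0');
}
```

With plain strcmp(), "100" sorts before "2" because '1' < '2'; this sketch instead returns a negative value for toy_numeric_compare("2", "100"), and it places "file9" before "file10".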

enum i18n_ucollator_attribute_value_e

Enumeration containing attribute values for controlling collation behavior. Here are all the allowable values. Not every attribute can take every value. The only universal value is I18N_UCOLLATOR_DEFAULT, which resets the attribute value to the predefined value for that locale.

Enumerator:
I18N_UCOLLATOR_DEFAULT 

Accepted by most attributes

I18N_UCOLLATOR_PRIMARY 

Primary collation strength

I18N_UCOLLATOR_SECONDARY 

Secondary collation strength

I18N_UCOLLATOR_TERTIARY 

Tertiary collation strength

I18N_UCOLLATOR_DEFAULT_STRENGTH 

Default collation strength

I18N_UCOLLATOR_QUATERNARY 

Quaternary collation strength

I18N_UCOLLATOR_IDENTICAL 

Identical collation strength

I18N_UCOLLATOR_OFF 

Turn the feature off - works for I18N_UCOLLATOR_FRENCH_COLLATION, I18N_UCOLLATOR_CASE_LEVEL & I18N_UCOLLATOR_DECOMPOSITION_MODE

I18N_UCOLLATOR_ON 

Turn the feature on - works for I18N_UCOLLATOR_FRENCH_COLLATION, I18N_UCOLLATOR_CASE_LEVEL & I18N_UCOLLATOR_DECOMPOSITION_MODE

I18N_UCOLLATOR_SHIFTED 

Valid for I18N_UCOLLATOR_ALTERNATE_HANDLING. Alternate handling will be shifted.

I18N_UCOLLATOR_NON_IGNORABLE 

Valid for I18N_UCOLLATOR_ALTERNATE_HANDLING. Alternate handling will be non-ignorable.

I18N_UCOLLATOR_LOWER_FIRST 

Valid for I18N_UCOLLATOR_CASE_FIRST - lower case sorts before upper case.

I18N_UCOLLATOR_UPPER_FIRST 

Valid for I18N_UCOLLATOR_CASE_FIRST - upper case sorts before lower case.
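
The I18N_UCOLLATOR_NON_IGNORABLE and I18N_UCOLLATOR_SHIFTED values of I18N_UCOLLATOR_ALTERNATE_HANDLING can be illustrated with a small plain-C toy. This is a sketch under stated assumptions, not the Tizen or ICU algorithm: the functions toy_is_variable(), toy_compare_skipping_variables(), and toy_alternate_compare() are hypothetical, ASCII spaces and punctuation stand in for variable characters, and byte comparison stands in for collation weights.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Toy model (hypothetical helpers, not the Tizen/ICU algorithm): ASCII
 * spaces and punctuation play the role of "variable" characters. */
static int toy_is_variable(char c)
{
    return isspace((unsigned char)c) || ispunct((unsigned char)c);
}

/* Compares a and b byte by byte while skipping variable characters. */
static int toy_compare_skipping_variables(const char *a, const char *b)
{
    for (;;) {
        while (*a != '\0' && toy_is_variable(*a))
            a++;
        while (*b != '\0' && toy_is_variable(*b))
            b++;
        if (*a != *b || *a == '\0')
            return (unsigned char)*a - (unsigned char)*b;
        a++;
        b++;
    }
}

/* shifted == 0 models NON_IGNORABLE: every character counts.
 * shifted != 0 models SHIFTED: variable characters are skipped, and they
 * only break ties when quaternary != 0 (a stand-in for QUATERNARY strength). */
static int toy_alternate_compare(const char *a, const char *b,
                                 int shifted, int quaternary)
{
    assert(a != NULL && b != NULL);
    if (!shifted)
        return strcmp(a, b);
    int d = toy_compare_skipping_variables(a, b);
    if (d != 0 || !quaternary)
        return d;
    return strcmp(a, b);
}
```

With this toy, "black-bird" and "blackbird" differ under the non-ignorable setting but compare as equal under the shifted setting, unless the quaternary tie-break is enabled.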

enum i18n_ucollator_result_e

Enumeration for the result of comparing a source string with a target string in i18n_ucollator_str_collator(). I18N_UCOLLATOR_LESS is returned if the source string is compared to be less than the target string, I18N_UCOLLATOR_EQUAL is returned if the source string is compared to be equal to the target string, and I18N_UCOLLATOR_GREATER is returned if the source string is compared to be greater than the target string.

Enumerator:
I18N_UCOLLATOR_EQUAL 

string a == string b

I18N_UCOLLATOR_GREATER 

string a > string b

I18N_UCOLLATOR_LESS 

string a < string b


Function Documentation

int i18n_ucollator_create ( const char *  locale,
i18n_ucollator_h *  collator 
)

Creates an i18n_ucollator_h for comparing strings.

The i18n_ucollator_h is used in all calls to the collation service.
When you are finished with it, the collator must be disposed of by calling i18n_ucollator_destroy().

Since :
2.3
Remarks:
Must release collator using i18n_ucollator_destroy().
Parameters:
[in] locale  The locale containing the required collation rules
Special values for locales can be passed in: if NULL is passed for the locale, the default locale collation rules are used.
If an empty string ("") or "root" is passed, the UCA (Unicode Collation Algorithm) rules are used.
[out] collator  The created i18n_ucollator_h, otherwise 0 if an error occurs
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
See also:
i18n_ucollator_destroy()
int i18n_ucollator_destroy ( i18n_ucollator_h  collator)

Closes an i18n_ucollator_h.

Once closed, an i18n_ucollator_h should not be used. Every open collator must be closed; otherwise, a memory leak results.

Since :
2.3
Parameters:
[in] collator  The i18n_ucollator_h to close
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
See also:
i18n_ucollator_create()
int i18n_ucollator_equal ( const i18n_ucollator_h  collator,
const i18n_uchar *  src,
int32_t  src_len,
const i18n_uchar *  target,
int32_t  target_len,
i18n_ubool *  equal 
)

Compares two strings for equality.

This is a convenience function equivalent to calling i18n_ucollator_str_collator() and checking whether the result is I18N_UCOLLATOR_EQUAL.

Since :
2.3
Parameters:
[in] collator  The i18n_ucollator_h containing the comparison rules
[in] src  The source string
[in] src_len  The length of the source, otherwise -1 if null-terminated
[in] target  The target string
[in] target_len  The length of the target, otherwise -1 if null-terminated
[out] equal  true if the source is equal to the target, otherwise false
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
See also:
i18n_ucollator_str_collator()

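int i18n_ucollator_set_attribute ( i18n_ucollator_h  collator,
i18n_ucollator_attribute_e  attr,
i18n_ucollator_attribute_value_e  val 
)

Sets a universal attribute.

Not every attribute accepts every value; see i18n_ucollator_attribute_value_e.

Since :
2.3
Parameters:
[in] collator  The i18n_ucollator_h containing the attributes to be changed
[in] attr  The attribute type, one of i18n_ucollator_attribute_e
[in] val  The attribute value, one of i18n_ucollator_attribute_value_e
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter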
int i18n_ucollator_set_strength ( i18n_ucollator_h  collator,
i18n_ucollator_strength_e  strength 
)

Sets the collation strength used in a collator.

The strength influences how strings are compared.

Since :
2.3
Parameters:
[in] collator  The i18n_ucollator_h to set
[in] strength  The desired collation strength
One of i18n_ucollator_strength_e
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
int i18n_ucollator_str_collator ( const i18n_ucollator_h  collator,
const i18n_uchar *  src,
int32_t  src_len,
const i18n_uchar *  target,
int32_t  target_len,
i18n_ucollator_result_e *  result 
)

Compares two strings.

The strings will be compared using the options already specified.

Since :
2.3
Parameters:
[in] collator  The i18n_ucollator_h containing the comparison rules
[in] src  The source string
[in] src_len  The length of the source, otherwise -1 if null-terminated
[in] target  The target string
[in] target_len  The length of the target, otherwise -1 if null-terminated
[out] result  The result of comparing the strings
One of I18N_UCOLLATOR_EQUAL, I18N_UCOLLATOR_GREATER, or I18N_UCOLLATOR_LESS
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
See also:
i18n_ucollator_equal()

Except as noted, this content - excluding the Code Examples - is licensed under Creative Commons Attribution 3.0 and all of the Code Examples contained herein are licensed under BSD-3-Clause.
For details, see the Content License