Tizen Native API
Ucollator

The Ucollator module performs locale-sensitive string comparison.

Required Header

#include <utils_i18n.h>

Overview

The Ucollator module performs locale-sensitive string comparison. It builds searching and sorting routines for natural language text and provides correct sorting orders for most locales supported.

Sample Code 1

Converts two byte strings to Unicode strings and compares the Unicode strings to check whether they are equal to each other.

    i18n_uchar uchar_src[64] = {0,};
    i18n_uchar uchar_target[64] = {0,};
    char *src = "tizen";
    char *target = "bada";
    int uchar_src_len = 0;
    int uchar_target_len = 0;
    i18n_ucollator_h coll = NULL;
    i18n_ubool result = 0;    // i18n_ubool is a boolean type, not a pointer

    i18n_ustring_from_UTF8( uchar_src, 64, NULL, src, -1 );
    i18n_ustring_from_UTF8( uchar_target, 64, NULL, target, -1 );

    // creates a collator
    i18n_ucollator_create( "en_US", &coll );

    // sets strength for coll
    i18n_ucollator_set_strength( coll, I18N_UCOLLATOR_PRIMARY );

    // compares uchar_src with uchar_target
    i18n_ustring_get_length( uchar_src, &uchar_src_len );
    i18n_ustring_get_length( uchar_target, &uchar_target_len );
    i18n_ucollator_equal( coll, uchar_src, uchar_src_len, uchar_target, uchar_target_len, &result );
    dlog_print(DLOG_INFO, LOG_TAG, "%s %s %s\n", src, result == 1 ? "is equal to" : "is not equal to", target );    // tizen is not equal to bada

    // destroys the collator
    i18n_ucollator_destroy( coll );

Sample Code 2

Sorts the given data in ascending order using i18n_ucollator.

    i18n_ucollator_h coll = NULL;
    char *src[3] = { "cat", "banana", "airplane" };
    char *tmp = NULL;
    i18n_uchar buf_01[16] = {0,};
    i18n_uchar buf_02[16] = {0,};
    i18n_ucollator_result_e result = I18N_UCOLLATOR_EQUAL;
    int i = 0, j = 0;
    int ret = I18N_ERROR_NONE;
    int buf_01_len = 0, buf_02_len = 0;

    for ( i = 0; i < sizeof( src ) / sizeof( src[0] ); i++ ) {
        dlog_print(DLOG_INFO, LOG_TAG, "%s\n", src[i] );
    }    // cat    banana    airplane

    // creates a collator
    ret = i18n_ucollator_create( "en_US", &coll );

    // compares and sorts in ascending order
    if ( ret == I18N_ERROR_NONE ) {
        i18n_ucollator_set_strength( coll, I18N_UCOLLATOR_TERTIARY );
        for ( i = 0; i < 2; i++ ) {
            for ( j = 0; j < 2 - i; j++ ) {
                i18n_ustring_copy_ua( buf_01, src[j] );
                i18n_ustring_copy_ua( buf_02, src[j+1] );
                i18n_ustring_get_length( buf_01, &buf_01_len );
                i18n_ustring_get_length( buf_02, &buf_02_len );
                // compares buf_01 with buf_02
                i18n_ucollator_str_collator( coll, buf_01, buf_01_len, buf_02, buf_02_len, &result );
                if ( result == I18N_UCOLLATOR_GREATER ) {
                    tmp = src[j];
                    src[j] = src[j+1];
                    src[j+1] = tmp;
                }
            }
        }
    }
    // destroys the collator
    i18n_ucollator_destroy( coll );

    for ( i = 0; i < sizeof( src ) / sizeof( src[0] ); i++ ) {
        dlog_print(DLOG_INFO, LOG_TAG, "%s\n", src[i] );
    }    // airplane    banana    cat

Functions

int i18n_ucollator_create (const char *locale, i18n_ucollator_h *collator)
 Creates an i18n_ucollator_h for comparing strings.
int i18n_ucollator_destroy (i18n_ucollator_h collator)
 Closes an i18n_ucollator_h.
int i18n_ucollator_str_collator (const i18n_ucollator_h collator, const i18n_uchar *src, int32_t src_len, const i18n_uchar *target, int32_t target_len, i18n_ucollator_result_e *result)
 Compares two strings.
int i18n_ucollator_equal (const i18n_ucollator_h collator, const i18n_uchar *src, int32_t src_len, const i18n_uchar *target, int32_t target_len, i18n_ubool *equal)
 Compares two strings for equality.
int i18n_ucollator_set_strength (i18n_ucollator_h collator, i18n_ucollator_strength_e strength)
 Sets the collation strength used in a collator.
int i18n_ucollator_set_attribute (i18n_ucollator_h collator, i18n_ucollator_attribute_e attr, i18n_ucollator_attribute_value_e val)
 Sets the value of a collator attribute (universal attribute setter).

Typedefs

typedef void * i18n_ucollator_h
 Structure representing a collator object instance.
typedef i18n_ucollator_attribute_value_e i18n_ucollator_strength_e
 Enumeration in which the base letter represents a primary difference. Set comparison level to I18N_UCOLLATOR_PRIMARY to ignore secondary and tertiary differences. Use this to set the strength of an i18n_ucollator_h. Example of primary difference: "abc" < "abd". Diacritical differences on the same base letter represent a secondary difference. Set comparison level to I18N_UCOLLATOR_SECONDARY to ignore tertiary differences. Use this to set the strength of an i18n_ucollator_h. Example of secondary difference: "ä" >> "a". Uppercase and lowercase versions of the same character represent a tertiary difference. Set comparison level to I18N_UCOLLATOR_TERTIARY to include all comparison differences. Use this to set the strength of an i18n_ucollator_h. Example of tertiary difference: "abc" <<< "ABC". Two characters are considered "identical" when they have the same Unicode spellings (I18N_UCOLLATOR_IDENTICAL). For example, "ä" == "ä". i18n_ucollator_strength_e is also used to determine the strength of sort keys generated from i18n_ucollator_h. These values can now be found in the i18n_ucollator_attribute_value_e enum.

Typedef Documentation

typedef void* i18n_ucollator_h

Structure representing a collator object instance.

Since :
2.3.1

typedef i18n_ucollator_attribute_value_e i18n_ucollator_strength_e

Enumeration in which the base letter represents a primary difference. Set comparison level to I18N_UCOLLATOR_PRIMARY to ignore secondary and tertiary differences. Example of primary difference: "abc" < "abd".

Diacritical differences on the same base letter represent a secondary difference. Set comparison level to I18N_UCOLLATOR_SECONDARY to ignore tertiary differences. Example of secondary difference: "ä" >> "a".

Uppercase and lowercase versions of the same character represent a tertiary difference. Set comparison level to I18N_UCOLLATOR_TERTIARY to include all comparison differences. Example of tertiary difference: "abc" <<< "ABC".

Two characters are considered "identical" when they have the same Unicode spellings (I18N_UCOLLATOR_IDENTICAL). For example, "ä" == "ä".

Use these values to set the strength of an i18n_ucollator_h. i18n_ucollator_strength_e is also used to determine the strength of sort keys generated from i18n_ucollator_h. These values can now be found in the i18n_ucollator_attribute_value_e enum.

Since :
2.3.1
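
The difference between primary and tertiary strength can be illustrated with a toy comparison restricted to ASCII letters. This is only a sketch of the idea and not how the collator is implemented: toy_strength_e and toy_compare() are hypothetical names that are not part of the Ucollator API, secondary (accent) differences are not modeled because ASCII has no accented letters, and lowercase is assumed to sort before uppercase at the tertiary level, as in the "abc" <<< "ABC" example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the strength values; NOT part of the Ucollator API. */
typedef enum { TOY_PRIMARY, TOY_TERTIARY } toy_strength_e;

/* Returns <0, 0, or >0, like strcmp(). ASCII input only. */
int toy_compare(const char *a, const char *b, toy_strength_e strength)
{
    assert(a != NULL && b != NULL);

    int case_tie = 0;   /* sign of the first case-only difference, if any */
    size_t i = 0;

    for (; a[i] != '\0' && b[i] != '\0'; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        int la = tolower(ca);
        int lb = tolower(cb);

        /* A different base letter is a primary difference and always wins. */
        if (la != lb)
            return la < lb ? -1 : 1;

        /* Same base letter, different case: remember it as a tertiary tie-breaker. */
        if (case_tie == 0 && ca != cb)
            case_tie = islower(ca) ? -1 : 1;
    }

    /* A string that is a strict prefix of the other sorts first. */
    if (a[i] != '\0')
        return 1;
    if (b[i] != '\0')
        return -1;

    /* Case differences only count when tertiary strength is requested. */
    return strength == TOY_TERTIARY ? case_tie : 0;
}
```

With this model, toy_compare("abc", "ABC", TOY_PRIMARY) reports the strings as equal, while the same call with TOY_TERTIARY orders "abc" before "ABC".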

Enumeration Type Documentation

enum i18n_ucollator_attribute_e

Enumeration for attributes that the collation service understands. All the attributes can take the I18N_UCOLLATOR_DEFAULT value, as well as the values specific to each one.

Since :
2.3.1
Enumerator:
I18N_UCOLLATOR_FRENCH_COLLATION 

Attribute for direction of secondary weights - used in Canadian French. Acceptable values are I18N_UCOLLATOR_ON, which results in secondary weights being considered backwards, and I18N_UCOLLATOR_OFF which treats secondary weights in the order they appear

I18N_UCOLLATOR_ALTERNATE_HANDLING 

Attribute for handling variable elements. Acceptable values are I18N_UCOLLATOR_NON_IGNORABLE (default) which treats all the codepoints with non-ignorable primary weights in the same way, and I18N_UCOLLATOR_SHIFTED which causes codepoints with primary weights that are equal or below the variable top value to be ignored at the primary level and moved to the quaternary level

I18N_UCOLLATOR_CASE_FIRST 

Controls the ordering of upper and lower case letters. Acceptable values are I18N_UCOLLATOR_OFF (default), which orders upper and lower case letters in accordance to their tertiary weights, I18N_UCOLLATOR_UPPER_FIRST which forces upper case letters to sort before lower case letters, and I18N_UCOLLATOR_LOWER_FIRST which does the opposite

I18N_UCOLLATOR_CASE_LEVEL 

Controls whether an extra case level (positioned before the third level) is generated or not. Acceptable values are I18N_UCOLLATOR_OFF (default), when case level is not generated, and I18N_UCOLLATOR_ON which causes the case level to be generated. Contents of the case level are affected by the value of the I18N_UCOLLATOR_CASE_FIRST attribute. A simple way to ignore accent differences in a string is to set the strength to I18N_UCOLLATOR_PRIMARY and enable case level

I18N_UCOLLATOR_NORMALIZATION_MODE 

Controls whether the normalization check and necessary normalizations are performed. When set to I18N_UCOLLATOR_OFF (default), no normalization check is performed. The correctness of the result is guaranteed only if the input data is in so-called FCD form (see the user's manual for more information). When set to I18N_UCOLLATOR_ON, an incremental check is performed to see whether the input data is in the FCD form. If the data is not in the FCD form, incremental NFD normalization is performed.

I18N_UCOLLATOR_DECOMPOSITION_MODE 

An alias for the I18N_UCOLLATOR_NORMALIZATION_MODE attribute

I18N_UCOLLATOR_STRENGTH 

The strength attribute. Can be either I18N_UCOLLATOR_PRIMARY, I18N_UCOLLATOR_SECONDARY, I18N_UCOLLATOR_TERTIARY, I18N_UCOLLATOR_QUATERNARY, or I18N_UCOLLATOR_IDENTICAL. The usual strength for most locales (except Japanese) is tertiary. Quaternary strength is useful when combined with shifted setting for the alternate handling attribute and for JIS X 4061 collation, when it is used to distinguish between Katakana and Hiragana. Otherwise, quaternary level is affected only by the number of non-ignorable code points in the string. Identical strength is rarely useful, as it amounts to codepoints of the NFD form of the string

I18N_UCOLLATOR_NUMERIC_COLLATION 

When turned on, this attribute makes substrings of digits sort according to their numeric values. This is a way to get '100' to sort AFTER '2'. Note that the longest digit substring that can be treated as a single unit is 254 digits (not counting leading zeros). If a digit substring is longer than that, the digits beyond the limit will be treated as a separate digit substring. A "digit" in this sense is a code point with General_Category=Nd, which does not include circled numbers, roman numerals, and so on. Only a contiguous digit substring is considered, that is, non-negative integers without separators. There is no support for plus/minus signs, decimals, exponents, and so on.

I18N_UCOLLATOR_ATTRIBUTE_COUNT 

The number of i18n_ucollator_attribute_e constants
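
The digit-substring behavior described for I18N_UCOLLATOR_NUMERIC_COLLATION can be sketched with a small stand-alone comparator. This is a toy model under stated assumptions, not the collator's implementation: toy_numeric_compare() and digit_run() are hypothetical helpers, only ASCII digits are recognized (not every General_Category=Nd code point), the 254-digit limit is not modeled, and non-digit characters are compared by plain byte value rather than by locale rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the run of ASCII digits starting at s. */
static size_t digit_run(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Hypothetical helper, NOT part of the Ucollator API.
 * Returns <0, 0, or >0, like strcmp(). */
int toy_numeric_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a != '\0' && *b != '\0') {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            size_t ra = digit_run(a), rb = digit_run(b);
            const char *ea = a + ra, *eb = b + rb;   /* ends of both runs */

            /* Leading zeros do not change the numeric value. */
            while (ra > 1 && *a == '0') { a++; ra--; }
            while (rb > 1 && *b == '0') { b++; rb--; }

            /* Without leading zeros, more digits means a larger number. */
            if (ra != rb)
                return ra < rb ? -1 : 1;

            /* Same digit count: the first differing digit decides. */
            int c = memcmp(a, b, ra);
            if (c != 0)
                return c < 0 ? -1 : 1;

            a = ea;
            b = eb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }

    /* A string that is a strict prefix of the other sorts first. */
    if (*a != '\0')
        return 1;
    if (*b != '\0')
        return -1;
    return 0;
}
```

In this sketch "2" sorts before "100", whereas a plain strcmp() would place "100" first because it compares '1' with '2' byte by byte.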

enum i18n_ucollator_attribute_value_e

Enumeration containing attribute values for controlling collation behavior. Here are all the allowable values. Not every attribute can take every value. The only universal value is I18N_UCOLLATOR_DEFAULT, which resets the attribute value to the predefined value for that locale.

Since :
2.3.1
Enumerator:
I18N_UCOLLATOR_DEFAULT 

Accepted by most attributes

I18N_UCOLLATOR_PRIMARY 

Primary collation strength

I18N_UCOLLATOR_SECONDARY 

Secondary collation strength

I18N_UCOLLATOR_TERTIARY 

Tertiary collation strength

I18N_UCOLLATOR_DEFAULT_STRENGTH 

Default collation strength

I18N_UCOLLATOR_QUATERNARY 

Quaternary collation strength

I18N_UCOLLATOR_IDENTICAL 

Identical collation strength

I18N_UCOLLATOR_OFF 

Turn the feature off - works for I18N_UCOLLATOR_FRENCH_COLLATION, I18N_UCOLLATOR_CASE_LEVEL & I18N_UCOLLATOR_DECOMPOSITION_MODE

I18N_UCOLLATOR_ON 

Turn the feature on - works for I18N_UCOLLATOR_FRENCH_COLLATION, I18N_UCOLLATOR_CASE_LEVEL & I18N_UCOLLATOR_DECOMPOSITION_MODE

I18N_UCOLLATOR_SHIFTED 

Valid for I18N_UCOLLATOR_ALTERNATE_HANDLING. Alternate handling will be shifted.

I18N_UCOLLATOR_NON_IGNORABLE 

Valid for I18N_UCOLLATOR_ALTERNATE_HANDLING. Alternate handling will be non ignorable.

I18N_UCOLLATOR_LOWER_FIRST 

Valid for I18N_UCOLLATOR_CASE_FIRST - lower case sorts before upper case.

I18N_UCOLLATOR_UPPER_FIRST 

Upper case sorts before lower case.

enum i18n_ucollator_result_e

Enumeration for the result of comparing a source string with a target string in i18n_ucollator_str_collator(). I18N_UCOLLATOR_LESS is returned if the source string compares less than the target string, I18N_UCOLLATOR_EQUAL if the two strings compare equal, and I18N_UCOLLATOR_GREATER if the source string compares greater than the target string.

Since :
2.3.1
Enumerator:
I18N_UCOLLATOR_EQUAL 

string a == string b

I18N_UCOLLATOR_GREATER 

string a > string b

I18N_UCOLLATOR_LESS 

string a < string b
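
A three-valued result of this kind maps naturally onto the negative/zero/positive convention expected by the standard qsort() function. The sketch below shows that mapping in isolation so that it compiles without the Tizen headers. Everything prefixed with toy_ is a hypothetical stand-in: toy_result_e mirrors the shape of the result enumeration, and toy_str_collator() imitates an out-parameter comparison call but uses plain byte order (strcmp()) instead of locale rules.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the comparison result; NOT the real enumeration. */
typedef enum { TOY_LESS, TOY_EQUAL, TOY_GREATER } toy_result_e;

/* Stand-in for a collator comparison: reports the result through an
 * out-parameter and returns 0 on success, -1 on invalid arguments.
 * Uses plain byte order, not locale-sensitive rules. */
static int toy_str_collator(const char *src, const char *target, toy_result_e *result)
{
    if (src == NULL || target == NULL || result == NULL)
        return -1;

    int c = strcmp(src, target);
    *result = c < 0 ? TOY_LESS : (c > 0 ? TOY_GREATER : TOY_EQUAL);
    return 0;
}

/* qsort() comparator: each element is a pointer to a C string. */
static int toy_qsort_cmp(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    toy_result_e r = TOY_EQUAL;

    /* Treat a failed comparison as "equal" so the comparator stays consistent. */
    if (toy_str_collator(a, b, &r) != 0)
        return 0;

    switch (r) {
    case TOY_LESS:    return -1;
    case TOY_GREATER: return 1;
    default:          return 0;
    }
}

/* Sorts an array of C strings in ascending order. */
void toy_sort(const char **items, size_t count)
{
    assert(items != NULL || count == 0);
    qsort(items, count, sizeof items[0], toy_qsort_cmp);
}
```

Calling toy_sort() on { "cat", "banana", "airplane" } reorders the array to airplane, banana, cat. In application code the stand-in comparison would be replaced by a call on a real collator handle, which a qsort() comparator would have to obtain from a static or global variable because qsort() passes no user context.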


Function Documentation

int i18n_ucollator_create ( const char *  locale,
i18n_ucollator_h *  collator
)

Creates an i18n_ucollator_h for comparing strings.

The i18n_ucollator_h is used in all the calls to the Collation service.
When it is no longer needed, the collator must be disposed of by calling i18n_ucollator_destroy().

Since :
2.3.1
Remarks:
Must release collator using i18n_ucollator_destroy().
Parameters:
[in]  locale  The locale containing the required collation rules
Special values for locales can be passed in - if NULL is passed for the locale, the default locale collation rules will be used
If empty string ("") or "root" is passed, UCA rules will be used.
[out]  collator  The created i18n_ucollator_h, otherwise 0 if an error occurs
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
See also:
i18n_ucollator_destroy()

int i18n_ucollator_destroy ( i18n_ucollator_h  collator )

Closes an i18n_ucollator_h.

Once closed, an i18n_ucollator_h should not be used. Every open collator should be closed; otherwise, a memory leak results.

Since :
2.3.1
Parameters:
[in]  collator  The i18n_ucollator_h to close
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
See also:
i18n_ucollator_create()

int i18n_ucollator_equal ( const i18n_ucollator_h  collator,
const i18n_uchar *  src,
int32_t  src_len,
const i18n_uchar *  target,
int32_t  target_len,
i18n_ubool *  equal
)

Compares two strings for equality.

This function is equivalent to calling i18n_ucollator_str_collator() and checking whether the result is I18N_UCOLLATOR_EQUAL.

Since :
2.3.1
Parameters:
[in]  collator  The i18n_ucollator_h containing the comparison rules
[in]  src  The source string
[in]  src_len  The length of the source, otherwise -1 if null-terminated
[in]  target  The target string
[in]  target_len  The length of the target, otherwise -1 if null-terminated
[out]  equal  true if source is equal to target, otherwise false
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
See also:
i18n_ucollator_str_collator()

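int i18n_ucollator_set_attribute ( i18n_ucollator_h  collator,
i18n_ucollator_attribute_e  attr,
i18n_ucollator_attribute_value_e  val
)

Sets the value of a collator attribute (universal attribute setter).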

Since :
2.3.1
Parameters:
[in]  collator  The i18n_ucollator_h containing attributes to be changed
[in]  attr  The attribute type
[in]  val  The attribute value
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter

int i18n_ucollator_set_strength ( i18n_ucollator_h  collator,
i18n_ucollator_strength_e  strength
)

Sets the collation strength used in a collator.

The strength influences how strings are compared.

Since :
2.3.1
Parameters:
[in]  collator  The i18n_ucollator_h to set
[in]  strength  The desired collation strength
One of i18n_ucollator_strength_e
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter

int i18n_ucollator_str_collator ( const i18n_ucollator_h  collator,
const i18n_uchar *  src,
int32_t  src_len,
const i18n_uchar *  target,
int32_t  target_len,
i18n_ucollator_result_e *  result
)

Compares two strings.

The strings will be compared using the options already specified.

Since :
2.3.1
Parameters:
[in]  collator  The i18n_ucollator_h containing the comparison rules
[in]  src  The source string
[in]  src_len  The length of the source, otherwise -1 if null-terminated
[in]  target  The target string
[in]  target_len  The length of the target, otherwise -1 if null-terminated
[out]  result  The result of comparing the strings
One of I18N_UCOLLATOR_EQUAL, I18N_UCOLLATOR_GREATER, or I18N_UCOLLATOR_LESS
Return values:
I18N_ERROR_NONE  Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
See also:
i18n_ucollator_equal()