The Unormalization module provides Unicode normalization functionality.

Required Header

#include <utils_i18n.h>

Overview

The Unormalization module normalizes text to the standard Unicode normalization forms. All instances of i18n_unormalizer_h are immutable. Instances returned by i18n_unormalization_get_instance() are singletons that must not be deleted by the caller.

Sample Code 1

Creates a normalizer and normalizes a Unicode string.

    i18n_unormalizer_h normalizer = NULL;
    i18n_uchar src = 0xAC00;    // a Korean character combined from a consonant and a vowel
    i18n_uchar dest[4] = {0,};
    int32_t dest_str_len = 0;
    int ret = I18N_ERROR_NONE;
    int i = 0;

    // Get the built-in "nfc" normalizer in decompose mode (same as standard NFD)
    ret = i18n_unormalization_get_instance( NULL, "nfc", I18N_UNORMALIZATION_DECOMPOSE, &normalizer );

    // Normalize the one-character source string into dest, only if the instance was obtained
    if ( ret == I18N_ERROR_NONE ) {
        ret = i18n_unormalization_normalize( normalizer, &src, 1, dest, 4, &dest_str_len );
    }
    dlog_print(DLOG_INFO, LOG_TAG, "src is 0x%x\n", src );    // src is 0xAC00

    // On failure dest_str_len stays 0, so nothing is printed
    for ( i = 0; i < dest_str_len; i++ ) {
        dlog_print(DLOG_INFO, LOG_TAG, "dest[%d] is 0x%x\t", i, dest[i] );    // dest[0] is 0x1100  dest[1] is 0x1161 (0x1100: consonant, 0x1161: vowel)
    }
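
The decomposition shown in the sample can be reproduced by hand, because the Unicode Standard defines Hangul syllable decomposition arithmetically. The following is a minimal sketch of that arithmetic, not the module's implementation. The helper hangul_decompose and the constant names are hypothetical and not part of the i18n API, and uint16_t stands in for i18n_uchar on the assumption that it is a 16-bit code unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants of the Unicode Hangul syllable decomposition algorithm. */
enum {
    S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7,
    L_COUNT = 19, V_COUNT = 21, T_COUNT = 28,
    N_COUNT = V_COUNT * T_COUNT,   /* 588 */
    S_COUNT = L_COUNT * N_COUNT    /* 11172 */
};

/* Hypothetical helper (not an i18n function).
 * Writes the conjoining jamo of a precomposed Hangul syllable to out
 * and returns the number of code units written (2 or 3).
 * Returns 0 if s is not a precomposed Hangul syllable. */
int hangul_decompose(uint16_t s, uint16_t out[3])
{
    assert(out != NULL);
    if (s < S_BASE || s >= S_BASE + S_COUNT)
        return 0;

    int s_index = s - S_BASE;
    out[0] = (uint16_t)(L_BASE + s_index / N_COUNT);              /* leading consonant */
    out[1] = (uint16_t)(V_BASE + (s_index % N_COUNT) / T_COUNT);  /* vowel */

    int t_index = s_index % T_COUNT;
    if (t_index == 0)
        return 2;                                                 /* no trailing consonant */
    out[2] = (uint16_t)(T_BASE + t_index);                        /* trailing consonant */
    return 3;
}
```

For 0xAC00 the helper writes 0x1100 and 0x1161, matching the output of the sample. A syllable with a trailing consonant, such as 0xD55C, yields three code units: 0x1112, 0x1161, and 0x11AB.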

Functions
int	i18n_unormalization_get_instance (const char *package_name, const char *name, i18n_unormalization_mode_e mode, i18n_unormalizer_h *normalizer)
	Gets an i18n_unormalizer_h which uses the specified data file and composes or decomposes text according to the specified mode.
int	i18n_unormalization_normalize (i18n_unormalizer_h normalizer, const i18n_uchar *src, int32_t len, i18n_uchar *dest, int32_t capacity, int32_t *len_deststr)
	Writes the normalized form of the source string to the destination string (replacing its contents).
Typedefs
typedef const void *	i18n_unormalizer_h
	An opaque handle to an immutable normalizer instance.

Typedef Documentation

typedef const void* i18n_unormalizer_h

An opaque handle to an immutable normalizer instance.

Since :: 2.3

Enumeration Type Documentation

enum i18n_unormalization_check_result_e

Result values for normalization quick check functions.

Since :: 2.4

Enumerator:

I18N_UNORMALIZATION_NO	The input string is not in the normalization form.
I18N_UNORMALIZATION_YES	The input string is in the normalization form.
I18N_UNORMALIZATION_MAYBE	The input string may or may not be in the normalization form.
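
Because the result is three-valued, a caller usually treats I18N_UNORMALIZATION_MAYBE the same way as I18N_UNORMALIZATION_NO and falls back to a full normalization pass. The sketch below illustrates only that decision. It uses a local stand-in enum so that it compiles without utils_i18n.h; the stand-in type, its numeric values, and the function needs_full_normalization are hypothetical.

```c
#include <assert.h>

/* Local stand-in for i18n_unormalization_check_result_e.
 * The numeric values are arbitrary and chosen for this sketch only. */
typedef enum {
    CHECK_NO,     /* mirrors I18N_UNORMALIZATION_NO */
    CHECK_YES,    /* mirrors I18N_UNORMALIZATION_YES */
    CHECK_MAYBE   /* mirrors I18N_UNORMALIZATION_MAYBE */
} check_result_sketch_e;

/* Hypothetical helper: returns 1 when the caller still has to normalize
 * the text, 0 when the quick check already proved it is normalized. */
int needs_full_normalization(check_result_sketch_e result)
{
    assert(result == CHECK_NO || result == CHECK_YES || result == CHECK_MAYBE);
    /* Only a definite YES lets the caller skip normalization. */
    return result != CHECK_YES;
}
```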

enum i18n_unormalization_mode_e

Enumeration of constants for normalization modes. For details about standard Unicode normalization forms and about the algorithms which are also used with custom mapping tables see http://www.unicode.org/unicode/reports/tr15/.

Since :: 2.3

Enumerator:

I18N_UNORMALIZATION_COMPOSE	Decomposition followed by composition. Same as standard NFC when using an "nfc" instance. Same as standard NFKC when using an "nfkc" instance. For details about standard Unicode normalization forms see http://www.unicode.org/unicode/reports/tr15/
I18N_UNORMALIZATION_DECOMPOSE	Map and reorder canonically. Same as standard NFD when using an "nfc" instance. Same as standard NFKD when using an "nfkc" instance. For details about standard Unicode normalization forms see http://www.unicode.org/unicode/reports/tr15/
I18N_UNORMALIZATION_FCD	"Fast C or D" form. If a string is in this form, then further decomposition without reordering would yield the same form as DECOMPOSE. Text in "Fast C or D" form can be processed efficiently with data tables that are "canonically closed", that is, that provide equivalent data for equivalent text, without having to be fully normalized. Not a standard Unicode normalization form. Not a unique form: Different FCD strings can be canonically equivalent. For details see http://www.unicode.org/notes/tn5/#FCD
I18N_UNORMALIZATION_COMPOSE_CONTIGUOUS	Compose only contiguously. Also known as "FCC" or "Fast C Contiguous". The result will often but not always be in NFC. The result will conform to FCD which is useful for processing. Not a standard Unicode normalization form. For details see http://www.unicode.org/notes/tn5/#FCC
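
The relationship between the instance name, the mode, and the resulting standard form can be summarized as a small lookup. This is a sketch of the table above rather than library code: the stand-in enum (with arbitrary values) and the function standard_form_name are hypothetical and exist only so that the example compiles without utils_i18n.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for i18n_unormalization_mode_e; values are arbitrary. */
typedef enum {
    MODE_COMPOSE,             /* mirrors I18N_UNORMALIZATION_COMPOSE */
    MODE_DECOMPOSE,           /* mirrors I18N_UNORMALIZATION_DECOMPOSE */
    MODE_FCD,                 /* mirrors I18N_UNORMALIZATION_FCD */
    MODE_COMPOSE_CONTIGUOUS   /* mirrors I18N_UNORMALIZATION_COMPOSE_CONTIGUOUS */
} mode_sketch_e;

/* Hypothetical helper: returns the standard Unicode normalization form
 * produced by an instance name and mode, or NULL when the combination
 * does not correspond to a standard form. */
const char *standard_form_name(const char *name, mode_sketch_e mode)
{
    assert(name != NULL);
    int is_nfc = strcmp(name, "nfc") == 0;
    int is_nfkc = strcmp(name, "nfkc") == 0;
    if (!is_nfc && !is_nfkc)
        return NULL;                    /* e.g. "nfkc_cf" or custom data */

    switch (mode) {
    case MODE_COMPOSE:
        return is_nfc ? "NFC" : "NFKC";
    case MODE_DECOMPOSE:
        return is_nfc ? "NFD" : "NFKD";
    default:
        return NULL;                    /* FCD and FCC are not standard forms */
    }
}
```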

Function Documentation

int i18n_unormalization_get_instance	(	const char *	package_name,
		const char *	name,
		i18n_unormalization_mode_e	mode,
		i18n_unormalizer_h *	normalizer
	)

Gets an i18n_unormalizer_h which uses the specified data file and composes or decomposes text according to the specified mode.

Since :: 2.3

Parameters:

[in]	package_name	`NULL` for ICU built-in data, otherwise application data package name.
[in]	name	"nfc" or "nfkc" or "nfkc_cf" or the name of the custom data file.
[in]	mode	The normalization mode (compose or decompose).
[out]	normalizer	The requested normalizer on success.

Return values:

I18N_ERROR_NONE	Successful
I18N_ERROR_INVALID_PARAMETER	Invalid function parameter

int i18n_unormalization_normalize	(	i18n_unormalizer_h	normalizer,
		const i18n_uchar *	src,
		int32_t	len,
		i18n_uchar *	dest,
		int32_t	capacity,
		int32_t *	len_deststr
	)

Writes the normalized form of the source string to the destination string (replacing its contents).

The source and destination strings must be different buffers.

Since :: 2.3
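
One way to guard against passing overlapping buffers to i18n_unormalization_normalize() is to compare the two address ranges before the call. The following is a minimal sketch under stated assumptions: ranges_overlap is a hypothetical helper and not part of the i18n API, uint16_t stands in for i18n_uchar, and both lengths are counts of code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the ranges [a, a + a_len) and
 * [b, b + b_len) share at least one code unit, otherwise 0. */
int ranges_overlap(const uint16_t *a, int32_t a_len,
                   const uint16_t *b, int32_t b_len)
{
    assert(a != NULL && b != NULL);
    if (a_len <= 0 || b_len <= 0)
        return 0;   /* an empty range overlaps nothing */

    /* Compare as integers: relational operators on pointers into
     * different objects are undefined in C, so convert first. */
    uintptr_t a_lo = (uintptr_t)a;
    uintptr_t a_hi = a_lo + (uintptr_t)a_len * sizeof *a;
    uintptr_t b_lo = (uintptr_t)b;
    uintptr_t b_hi = b_lo + (uintptr_t)b_len * sizeof *b;
    return a_lo < b_hi && b_lo < a_hi;
}
```

A caller would pass the source pointer with its length and the destination pointer with its capacity, and skip the normalize call when the helper returns 1.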