SourceForge Logolibuninameslist
A Library of Unicode annotation data

Description

The Unicode consortium provides a file containing annotations on many unicode characters. This library contains a compiled version of this file so that programs can access these data easily.

The library contains a very large (sparse) array with one entry for each unicode code point (U+0000 - U+10FFFF). Each entry contains two strings, a name and an annotation. Either or both may be NULL. The library also contains a (much smaller) list of all the Unicode blocks.

struct unicode_block {
    int start, end;
    const char *name;
};

struct unicode_nameannot {
    const char *name, *annot;
};

extern const struct unicode_block UnicodeBlock[124];

#define UNICODE_NAME_MAX	94
#define UNICODE_ANNOT_MAX	372
extern const struct unicode_nameannot * const *const UnicodeNameAnnot[];

/* Index by: UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff] */

/* At the beginning of lines (after a tab) within the annotation string, a */
/*  * should be replaced by a bullet U+2022 */
/*  x should be replaced by a right arrow U+2192 */
/*  : should be replaced by an equivalent U+224D */
/*  # should be replaced by an approximate U+2245 */
/*  = should remain itself */

This package consists of one header file and one library file. The header is <uninameslist.h>. To find the name of a given unicode character uni use
    UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].name
while the annotation string is:
    UnicodeNameAnnot[(uni>>16)&0x1f][(uni>>8)&0xff][uni&0xff].annot
The both strings are in US ASCII, but the annotation string is intended to be modified slightly by the having any '*' characters which immediately follow a tab at the start of a line converted to a bullet character. Etc.

Installation and Build instructions

Download the source from sourceforge. Then:

$ tar xf libuninameslist*.tgz
$ gunzip libuninameslist*.tgz ; tar xf libuninameslist*.tar
$ cd libuninameslist
$ configure
$ make
$ su
# make install

See Also