Allegro Unicode

Allegro can manipulate and display text using any character values from 0 right up to 2^32-1 (although the current implementation of the grabber can only create fonts using characters up to 2^16-1). You can choose between a number of different text encoding formats, and the format you select controls how strings are stored and how Allegro interprets strings that you pass to it. This setting affects all aspects of the system: whenever you see a function that returns a char * type, or that takes a char * as an argument, that text will be in whatever format you have told Allegro to use.

By default, Allegro uses UTF-8 encoded text (U_UTF8). This is a variable-width format, where characters can occupy anywhere from one to six bytes. The nice thing about it is that characters ranging from 0-127 are encoded directly as themselves, so UTF-8 is backward compatible with 7 bit ASCII ("Hello, World!" means the same thing regardless of whether you interpret it as ASCII or UTF-8 data). Any character value of 128 or above, such as accented vowels, the UK pound sign, and Arabic or Chinese characters, will be encoded as a sequence of two or more bytes, each in the range 128-255. This means you will never get what looks like a 7 bit ASCII character as part of the encoding of a different character value, which makes UTF-8 strings easy to manipulate.
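
The byte layout described above can be illustrated with a small standalone sketch. It does not use Allegro at all: utf8_encode() and utf8_count() are hypothetical helper names chosen for this example, and for brevity only the one to four byte forms are handled, not the longer five and six byte sequences.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Allegro: writes the UTF-8 encoding of
 * the character value c into out (room for 4 bytes required) and returns
 * the number of bytes written. Only the 1 to 4 byte forms are handled;
 * larger values return 0. */
size_t utf8_encode(unsigned long c, unsigned char *out)
{
    if (c < 0x80) {                 /* 0-127: stored directly, as in ASCII */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                /* two bytes, both in the range 128-255 */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {              /* three bytes */
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c < 0x110000) {             /* four bytes */
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}

/* Counts the characters in a NUL-terminated UTF-8 string by skipping
 * continuation bytes, which always have the bit pattern 10xxxxxx. */
size_t utf8_count(const unsigned char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if ((*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

With this sketch, 'A' is stored as the single byte 'A', while the pound sign (value 0xA3) becomes the two bytes 0xC2 0xA3, neither of which can be mistaken for a 7 bit ASCII character.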

There are a few editing programs that understand UTF-8 format text files. Alternatively, you can write your strings in plain ASCII or 16 bit Unicode formats, and then use the Allegro textconv program to convert them into UTF-8.

If you prefer to use some other text format, you can set Allegro to work with normal 8 bit ASCII (U_ASCII), or 16 bit Unicode (U_UNICODE) instead, or you can provide some handler functions to make it support whatever other text encoding you like (for example it would be easy to add support for 32 bit UCS-4 characters, or the Chinese GB-code format).

There is some limited support for alternative 8 bit codepages, via the U_ASCII_CP mode. This is very slow, so you shouldn't use it for serious work, but it can be handy as an easy way to convert text between different codepages. By default the U_ASCII_CP mode is set up to reduce text to a clean 7 bit ASCII format, trying to replace any accented vowels with their simpler equivalents (this is used by the allegro_message() function when it needs to print an error report onto a text mode DOS screen). If you want to work with other codepages, you can do this by passing a character mapping table to the set_ucodepage() function.
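
To give a rough idea of what reducing text to clean 7 bit ASCII involves, here is a standalone sketch. It is not Allegro's actual mapping table: reduce_to_ascii() is a hypothetical name, the ranges cover only the accented vowels in the Latin-1 block, and the '?' fallback for everything else is an arbitrary choice made for this example.

```c
#include <stddef.h>

/* Hypothetical range table, not Allegro's own data: each entry maps a run
 * of accented Latin-1 vowels onto its plain ASCII equivalent. */
struct vowel_range {
    int first;
    int last;
    char plain;
};

static const struct vowel_range vowel_ranges[] = {
    { 0xC0, 0xC5, 'A' }, { 0xC8, 0xCB, 'E' }, { 0xCC, 0xCF, 'I' },
    { 0xD2, 0xD6, 'O' }, { 0xD9, 0xDC, 'U' },
    { 0xE0, 0xE5, 'a' }, { 0xE8, 0xEB, 'e' }, { 0xEC, 0xEF, 'i' },
    { 0xF2, 0xF6, 'o' }, { 0xF9, 0xFC, 'u' }
};

/* Returns a 7 bit ASCII stand-in for the character value c. Values in the
 * range 0-127 pass through unchanged, accented vowels lose their accent,
 * and anything else becomes '?' (an assumption of this sketch). */
int reduce_to_ascii(int c)
{
    size_t i;

    if (c >= 0 && c < 128)
        return c;

    for (i = 0; i < sizeof(vowel_ranges) / sizeof(vowel_ranges[0]); i++)
        if (c >= vowel_ranges[i].first && c <= vowel_ranges[i].last)
            return vowel_ranges[i].plain;

    return '?';
}
```

A real codepage table passed to set_ucodepage() works in terms of whole mapping tables rather than a function like this; the sketch only shows the kind of substitution involved.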

Note that you can use the Unicode routines before you call install_allegro() or allegro_init(). If you want to work in a text mode other than UTF-8, it is best to set it with set_uformat() just before you call these.


void set_uformat( int type )

  Sets the current text encoding format. This will affect all parts of 
  Allegro, wherever you see a function that returns a char *, or takes a 
  char * as a parameter. The type should be one of the values:
     U_ASCII     - fixed size, 8 bit ASCII characters
     U_ASCII_CP  - alternative 8 bit codepage (see set_ucodepage())
     U_UNICODE   - fixed size, 16 bit Unicode characters
     U_UTF8      - variable size, UTF-8 format Unicode characters
  Although you can change the text format on the fly, this is not a good 
  idea. Many strings, for example the names of your hardware drivers and 
  any language translations, are loaded when you call allegro_init(), so if 
  you change the encoding format after this, they will be in the wrong 
  format, and things will not work properly. Generally you should only call 
  set_uformat() once, before allegro_init(), and then leave it on the same 
  setting for the duration of your program.
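
The difference between the fixed size and variable size formats listed above shows up in how much storage the same text needs. The following standalone sketch makes that concrete. It does not call Allegro: the FMT_* constants are hypothetical stand-ins for the real U_* values from allegro.h, U_ASCII_CP is left out, returning 0 for a value the format cannot hold is an assumption of this example, and only the one to four byte UTF-8 forms are counted.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Allegro's format constants; the real U_*
 * values are defined in allegro.h and are not reproduced here. */
enum text_format { FMT_ASCII, FMT_UNICODE, FMT_UTF8 };

/* Bytes needed to store one character value c in the given format.
 * Returns 0 when this sketch considers c unrepresentable in that format. */
size_t char_width(enum text_format fmt, unsigned long c)
{
    switch (fmt) {
    case FMT_ASCII:                 /* fixed size, 8 bits per character */
        return (c < 0x100) ? 1 : 0;
    case FMT_UNICODE:               /* fixed size, 16 bits per character */
        return (c < 0x10000) ? 2 : 0;
    case FMT_UTF8:                  /* variable size */
        if (c < 0x80) return 1;
        if (c < 0x800) return 2;
        if (c < 0x10000) return 3;
        if (c < 0x110000) return 4;
        return 0;
    }
    return 0;
}

/* Total bytes needed for count character values, excluding any
 * terminator. */
size_t text_size(enum text_format fmt, const unsigned long *text,
                 size_t count)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += char_width(fmt, text[i]);
    return total;
}
```

For the three characters 'H', 'i' and the pound sign (0xA3), this gives 3 bytes in the 8 bit format, 6 bytes in the 16 bit format, and 4 bytes in UTF-8. Because the byte layouts differ like this, a string built under one format cannot simply be reinterpreted under another, which is why changing the format after allegro_init() causes trouble.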


int get_uformat( void )

  Returns the currently selected text encoding format.