Localization Glossary

ANSI: Abbreviation for American National Standards Institute; also used, especially in Windows, as the short form for an 8-bit ANSI code page. Such a code page is a character set of 256 characters. The first 128 are the same in all code pages; the upper 128 differ by language or region.
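
The difference in the upper 128 values shows up when the same byte is decoded with several code pages. The following is a minimal Python sketch; the choice of cp1252 (Western European), cp1251 (Cyrillic), and cp1253 (Greek) as sample code pages and the helper name decode_everywhere are assumptions made for illustration.

```python
import unicodedata

# Three 8-bit Windows code pages, chosen only as examples.
CODE_PAGES = ["cp1252", "cp1251", "cp1253"]


def decode_everywhere(raw: bytes) -> dict:
    """Decode the same bytes under every sample code page."""
    return {cp: raw.decode(cp) for cp in CODE_PAGES}


low = decode_everywhere(b"\x41")   # value below 128: same character everywhere
high = decode_everywhere(b"\xe4")  # value of 128 or above: depends on the code page

for cp in CODE_PAGES:
    print(cp, unicodedata.name(low[cp]), "|", unicodedata.name(high[cp]))
```

The lower byte decodes to the same letter in all three code pages, while the upper byte yields a Latin, a Cyrillic, and a Greek letter respectively.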

Culture namespace: In Microsoft .NET, a whole namespace (System.Globalization) is dedicated to cultural differences. It contains nearly all the information you need to avoid hard-coded localization traps.

DBCS: Abbreviation for Double-Byte Character Set. In contrast to single-byte ANSI character sets, two bytes rather than one can be used to code a character. Two bytes allow up to 65,536 possible code values.

Double-byte: A character set whose characters are coded with two bytes. Two bytes allow 2^16 = 65,536 characters, instead of the 256 that fit into the single byte used by ANSI. UCS-2 is a double-byte character set.
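
The arithmetic behind these numbers can be checked in a few lines of Python. This is only a sketch: the function name code_values is hypothetical, and the UTF-16 codec stands in for UCS-2 here, on the assumption that the two agree for a character such as "A".

```python
BITS_PER_BYTE = 8


def code_values(num_bytes: int) -> int:
    """Number of distinct values that fit into num_bytes bytes."""
    return 2 ** (BITS_PER_BYTE * num_bytes)


single_byte = code_values(1)  # one byte, as in an ANSI code page
double_byte = code_values(2)  # two bytes, as in UCS-2

# "A" occupies one byte in an 8-bit code page and two bytes in a 16-bit encoding.
ansi_size = len("A".encode("cp1252"))
two_byte_size = len("A".encode("utf-16-le"))

print(single_byte, double_byte, ansi_size, two_byte_size)
```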

I18N: Abbreviation for Internationalisation (British English) and Internationalization (American English). Because the two spellings differ in just one character, the idea was to use an expression that works for both. I stands for the first character, 18 for the number of characters between the first and last character, and N for the last one.

ISO: Short name of the International Organization for Standardization. Many of the standards defined by this organization are named ISO followed by a number. Examples of localization-related ISO standards are ISO 639 (language codes), ISO 3166 (country codes), and ISO 10646 (Universal Coded Character Set).

International Organization for Standardization: This organization defines internationally accepted standards for a huge range of fields that influence our daily lives.

L10N: Abbreviation for Localisation (British English) and Localization (American English). Because the two spellings differ in just one character, the idea was to use an expression that works for both. L stands for the first character, 10 for the number of characters between the first and last character, and N for the last one.
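
The construction rule shared by I18N and L10N (first character, count of the characters in between, last character) can be written as a short Python function. The name numeronym is an assumption chosen for this sketch.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) < 3:
        return word.upper()  # nothing in between to count
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()


for term in ("internationalization", "internationalisation",
             "localization", "localisation"):
    print(term, "->", numeronym(term))
```

Both spellings of each word produce the same abbreviation, which is the point of the notation.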

Left-to-right: Refers to the writing direction. Speakers of many languages begin to write at the upper left of a page and write horizontally to the right side.

Resource files: In Windows, refers to data stored in executable files, such as .exe or .dll files, and typically used by applications. In .NET, resource files can store culture-specific data in one place, separate from the code. The data can include strings, dialog boxes, menus, icons, manifests, version information, and more. Using resource files makes localizing applications easier, because the design and strings are separated from the code. You can then create localized applications without touching the source code or recompiling the entire application.
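
The idea of keeping culture-specific strings apart from the code can be sketched in Python as a lookup with fallback from a specific culture to its parent and finally to a neutral default. This is not the .NET API; the table RESOURCES and the functions fallback_chain and get_string are hypothetical names, and the fallback order is an assumption made for illustration.

```python
# Strings live in data, keyed by culture name; "" is the neutral fallback.
RESOURCES = {
    "": {"greeting": "Hello", "farewell": "Goodbye"},
    "de": {"greeting": "Hallo", "farewell": "Auf Wiedersehen"},
    "de-AT": {"greeting": "Servus"},
}


def fallback_chain(culture: str) -> list:
    """Return the cultures to try, most specific first: 'de-AT' -> ['de-AT', 'de', '']."""
    chain = []
    while culture:
        chain.append(culture)
        culture = culture.rpartition("-")[0]  # drop the last subtag
    chain.append("")
    return chain


def get_string(key: str, culture: str) -> str:
    """Look up a string, falling back to less specific cultures."""
    for name in fallback_chain(culture):
        table = RESOURCES.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(key)


print(get_string("greeting", "de-AT"))
print(get_string("farewell", "de-AT"))
print(get_string("greeting", "fr-FR"))
```

Adding a new language in this sketch means adding one more table entry; the lookup code stays unchanged.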

ResX: Resource format used in .NET applications.

Right-to-Left: Refers to the writing direction. Writers of Arabic or Hebrew, for example, start at the upper right of a page and continue horizontally to the left side.
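
Whether a piece of text is left-to-right or right-to-left can be estimated from the Unicode bidirectional class of its characters. The sketch below uses Python's unicodedata module and simply takes the first strongly directional character; the function name base_direction is an assumption, and this is a simplification, not a full implementation of the Unicode bidirectional algorithm.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi_class == "L":          # Latin-type letter
            return "ltr"
    return "ltr"  # no strong character found: assume left-to-right


print(base_direction("Hello"))
print(base_direction("\u0645\u0631\u062d\u0628\u0627"))  # Arabic sample
print(base_direction("\u05e9\u05dc\u05d5\u05dd"))        # Hebrew sample
```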

Segmentation Rules: Segmentation rules describe how to split text items into smaller parts. Example: A paragraph is a large segment, which can be segmented into smaller parts such as sentences. A sentence can be segmented even further. A segmentation rule describes how to find the end of a segment, and which exceptions apply. Typical exceptions are abbreviations.
Text is segmented so that the segments can be stored in a translation memory. Smaller strings are reused more often, so a good set of segmentation rules also increases the benefit of a translation memory.
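
A break rule with abbreviation exceptions can be sketched in Python as follows. The rule (break after ".", "!" or "?" followed by whitespace), the abbreviation list, and the function name segment are assumptions chosen for illustration; real rule sets are considerably more detailed.

```python
import re

# Exceptions: a period after these tokens does not end a segment.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}

# Break rule: whitespace that follows sentence-ending punctuation.
BREAK = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> list:
    """Split text into sentence segments, honoring the abbreviation exceptions."""
    if not text.strip():
        return []
    segments = []
    current = ""
    for piece in BREAK.split(text.strip()):
        # Pieces are rejoined with a single space.
        current = f"{current} {piece}".strip()
        if current.split()[-1] in ABBREVIATIONS:
            continue  # exception applies: keep collecting
        segments.append(current)
        current = ""
    if current:
        segments.append(current)
    return segments


sample = "Dr. Smith uses tools, e.g. a translation memory. It saves time! Does it help?"
for part in segment(sample):
    print(part)
```

Without the exception list, the sample would break after "Dr." and "e.g.", producing fragments that are unlikely to be reusable.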

SRX: Segmentation Rules eXchange (SRX) is an XML-based format that describes a set of rules for segmenting text items so that the segments can be stored in a translation memory.
With SRX, translators can reuse their segmentation rules independently of product and project. SRX standard is defined by

TM: Abbreviation for Translation Memory.

Translation Memory: A database that stores combinations of a source text and one or more corresponding translations.
Using a translation memory makes it easy to reuse your existing translation efforts, for example in updates or in other related products.
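
The reuse idea can be illustrated with a small in-memory sketch in Python. The class TranslationMemory, its methods, and the similarity threshold of 0.75 are assumptions for illustration; the similarity measure from difflib is a stand-in for whatever matching a real localization tool applies.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Stores source texts with translations per language (illustrative only)."""

    def __init__(self):
        self._entries = {}  # source text -> {language: translation}

    def add(self, source: str, language: str, translation: str) -> None:
        self._entries.setdefault(source, {})[language] = translation

    def lookup(self, source: str, language: str, threshold: float = 0.75):
        """Return (similarity, translation) of the best match, or None."""
        best = None
        for stored, translations in self._entries.items():
            if language not in translations:
                continue
            ratio = SequenceMatcher(None, source, stored).ratio()
            if ratio >= threshold and (best is None or ratio > best[0]):
                best = (ratio, translations[language])
        return best


tm = TranslationMemory()
tm.add("Save the file.", "de", "Speichern Sie die Datei.")

print(tm.lookup("Save the file.", "de"))   # exact match
print(tm.lookup("Save the files.", "de"))  # close match from an updated text
print(tm.lookup("Cancel", "de"))           # nothing similar stored
```

An exact match can be reused as is, while a close match gives the translator a starting point for the updated string.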

TMX: Translation Memory eXchange (TMX) is an XML-based format defined to help localizers and translators exchange their existing work (translation memory) between localization tools. TMX standard is defined by
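
To give an impression of the format, the Python sketch below reads translation units from a small TMX-like sample. The sample document and the function name read_units are assumptions; the element names (tu, tuv, seg) follow TMX 1.4 as far as it is recalled here, and the header attribute values are placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="de"><seg>Speichern Sie die Datei.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_units(tmx_text: str) -> list:
    """Return one {language: text} dictionary per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


print(read_units(SAMPLE_TMX))
```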

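Unicode: A character encoding standard that aims to define all characters used in the world's writing systems. Its code space provides over one million code points (1,114,112), of which only a part has been assigned to characters so far.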
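
In Unicode every character is identified by a number, its code point, in the range U+0000 to U+10FFFF. The Python sketch below prints code points and character names using the standard unicodedata module; the function name describe is an assumption.

```python
import sys
import unicodedata

# Highest code point is U+10FFFF, so the code space holds 0x10FFFF + 1 values.
CODE_SPACE_SIZE = sys.maxunicode + 1


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(CODE_SPACE_SIZE)
print(describe("A"))
print(describe("\u20ac"))
print(describe("\U0001F600"))
```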

Unicode Windows API: Early Windows versions were ANSI-based. However, with globalization and the growing need for localized software applications, Microsoft established a Unicode-enabled API that accepts and returns all strings as Unicode strings.

UTF-8: 8-bit Unicode transformation format, defined in RFC 3629 and ISO/IEC 10646. Widely used to store Unicode characters in e-mails, on the web, and in XML. UTF-8 is a variable-length format that uses between one and four bytes to code a character.
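
The variable length is easy to observe in Python by encoding characters from different ranges. The sample characters and the function name utf8_length are chosen for this sketch only.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


SAMPLES = {
    "LATIN CAPITAL LETTER A": "A",                # U+0041
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",  # U+00E9
    "EURO SIGN": "\u20ac",                        # U+20AC
    "GRINNING FACE": "\U0001F600",                # U+1F600
}

lengths = {name: utf8_length(ch) for name, ch in SAMPLES.items()}
for name, size in lengths.items():
    print(f"{name}: {size} byte(s)")
```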

UTF-16: 16-bit Unicode transformation format. It uses one or two 16-bit code units to code a character.

UTF-32: 32-bit Unicode transformation format. It uses one 32-bit code unit to code every character.
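
The difference between the 16-bit and 32-bit formats can be seen by encoding one character inside and one outside the first 65,536 code points. This Python sketch uses the big-endian codec variants so that no byte order mark is added; the function name encoded_hex is an assumption.

```python
def encoded_hex(ch: str, encoding: str) -> str:
    """Hexadecimal form of a character's encoded bytes."""
    return ch.encode(encoding).hex()


EURO = "\u20ac"        # U+20AC, fits into one 16-bit unit
FACE = "\U0001F600"    # U+1F600, beyond the 16-bit range

utf16_euro = encoded_hex(EURO, "utf-16-be")
utf16_face = encoded_hex(FACE, "utf-16-be")  # needs two 16-bit units (a surrogate pair)
utf32_euro = encoded_hex(EURO, "utf-32-be")
utf32_face = encoded_hex(FACE, "utf-32-be")  # always one 32-bit unit

print(utf16_euro, utf16_face)
print(utf32_euro, utf32_face)
```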

Windows API: The Windows Application Programming Interface is a large set of DLLs that include functions for every aspect of Windows. In other words, nearly everything that can be accomplished in Windows is available as an API function.

XLIFF: Abbreviation for XML Localization Interchange File Format.

XML Localization Interchange File Format: An XML-based format that contains translatable data and meta tags that can be used to visualize the data. This standard is defined by