Skip the typical software localization beginner’s traps

Localized software opens new markets and creates more revenue. This concept sounds great; however, what does it mean for you, as a software developer? How can you write software that you can easily localize? And what does localization mean for your day-to-day work?

You might consider the following: What is different in other cultures? The differences include many areas:

Address, character sets, code pages, currency, date, language, list separators, measurements, numbers, paper sizes, phone numbers, sort order, taxes, time

Languages

Your first consideration is most likely the language. At the very least, you must think about how to translate all strings in the application user interface. Usually, these are strings for menu entries, dialog boxes, message boxes, the status bar, and error messages.

If you want to send all strings to a translator, you must consistently separate strings from source code. Start this process now. Don’t wait until you are under pressure to meet a deadline. All major development languages for Windows support resource files. You have no excuse! You can easily read strings from resources, such as in classic Visual Basic with LoadResString or LoadStr in a VCL application.

Solution: Store all strings in Windows standard resource files; or, if you develop with Microsoft .NET, use ResX files.
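
The idea of separating strings from code can be sketched in a few lines of Python. This is only an illustration, not the Windows resource or ResX mechanism itself: the STRINGS table, the keys, and the load_string helper are hypothetical names, and in a real project the tables would live in resource files that a translator can edit.

```python
# Hypothetical string tables. In a real project these would be loaded from
# resource files (one per language), not written into the source code.
STRINGS = {
    "en": {
        "menu.file.open": "Open...",
        "error.not_found": "File {name} was not found.",
    },
    "de": {
        "menu.file.open": "Öffnen...",
        "error.not_found": "Die Datei {name} wurde nicht gefunden.",
    },
}


def load_string(key, language, fallback="en", **params):
    """Look up a UI string by key and fall back to the default language."""
    template = STRINGS.get(language, {}).get(key)
    if template is None:
        # No translation yet: show the fallback text instead of failing.
        template = STRINGS[fallback][key]
    return template.format(**params)
```

The code only ever refers to keys such as "menu.file.open", so adding a language means adding a table, not touching the program logic.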

Character Sets

Using a different language often means you must consider another character set. Especially if English is your first language, you might think that you need only 128 characters. However, many languages use special characters:

  • French accents, like in à, é, î, and ç,
  • Spanish punctuation, like the reverse question mark ¿,
  • Umlauts and the sharp s, like ä, ö, ü and ß in German (Finnish also uses ä and ö),
  • Other special letters, like æ and å used in Danish and Norwegian (Swedish also uses å).

The list is endless. So how would you feel if you couldn’t use characters from your own native alphabet? What if your name is Henry, and you couldn’t write your name because a Russian software developer did not support the Latin letter H, as the Russian alphabet does not use it? Would you write enry instead? Or would you immediately uninstall the application?

Solution: You should use Unicode string handling in your application, whenever possible. This allows you to support all languages and character sets. The Unicode Windows API can help you accomplish this. If your development environment supports only ANSI character sets, you should at least ensure that you do not restrict your input to the first 128 characters.
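
To see what a 128-character restriction does to real names, here is a small Python sketch. The fits_in_ascii helper and the sample names are made up for illustration; the point is only that ASCII rejects most non-English names while a Unicode encoding such as UTF-8 stores them unchanged.

```python
def fits_in_ascii(text):
    """Return True if the text uses only the first 128 characters."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


names = ["Henry", "Renée", "Jürgen", "Søren"]

# Names an ASCII-only input field would reject or mangle.
rejected = [name for name in names if not fits_in_ascii(name)]

# A Unicode encoding such as UTF-8 stores and restores every name unchanged.
round_trip = [name.encode("utf-8").decode("utf-8") for name in names]
```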

Code Pages

Unfortunately, only a few development systems support Unicode in their visual components, among them Visual C++ and .NET languages such as C# and Visual Basic .NET. Many others, such as Delphi with the VCL and classic Visual Basic, do not have built-in Unicode support for the GUI. In these cases, your program will start with the code page that the user set for non-Unicode applications in the system. You cannot influence this setting from within your application. If you exchange data with your translator or user, make sure that you know how to handle code pages.

Solution: Use a localization tool that handles all languages and streamlines the data exchange process. Search for a localization tool that supports double-byte and right-to-left languages, so that you don’t have problems later.
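
The code page problem is easy to reproduce. In the following Python sketch, the same bytes are read once with the Western European code page 1252 and once with the Cyrillic code page 1251. The sample word and the variable names are only for illustration.

```python
# The German word for "size", saved by a program that uses code page 1252.
raw_bytes = "Größe".encode("cp1252")

# A reader that assumes the same code page gets the original text back.
as_western = raw_bytes.decode("cp1252")

# A reader on a system set to the Cyrillic code page 1251 decodes the very
# same bytes into different letters, and nothing reports an error.
as_cyrillic = raw_bytes.decode("cp1251")

# Stating the encoding explicitly (here UTF-8) on both sides avoids the guess.
safe_bytes = "Größe".encode("utf-8")
safe_text = safe_bytes.decode("utf-8")
```

This is why files exchanged with a translator need an agreed encoding: the bytes alone do not say which code page they were written in.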

Numbers

A number is a number is a number, you might suppose. Wrong. Numbers are formatted. Two things differ between countries: the decimal separator and the thousand separator. Some countries, like the USA, use a comma to separate thousands and a point as the decimal separator. Therefore, one thousand and two cents is written 1,000.02. Other countries, such as Germany, use the comma and the point in the opposite way, so 1.000,02 represents the same value. In Switzerland, it is 1’000,02 because the Swiss use the apostrophe as the thousand separator.

Solution: Do not store numbers internally, in databases, or in files as formatted strings. Always use a numeric variable type like Long or Float. When you display numbers, format them with the right system setting for the thousand and decimal separators. The Windows API provides functions to get the appropriate values. In .NET, check the culture classes in the System.Globalization namespace. When you allow user input, make sure that the user knows which format is required.
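
A minimal Python sketch of this separation between stored value and displayed text follows. The format_number helper is a hypothetical name, and the separators are passed in by hand here; a real application would read them from the system settings instead.

```python
def format_number(value, thousands_sep, decimal_sep, decimals=2):
    """Format a numeric value for display with the given separators."""
    # Format with the neutral "," and "." first ...
    neutral = f"{value:,.{decimals}f}"
    # ... then swap both characters in one pass so they cannot collide.
    table = str.maketrans({",": thousands_sep, ".": decimal_sep})
    return neutral.translate(table)


amount = 1000.02  # stored as a number, never as formatted text

us_text = format_number(amount, ",", ".")     # "1,000.02"
german_text = format_number(amount, ".", ",")  # "1.000,02"
swiss_text = format_number(amount, "’", ",")   # "1’000,02"
```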

Currencies

The currency used in a country affects your application, too. Most currencies have their own currency symbol. Examples are € for the euro in Europe, £ for the British pound (and formerly for the Italian lira), ¥ for the Japanese yen or Chinese yuan, and $ for the dollar used in Australia, Canada, Jamaica, New Zealand, the USA, and many other countries. The currency symbol is defined in the character set used in the country. The symbol is also defined in the regional settings of the Windows control panel.

Because the symbol does not fully specify the currency, as shown in the previous examples, you should use the international three-character currency codes defined in ISO 4217, like USD for the US dollar, EUR for the euro, and so on. If your application handles more than one currency, you should save the currency code with every amount. You should be careful when you define a currency field and exchange data with a spreadsheet or database application, like Excel or Access. These applications use the system setting. If an amount in Japanese yen (JPY) is then shown as US dollars (USD) without currency conversion, the monetary difference is quite large.

In addition, you should be aware that the currency code might be placed in front of, or behind, the currency value.

Solution: Be sure to check the system settings for the default currency and symbol placement. The Windows API provides functions to get the appropriate values. In .NET, check the culture classes in the System.Globalization namespace. Be prepared and use the international currency codes. When you allow user input, make sure that the user knows which format is required.
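
One way to keep the currency code attached to every amount is sketched below in Python. The Money class and the add_money and display_money helpers are hypothetical names chosen for this example, and the placement flag stands in for what the system settings would tell you.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal   # exact decimal value, not a formatted string
    currency: str     # ISO 4217 code such as "USD", "EUR" or "JPY"


def add_money(a, b):
    """Add two amounts, refusing to mix currencies silently."""
    if a.currency != b.currency:
        raise ValueError(f"cannot add {a.currency} to {b.currency} without conversion")
    return Money(a.amount + b.amount, a.currency)


def display_money(money, code_first=True):
    """Place the currency code before or after the value."""
    if code_first:
        return f"{money.currency} {money.amount}"
    return f"{money.amount} {money.currency}"
```

Because the code travels with the value, 1000 JPY can never be mistaken for 1000 USD when the data moves to another system.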

Dates

You should never hard-code date values. The date order is different between countries. In the short date format, the USA uses mm/dd/yyyy where m is the month, d is the day, and y is the year. Germany uses dd.mm.yyyy. If you do not take care of this, for example, in Visual Basic, a date string like 12/9/2006 can be interpreted as 9th December or 12th September. If you use medium or long date formats, the day and month names must also be translated. If you use format routines, you should ensure that your development system supports date formats in the way that you require. If you need to calculate with dates, store them in a format that is system independent, like the ISO 8601 format yyyy-mm-ddThh:mm:ss; you can also convert the dates to a system-independent date number format, such as a date serial. This makes sorting dates easy.

Solution: Be sure to store dates internally and in files without using a display format. Use the date data type of your programming language. If you allow user input, collect the day, month and year in separate fields, and internally build a date data type from these fields. When you display dates, format them with the right system settings. The Windows API provides functions to get the appropriate values. In .NET, check the culture classes in the System.Globalization namespace. When you allow user input, make sure that the user knows which format is required.
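
The following Python sketch shows both halves of this advice: building a date from separate fields and storing it as ISO 8601 text, and the ambiguity you get when you parse a formatted string instead. The variable names are only for illustration.

```python
from datetime import date, datetime

# Build the date from three separate input fields, so no order can be guessed.
entered = date(year=2006, month=12, day=9)

# Store it in the system-independent ISO 8601 form and read it back.
stored_text = entered.isoformat()             # "2006-12-09"
restored = date.fromisoformat(stored_text)

# The same formatted string gives two different dates, depending on
# whether it is read in US order or in day-first order.
read_as_us = datetime.strptime("12/9/2006", "%m/%d/%Y").date()
read_day_first = datetime.strptime("12/9/2006", "%d/%m/%Y").date()

# ISO 8601 strings sort correctly even as plain text.
sorted_texts = sorted(["2006-12-09", "2006-09-12", "2005-12-31"])
```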

List Separators

Who needs list separators? You do, trust me. You should consider list separators whenever you handle a string array in multi-column list boxes, memory, or comma-separated values (.csv) files. CSV files are only comma separated for languages that use a comma as the list separator. However, many languages use a comma as the decimal separator in numbers. That is why they use a semicolon (;) to separate the items of a list.

Solution: Get the list separator setting of the user system. The Windows API provides functions to get the appropriate values. In .NET, check the culture classes in the System.Globalization namespace.
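
A short Python sketch of a CSV writer and reader that take the list separator as a parameter instead of hard-coding the comma. The write_rows and read_rows helpers are hypothetical names, and the separator is passed in by hand; in a real application it would come from the user's system settings.

```python
import csv
import io


def write_rows(rows, list_separator):
    """Write rows as delimited text using the given list separator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=list_separator, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text, list_separator):
    """Read delimited text back into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=list_separator))


# A price with a decimal comma, as a German user would type it.
rows = [["Item", "Price"], ["Widget", "1000,02"]]

semicolon_text = write_rows(rows, ";")
comma_text = write_rows(rows, ",")
```

With a semicolon the decimal comma needs no special treatment. With a comma the writer has to quote the price field, which is exactly the case where a naive split on commas breaks.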

Measurements

You should never hard-code local measurements, like inches and miles. Whenever possible, you should use the metric standards, such as centimeter and kilometer. You can take the same approach for weight: instead of pounds, use kilograms. In addition, the liter is more widely used than the pint or gallon. The system of measurement differs between countries. Moreover, ISO favors the metric system now. Even the kilobyte is no longer 1024 bytes in the ISO/IEC standard; a kilobyte is 1000 bytes (1024 bytes is a kibibyte), because the prefix kilo means 1000 everywhere else in the metric system. I am personally humbled and embarrassed that I missed that for seven years. On a more serious note, in some European Union (EU) countries, placing ads with non-metric measurements is against the law.

Solution: Don’t hard-code measurements and be prepared to select and convert them.
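
As a rough Python sketch of this approach, the value below is stored once in kilometers and converted only for display. The format_distance helper and the "metric" and "imperial" labels are assumptions for this example; the user's preferred system would normally come from the system settings.

```python
KM_PER_MILE = 1.609344  # exact definition of the international mile


def format_distance(kilometers, system):
    """Display a distance that is stored in kilometers."""
    if system == "metric":
        return f"{kilometers:.1f} km"
    if system == "imperial":
        return f"{kilometers / KM_PER_MILE:.1f} mi"
    raise ValueError(f"unknown measurement system: {system}")


stored_km = 42.195  # one internal unit, whatever the user prefers to see

metric_text = format_distance(stored_km, "metric")
imperial_text = format_distance(stored_km, "imperial")
```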

Paper Formats

If you print many documents, you might have wondered how many odd formats your printer driver can handle. Not surprisingly, paper sheets come in many more sizes than just the standard letter and A4 paper.

Solution: If you must format the printout, check the paper format. You can get this information from the Windows API or directly from the printer object or class in your development language. Do not expect only letter or A4, because user-defined sizes might be used. For example, professional output devices might have sizes like A4+ for borderless A4 output.

Phone Numbers

Usually, an international phone number has three parts after the leading plus sign: country calling code, area code, and local phone number. A country calling code consists of one to three digits; for example, 1 for the USA and Canada, 32 for Belgium, 420 for the Czech Republic, and 86 for China. However, many countries, such as Denmark, do not have an area code. The number of area code digits also differs. Sometimes it is a fixed number, like three in the USA; however, in Germany, the area code can have three to five digits after the leading zero. Callers drop this leading zero when they dial a German number from abroad. This contrasts with Italy, where you must dial the leading zero in international calls. The number of digits in the local number also differs. In Germany, local numbers can contain three to seven digits, sometimes even eight for numbers that lead to a PBX. In some countries like the USA, phone numbers can also contain an extension at the end, separated by a hash (#), which is used only by the PBX of the phone holder. The only consistent aspect of a telephone format is that an international phone number can’t have more than 15 digits.

Solution: To be safe, store phone numbers internally in the international format. Don’t accept only input that matches your local phone format. You should always accept international numbers. For example, don’t limit the area code to three digits or require seven digits for local numbers.
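
A minimal Python sketch of such a check follows. It assumes only the rules named above: a leading plus sign, at most 15 digits, and an optional extension after a hash. The normalize_international helper is a hypothetical name, and the sample numbers are invented.

```python
import re


def normalize_international(raw):
    """Return (number, extension) with the number stored as '+' and digits."""
    main, _, extension = raw.partition("#")
    main = main.strip()
    if not main.startswith("+"):
        raise ValueError("expected a leading + and a country calling code")
    # Drop spaces, brackets, dashes and other punctuation; keep the digits.
    digits = re.sub(r"\D", "", main)
    if not digits or len(digits) > 15:
        raise ValueError("an international number has 1 to 15 digits")
    return "+" + digits, extension.strip()


german = normalize_international("+49 (30) 1234-5678")
with_extension = normalize_international("+1 303 555 0100 #42")
```

Note what the check does not do: it makes no assumption about how many digits belong to the area code or the local number.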

Sort Order

As described in the section about character sets, different countries or languages have different lists of characters, or, in other words, different alphabets. Thus, the languages can have additional characters like umlauts or accents, such as in German, French, Danish, Swedish, Norwegian, Finnish, Turkish, and so on. Some languages don’t support some of our favorite characters, like h in Russian, x in Greek, and many more. All these languages sort their characters differently. If you do alphabetical sorts in your application, you should at least think about supporting the sort order of the localized language. Amazingly, even large or popular applications have not always supported the local sort order in their localized versions. Whether you need to support different sort orders depends on how important alphabetized data is to your users. For example, if you have an application that handles addresses, contact information, or other large amounts of data, your user will definitely miss this feature.

Solution: In .NET, you can check the culture classes in the System.Globalization namespace for the sort order. In other development systems, you must check your string sorting routines; and, for your own implementation, you may have to research the sorting rules of each language first.
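
To show why the default string sort is not enough, here is a small Python sketch. The rough_sort_key helper is a made-up name and only a crude stand-in for real language rules: it simply ignores accents and case. That is acceptable for German dictionary order, but wrong for Swedish, for example, where ä sorts after z.

```python
import unicodedata


def rough_sort_key(text):
    """Crude sort key: ignore accents and case, then break ties by the text."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


words = ["Zebra", "\u00c4pfel", "Apfel"]  # "\u00c4" is the letter Ä

# Sorting by character code puts the umlaut word after "Zebra".
by_code = sorted(words)

# Sorting with the key puts it next to "Apfel", as a German reader expects.
by_key = sorted(words, key=rough_sort_key)
```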

States

If you design an online contact form, you should never force your user to enter a state, or, even worse, to select one of the 50 US states. Not every user lives in the USA or is used to providing a state. Some users might live in countries that use other systems, like departments in France or counties in Great Britain.

Tax System

If you produce an accounting application, keep in mind that tax systems differ between countries. In the EU, for example, the tax included in gross prices is a value-added tax (VAT), and there is no local sales tax.

Time

If you use time, you must consider the twelve-hour and twenty-four-hour models used around the world. Twelve-hour systems, as used in the United Kingdom or USA, use am and pm to define whether the time is before or after noon. You must ensure that time zones are reflected in your application. Which time coding do you need? Local times, like Eastern Standard Time (EST) in New York, Mountain Standard Time (MST) in Colorado, or Pacific Standard Time (PST) in California, USA? Greenwich Mean Time (GMT) is international time and is the basis for the world time clock. GMT is the preferred time if you exchange data with other countries. This time system is based on the local time in Greenwich, London (GMT+0). All time differences are given as GMT+x or GMT-x. For example, France, Germany, and the Netherlands are at GMT+1, PST is GMT-8, EST is GMT-5, and Japan is GMT+9. Differences may also occur between summer and winter. Many countries observe daylight saving time in summer, although Japan does not. The modern successor to GMT is Coordinated Universal Time (UTC), also known as Zulu time; for most practical purposes the two are interchangeable. If you need to code time, such as in e-mail or Internet formats, check the related RFCs for the required time format.

Solution: Store time internally and in files in one consistent time zone. When you display a time, use the correct system settings. The Windows API provides functions to get the appropriate values. In .NET, check the culture classes in the System.Globalization namespace. Use GMT time coding instead of local formats such as MST and PST. These time formats are not common outside the USA, and a US user probably has no clue what CET (Central European Time) means.
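
A Python sketch of this rule: convert every time to UTC before storing it, and apply the user's offset and clock model only for display. The to_storage and for_display helpers are hypothetical names, and the fixed offsets are a simplification that ignores daylight saving time; a real application would take the time zone rules from the system.

```python
from datetime import datetime, timedelta, timezone


def to_storage(local_time):
    """Convert a time with a known offset to UTC text for storage."""
    if local_time.tzinfo is None:
        raise ValueError("a time without a time zone is ambiguous")
    return local_time.astimezone(timezone.utc).isoformat()


def for_display(stored_text, offset_hours, twelve_hour):
    """Show a stored UTC time at the given offset, in 12 or 24 hour form."""
    zone = timezone(timedelta(hours=offset_hours))
    local = datetime.fromisoformat(stored_text).astimezone(zone)
    if not twelve_hour:
        return f"{local.hour:02d}:{local.minute:02d}"
    suffix = "am" if local.hour < 12 else "pm"
    return f"{(local.hour % 12) or 12}:{local.minute:02d} {suffix}"


pst = timezone(timedelta(hours=-8))
stored = to_storage(datetime(2006, 12, 9, 9, 30, tzinfo=pst))

california_text = for_display(stored, -8, twelve_hour=True)
german_text = for_display(stored, 1, twelve_hour=False)
japanese_text = for_display(stored, 9, twelve_hour=False)
```

The stored value is the same for all three users; only the display differs.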

Other traps

Depending on your application, you should check legal issues, cultural differences, standards, and more. These issues may result in logical changes to your source code. For example, accounting rules differ between the USA and Germany and popular payment methods can vary.

Cultural differences can bring you serious trouble or kick your product completely out of the market. I’m sure that you don’t want to offend your users. Be careful with colors, body parts, and animals in your icons, images, and ads. Different cultures see these things differently. There is a rumor about Borland from the very early days. In Germany, Borland used a pig in the Turbo Pascal ads. These ads were very popular: a pig brings luck and can represent speed in Germany. The rumor is that the Borland founder and former CEO, Philippe Kahn, stopped the German ads. As a native Frenchman, he did not want a pig associated with his products.

Conclusion

Don’t be shocked about the number of differences in our many global cultures. Most of the differences are easy to address in your application code. Other software companies have tackled these problems, so you don’t need to reinvent the wheel. Many fine tools are available. In particular, the Windows API provides all of the tools you need. Moreover, if you are already using .NET, you are a top priority with Microsoft, because Microsoft knows they must appeal to the global programmer and provide you with the right tools—at the beginning of your software development process, when it matters most.

When you write flexible code, localization is a snap. However, if you have hard-coded some of the items mentioned in this article, you have much work to do. Take the chance to implement this article’s recommendations at the beginning of your project. Plan a bit more in advance. This will save you time and money.

Most developers who have hard-coded strings are afraid of converting their application to string resources. Yes, if you have a large application with many strings, this process can cost you a few extra days. However, you will save more time and money when you start the localization process. All your hard work will pay off, as you can process every language and every update easily and efficiently. Remember to ensure that your resource files are Unicode-aware, so that you are prepared for the future.

— Renate Reinartz

6 replies on “Skip the typical software localization beginner’s traps”

A very nice roundup of all the localization concerns.

One point that is often overlooked is that the layout can look different in different languages. For example, in Chinese the phrases are often short. So if the layout is based on that narrow design, it could have problems when translated to a language with long words and phrases, like German.

User interface design for localization will be a different article. Different text length and text orientation is definitely an issue to take care of when designing an international software application.

Be careful when coding into the future with visual studio 5, as the directory libraries will encounter a bug, on files dated beyond year 2038-01-18T22:14 🙂

Just to be a nitpick, but the (outdated) symbol for the Italian lira is L. (capital L followed by a dot)

For the rest two thumbs up!

😉

I couldn’t understand some parts of this article, but I guess I just need to check some more resources regarding this, because it sounds interesting.