38

Which caveats should I be aware of when localizing numbers in my front-end application?

Example: In Brazilian Portuguese (pt-BR) we separate thousands with dots and decimals with commas. In US English (en-US) it's the opposite. In pt-BR we group digits by thousands, the same as in en-US. But while reading about Indian English (en-IN) today I came across this gem:

The Indian numbering system is preferred for digit grouping. When written in words, or when spoken, numbers less than 100,000/100 000 are expressed just as they are in Standard English. Numbers including and beyond 100,000 / 100 000 are expressed in a subset of the Indian numbering system.

https://en.wikipedia.org/wiki/Indian_English#Numbering_system

Which means:

1000000 units in pt-BR are formatted 1.000.000
1000000 units in en-US are formatted 1,000,000
1000000 units in en-IN are formatted 10,00,000

Besides commas, dots, and other locale-specific separators, it seems that masking (the digit-grouping pattern itself) is also a valid concern.

What other caveats should I be aware of when localizing numbers in my front-end application, especially if I'm showing numbers to users of non-Latin scripts?

Machado

6 Answers

86

Most programming languages and frameworks already have a sensible, working mechanism that you can use for this.

For example, the C# ecosystem has the System.Globalization namespace, which allows you to specify the Culture you want:

Console.WriteLine(myMoneyValue.ToString("C", CultureInfo.GetCultureInfo("en-US")));

This is not something that you want to re-invent. Use the internationalization features provided by your favorite language or framework.
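For a front-end application, the browser counterpart is the built-in `Intl.NumberFormat` API. Below is a minimal TypeScript sketch of the three cases from the question. It assumes the runtime ships locale data for pt-BR, en-US and en-IN, and `formatForDisplay` is a hypothetical helper name, not something from the original answer.

```typescript
// Format a number for display using the runtime's built-in locale data.
// formatForDisplay is an illustrative helper name, not a library function.
function formatForDisplay(value: number, locale: string): string {
  return new Intl.NumberFormat(locale).format(value);
}

console.log(formatForDisplay(1000000, "pt-BR")); // 1.000.000
console.log(formatForDisplay(1000000, "en-US")); // 1,000,000
console.log(formatForDisplay(1000000, "en-IN")); // 10,00,000
```

The grouping rules, including the Indian 2-2-3 pattern, come from the locale data rather than from any hand-written mask.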

Robert Harvey
24

Some excellent answers here already, but they do not mention one thing that I think is important: wherever number formatting takes place, make sure it is clear (or can be controlled) what the output is used for:

  • when it is for the user interface, the localized formatting must be applied

  • when the number is going to be written to a file, sent over the network, or otherwise needed in machine-readable form, make sure it is not formatted according to the current culture, but according to a fixed setting (for example, in the .NET environment, use InvariantCulture).

Otherwise you get problems when numbers are written or sent using culture A, and read or received using culture B.
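In a TypeScript front end, one way to keep the two uses apart is to give them separate functions with unmistakable names. This is only a sketch: the function names are hypothetical, and the "fixed setting" assumed here is JavaScript's own locale-independent number syntax (`String` and `Number`), playing the role that InvariantCulture plays in .NET.

```typescript
// For people: localized, never parsed back by the program.
function formatForUser(value: number, locale: string): string {
  return new Intl.NumberFormat(locale).format(value);
}

// For machines: locale-independent, no grouping, "." as the decimal point.
function serializeNumber(value: number): string {
  if (!isFinite(value)) {
    throw new Error("Cannot serialize a non-finite number");
  }
  return String(value);
}

// Strict counterpart of serializeNumber: rejects anything localized.
function parseSerializedNumber(text: string): number {
  const value = Number(text);
  if (text.trim() === "" || !isFinite(value)) {
    throw new Error("Not a machine-readable number: " + text);
  }
  return value;
}

console.log(formatForUser(1234.5, "pt-BR")); // 1.234,5 (display only)
console.log(serializeNumber(1234.5));        // 1234.5 (files, network, logs)
```

Feeding the localized string into `parseSerializedNumber` throws rather than silently producing a wrong value, which makes a mix-up of the two forms visible early.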

In my experience, this is one of the biggest hurdles in localizing numbers properly: in an attempt to centralize number formatting and conversion, people create general, reusable formatting functions and then start using them all over the place. However, as soon as numbers are also needed in a machine-readable string format somewhere else in the program, two variants are required: a localized and a non-localized one. This introduces a high risk of mixing up the two forms of conversion (especially when the developers' and testers' machines have default locale settings similar to the "fixed" setting used for non-UI formatting, but part of the user base does not).

Addendum: this problem can become really nasty in situations where it is not clear beforehand whether the number will later be processed by a machine, by a human, or both, for example as part of the output of a log file. In such cases it is probably best to stick to the "neutral" standard of using no separator except the point as a decimal separator.

jscs
Doc Brown
8

Proper localization is quite difficult. Most programming ecosystems have attempted a solution for localization, but in my experience they are all more or less broken. I would therefore suggest:

  • Don't try to automate localization. It won't always work. It is difficult for you to spot the problems, and frustrating for your users.

  • Be consistent: don't mix different languages and formatting conventions, e.g. Brazilian-style decimal separators in English text.

  • Explicitly support a given set of locales. Work together with your translators to figure out proper formatting for dates and numbers. You will likely end up creating your own localization toolkit, though most (but not all) problems can be delegated to an existing library.

  • Make simple formatting choices configurable by each user: formats for dates and times, decimal separators, preferred currency, and so on. This is especially useful for travellers, expats, or other people who need to mix multiple locales or cultures independently of language.
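The last two points can be combined in a small resolver. This is a rough TypeScript sketch under stated assumptions: the supported list, the fallback, and the function name are all hypothetical, and in a browser the second argument would typically come from `navigator.language`.

```typescript
// Hypothetical list of locales the application explicitly supports and tests.
const SUPPORTED_LOCALES: string[] = ["pt-BR", "en-US", "en-IN"];
const FALLBACK_LOCALE = "en-US";

// An explicit user setting wins over the browser's reported locale;
// anything outside the supported set falls back to the default.
function resolveNumberLocale(userSetting?: string, browserLocale?: string): string {
  const candidates = [userSetting, browserLocale];
  for (let i = 0; i < candidates.length; i++) {
    const candidate = candidates[i];
    if (candidate !== undefined && SUPPORTED_LOCALES.indexOf(candidate) !== -1) {
      return candidate;
    }
  }
  return FALLBACK_LOCALE;
}

// A user with a Brazilian browser who prefers Indian digit grouping.
const chosenLocale = resolveNumberLocale("en-IN", "pt-BR");
console.log(new Intl.NumberFormat(chosenLocale).format(1000000));
```

Keeping the number-format setting separate from the UI language is what lets travellers and expats mix conventions independently of language.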

amon
2

You can't be aware of all the caveats of every language. You are talking about numbers, but there are also plurals, genders, and collation. You need to know that they exist and rely on the extensive work performed by other people, most notably the ICU and CLDR projects.

Most modern languages implement some or all features of these projects, but even if they don't, reading about these projects will give you a good idea of what to look for.

http://site.icu-project.org

http://cldr.unicode.org

Update

The CLDR survey tool provides access to all patterns. It will show you how to format a number in a given language and region. For example, Portuguese (Portugal):

http://st.unicode.org/cldr-apps/v#/pt_PT/Number_Formatting_Patterns/

And if you really want to check all data (and perhaps use it), you can download the CLDR in JSON format from GitHub:

https://github.com/unicode-cldr/cldr-json#cldr-json

More info about downloads here:

http://cldr.unicode.org/index/downloads

noderman
2

An important consideration: you should decide how much is enough, because if you go down the rabbit hole of trying to localize perfectly, it will become increasingly complex.

Take a typical label like "You have selected n items." This reads wrong if only one item is selected. The ugly but pragmatic solution is to write "You have selected n item(s)." But if you want to do it correctly, you need two different texts depending on n. If you try to do this in multiple locales it quickly gets really complex, since different languages have different grammar. Some languages have different forms for one, two, and multiple items, and so on. For this reason, people in the know will always complain that existing localization frameworks are insufficient.

But you have to choose your battles, and decide what level of sophistication is sufficient. For many purposes a standard localization library for formatting numbers and dates should be sufficient.

JacquesB
0

Well, while I'm happy with all the answers here, none of them on its own satisfies me enough to be marked as the correct answer.

So far this is what we should be aware of when localizing numbers:

For humans:

  • Thousands separators do not always separate at thousands. See the Indian case in the question;
  • Thousands and decimal separator characters vary from culture to culture. In French, for example, thousands are split using spaces, while in English it's commas and in Portuguese it's dots;
  • We don't have information on whether there's a relevant difference between left-to-right and right-to-left languages;
  • Provide a specific set of supported localizations and make it clear to your users;
  • Allow your users to change the default localization to one of the supported localizations and they'll be happy and gratefully send you cakes, because you're a generous god. :) ;

For computers:

  • Remember that machines are not lenient and should always receive the same formatting when serializing and deserializing a number;
  • Stick with a single format for it;
  • Use the most minimal format possible. Avoid thousands separators; a decimal separator should be enough for serialization and deserialization.

For developers:

  • (as suggested by @hyde below): Use an existing library for localization;
  • If you can, use native-speaker testers and specify localization/internationalization test cases; otherwise trust the library;
  • Remember that localization is a mostly solved problem. Every major language has a library, native or external, that can localize numbers, dates and times.
Machado