When using gettext or a similar framework, should the translation strings include ':' (like "Label:") for UI labels, or should I add the ':' in the UI code itself?
7 Answers
The colon can be regarded as a punctuation symbol. It is a convention which is part of the text, just like a period or question mark.
We wouldn't leave out the question mark from "Are you sure you wish to exit?" and then write code to add the question mark to the translated string in a language-dependent manner. Doing so means that the UI code unnecessarily takes on the responsibility of knowing how to punctuate sentences in various languages: a responsibility which can be handled in the message substitution.
There is an intermediate possibility. It is likely that, just like all labels have the colon feature in English, in other languages labels also have some lexical element in common. That element, if present, is probably some kind of prefix or suffix, or both.
You could represent the labels without the adornment, and then have an additional translatable string which provides a schema for adding the label adornment. As an example, suppose that C sprintf style formatting is available in the UI application. The English version of the label-generating format string would be "%s:", and that could be the default as well, since it works in some other languages. The UI translators can replace this "%s:" with whatever they see fit. One of the possibilities is that they can replace it with just "%s" (do nothing) and then specify the full, adorned representation of each label in the translated string table. So this approach even handles some strange possibilities where the lexical marker which denotes a label has to be inserted into the middle.
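A minimal sketch of that label schema using Python's standard gettext module. `DictTranslations` is a hypothetical in-memory stand-in for a compiled catalog, and the French entries are only illustrative. With no catalog loaded, `NullTranslations` returns each msgid unchanged, so English gets the default "%s:".

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Missing entries fall back to the msgid, as real gettext does.
        return self._messages.get(message, message)


def make_label(t, name):
    # "%s:" is itself a translatable string: the label schema for the language.
    return t.gettext("%s:") % t.gettext(name)


english = gettext.NullTranslations()  # no catalog: msgids returned as-is
french = DictTranslations({
    "%s:": "%s\u00a0:",       # translator's schema: non-breaking space before the colon
    "User ID": "Identifiant",
})

print(make_label(english, "User ID"))  # User ID:
print(make_label(french, "User ID"))
```

A translator who prefers to spell out every label in full would translate "%s:" as plain "%s" and put the adornment into each label's own translation.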
This approach doesn't seem worthwhile if all it achieves is a slight compression in the representation of label strings: the removal of one colon character. If you have to write 100 characters of extra code for this, you have to remove colons from 100 labels just to break even, and that doesn't even account for the time spent.
There has to be some payoff for this: namely that the application uses the strings for purposes other than just generating labels, such as generating sentences which refer to the UI fields by name. Say that a dialog box has a "User ID:" label for entering text. If you have generic logic which produces the message "You have entered an invalid user ID." by combining a sentence boilerplate text with the name of a UI element, then it's necessary to have the unadorned string "user ID", and pass it through a label-making function to generate "User ID:".
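A sketch of that payoff, again with Python's gettext and no catalog loaded (so the output is English). The names `FIELD_USER_ID`, `make_label` and `invalid_field_message` are hypothetical, and capitalising the first letter is an assumption that suits English labels but not necessarily every language.

```python
import gettext

# No catalog loaded: NullTranslations returns every msgid unchanged.
_ = gettext.NullTranslations().gettext

FIELD_USER_ID = "user ID"  # hypothetical unadorned field name, stored once


def make_label(name):
    """Turn an unadorned field name into a label such as 'User ID:'."""
    text = _(name)
    # Assumption: sentence-case the first character; str.upper() leaves
    # characters without case (e.g. CJK) unchanged.
    text = text[:1].upper() + text[1:]
    return _("%s:") % text


def invalid_field_message(name):
    # The whole sentence is one translatable string with a slot for the name.
    return _("You have entered an invalid %s.") % _(name)


print(make_label(FIELD_USER_ID))             # User ID:
print(invalid_field_message(FIELD_USER_ID))  # You have entered an invalid user ID.
```

This only works if the field name can be dropped into the sentence unchanged; in languages that inflect nouns for case or gender, the sentence may need a different form of the name.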
For many languages, there is no one-to-one translation of English words and phrases, but rather multiple translations that are context-sensitive.
To make the life of translators easier, you should provide as much context for the strings as possible. That includes colons in labels and contextual information where those labels are being used.
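One way to give translators that context inside gettext itself is a message context (msgctxt), exposed in Python 3.8+ as `pgettext`. The sketch below fakes a compiled catalog with a hypothetical in-memory `ContextCatalog` class keyed by (context, msgid); the context strings and French entries are illustrative.

```python
import gettext


class ContextCatalog(gettext.NullTranslations):
    """Hypothetical stand-in for a compiled .mo catalog, keyed by (context, msgid)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def pgettext(self, context, message):
        # Missing entries fall back to the msgid, as real gettext does.
        return self._messages.get((context, message), message)


french = ContextCatalog({
    ("button caption", "Open"): "Ouvrir",        # verb: open the file
    ("file status", "Open"): "Ouvert",           # adjective: the file is open
    ("form label", "Subject:"): "Objet\u00a0:",  # colon kept, with French spacing
})

print(french.pgettext("button caption", "Open"))  # Ouvrir
print(french.pgettext("file status", "Open"))     # Ouvert
```

The same English msgid "Open" gets two different translations because the context tells the translator (and the catalog) which one is meant.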
As ground rules, in an internationalized UI you should
- not modify translated strings, except to fill in parameters with their actual values. So, don't add the colons after the fact.
- not cut strings into parts around parameters. Especially if there are multiple parameters to be filled in, you can be sure that there will be at least one language where it would be more natural to have the parameters the other way around.
- be really careful with singular/plural forms. There is no common pattern for how to create plurals from singulars, or even for how many plural forms there are.
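In Python's gettext, the last two rules map onto named placeholders and `ngettext`. A minimal sketch with no catalog loaded, so `NullTranslations` applies the English rule (singular only when n == 1); a real catalog supplies its own Plural-Forms rule and as many forms as the language needs. The message texts and function names are illustrative.

```python
import gettext

t = gettext.NullTranslations()  # English fallback: msgids are returned as-is


def files_deleted(count):
    # ngettext chooses the plural form; the number is filled in afterwards.
    template = t.ngettext("%(count)d file deleted", "%(count)d files deleted", count)
    return template % {"count": count}


def moved(src, dst):
    # One string with named placeholders: a translator may reorder
    # %(src)s and %(dst)s without any change to this code.
    return t.gettext("Moved %(src)s to %(dst)s") % {"src": src, "dst": dst}


print(files_deleted(1))          # 1 file deleted
print(files_deleted(3))          # 3 files deleted
print(moved("a.txt", "backup"))  # Moved a.txt to backup
```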
I've finally decided to use entire strings (strings with colons in this case) in my i18n files.
The reason for this is that in French there should be a space before a colon. So the best way to handle French is to put colons (with the space before them) in the translation strings.
No, we do not translate to French. But this is an example of a general rule: put colons in translation strings, not in UI code.
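A small illustration of the difference, using Python's gettext with a hypothetical in-memory French catalog (`DictTranslations` is not part of the library). With the colon inside the msgid, the translator can supply the non-breaking space that French typography calls for; a colon concatenated in UI code cannot get it.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory French catalog standing in for a .mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Missing entries fall back to the msgid, as real gettext does.
        return self._messages.get(message, message)


french = DictTranslations({
    "Name": "Nom",
    "Name:": "Nom\u00a0:",  # non-breaking space before the colon
})
_ = french.gettext

concatenated = _("Name") + ":"  # colon added in UI code
whole = _("Name:")              # colon inside the translatable string

print(concatenated)             # Nom:
print(whole == "Nom\u00a0:")    # True
```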
An application language file is not just a word-for-word translation. It is a process where you translate the words and their punctuation, their "presentation", in the correct, meaningful context.
"Hello?" in Spanish is "¿Hola?", and in Arabic it is "مرحبا؟". As you can see, you can't just store "Hello", "Hola" or "مرحبا" and then in the UI do the_hello_text + "?". It will not produce the correct output. Punctuation clearly needs to be handled in the language file, which means it is not the GUI's concern to "add" a question mark or a colon at the end of a string.
Punctuation and everything else must be inside the internationalization file, ready to be output to the UI.
The only thing the UI should be concerned about is the correct presentation of this ready-to-output text, such as right alignment for an RTL language. But that's another story and has nothing to do with plain-text internationalization language files per se.
The optimal approach is to embed the characters in the string for each locale, as that typically ensures that the context is correct, assuming you have done your research as to what your target audience expects. It is also simpler to manage.
For program-built strings, such as error messages, it may be better to put the symbols in a separate internationalised string, as different languages use different symbols for the same grammatical scenario. For example, Armenian uses a colon-like character (։) as its full stop, so one would have a 'sentence terminator' string. Another is a 'word separator' string, which would be blank for some languages.
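A minimal sketch of a translatable sentence terminator, using Python's gettext `pgettext` (3.8+) so that the bare "." msgid is disambiguated by a context. `ContextCatalog` and `finish_sentence` are hypothetical, the message text is assumed to be translated already, and U+0589 is the ARMENIAN FULL STOP character.

```python
import gettext


class ContextCatalog(gettext.NullTranslations):
    """Hypothetical in-memory catalog keyed by (context, msgid)."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def pgettext(self, context, message):
        # Missing entries fall back to the msgid, as real gettext does.
        return self._messages.get((context, message), message)


def finish_sentence(t, text):
    # The terminator is its own translatable string, looked up by context.
    return text + t.pgettext("sentence terminator", ".")


english = gettext.NullTranslations()
armenian = ContextCatalog({("sentence terminator", "."): "\u0589"})

print(finish_sentence(english, "Disk full"))  # Disk full.
```

One caveat if you try the blank 'word separator' idea with real .po files: gettext treats an empty msgstr as "untranslated", so the lookup falls back to the msgid (" ") rather than returning an empty string.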
Each country typically provides a style guide, which dictates all the 'correct' places for punctuation. So, when I started to work out how I would handle quotes for different languages for some web design tools, I first looked at such style guides.
However, while written publications tend to follow style guides, the online world is quite different! Typically, a large number of non-English European and South American sites use US-style quotes rather than the guillemets (« ») of their style guides. It just shows how much the US domination of the early web permeates online language usage around the world.
The MIT "Foreign Language News and Newspapers" page has links to hundreds of online sites. Looking at these helped me find the best approach for my dilemma, which was to let the site owner select whichever of the 19 most popular quote/embedded-quote combinations suits their target audience.
Chrome tries to automatically use a country's style guide, but fortunately it can be overridden by specifying a locale in the lang attribute of the q tag. This highlights the problem with automatic approaches that don't take into account the real world, but rely upon theory for their implementations.
To the OP: research those online newspaper sites to see which conventions are actually used in various countries, so that you can see which approach will give the more consistent results.
While some languages have traditionally used a different character for the English colon, online usage may target audiences used to that colon. Also, different locales may have different usage, requiring specifying language strings by each full locale, rather than just by language.
When I was doing my own internationalisation a few years ago, it worked something like this:
- Messages in the source code were written in English.
- Messages were translated at runtime by applying a translation (which appeared in the source as translate("string") or, more usually, /"string"). There was expected to be a pre-built dictionary of messages and translations.
- When translating a message, whitespace at each end was trimmed, and trailing punctuation was removed, as was any capitalisation. After translating what was left, these were put back.
- To provide more context, I sometimes added a comment to the string, which was part of the translation process to help find the best match, but the comment was then discarded.
So, with a message such as " Disk: ", the string "disk" was translated into, say, "disquette", and then recomposed as " Disquette: ". This reduced the number of very similar messages.
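The trim-translate-restore steps could be sketched in Python roughly as follows. The dictionary, the set of characters treated as trailing punctuation, and restoring only first-letter capitalisation are assumptions for illustration, not necessarily how the original scripting-language implementation worked.

```python
import re

# Hypothetical dictionary of lower-cased, unpunctuated messages.
DICTIONARY = {"disk": "disquette"}

# Leading whitespace, the core text, then trailing punctuation and whitespace.
_PARTS = re.compile(r"^(\s*)(.*?)([\s:;,.!?]*)$", re.DOTALL)


def translate(message):
    lead, core, tail = _PARTS.match(message).groups()
    # Look up the bare, lower-cased core; keep it unchanged if unknown.
    translated = DICTIONARY.get(core.lower(), core)
    # Restore capitalisation of the first letter only.
    if core[:1].isupper():
        translated = translated[:1].upper() + translated[1:]
    return lead + translated + tail


print(repr(translate(" Disk: ")))  # ' Disquette: '
```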
I only did this for a small number of western European languages; there would probably be problems with more exotic ones. However, I was using a scripting-type language for this, so some string processing could be used for whatever problems came up: when I needed to translate "G" (short for Green), it appeared in the source as left(/"Green"), was translated to something like "vert" and reduced to "V".
However, I'm not familiar with current frameworks and how they might work; don't they provide any guidelines for dealing with these types of issues?
It may depend on the localization system that you use, but all other things being equal, I would personally avoid adding any punctuation (unless within a phrase, of course, where its use is dictated by the grammar), because I feel punctuation marks are part of the presentation, like font size, and not really the content. So this approach mixes different things.
After all, the same words and phrases may be needed both with punctuation and without. E.g., you can have an "Enter subject:" caption next to a text box, but also "Enter subject" as a window title.
Does it make sense to have both of them translated separately?
If you later decide that colons look bad or redundant in the UI, you'll have to retranslate all language versions, which is a bit silly.
PS. The "ground rules" given by @bartc are valid either way - whether or not you include punctuation marks in translated strings.
PPS. @paulkayuk, too, raises a good point (in his comment) - that culture specifics should be taken into consideration as well. If you've got things like mirrored question marks in Spanish, include them in your translation of course. My answer assumes uniform, language-agnostic punctuation, because that seems to be the debatable bit.