Say your native language is Hebrew, and you're working in a programming language like Python 3, which lets you put Hebrew in source code. Good for you! You've got a dict:

d = {'a': 1}

and you want to replace that a with some Hebrew. So you replace that single character:

d = {'א': 1}

Uh oh. Just by replacing one character, without making any other changes, your display went crazy. Everything from the Hebrew to the 1 is backward, and it's extremely non-obvious that this is even valid syntax (it is), let alone what it means.

Hebrew is intrinsically right-to-left, and even without any invisible control characters, Hebrew text will show up right-to-left. This also applies to certain "regular" characters in positions near Hebrew, as well as characters from a few other scripts. The details are complicated.
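To see which characters are involved, you can ask Python for each character's Unicode bidirectional class. This is only a minimal sketch using the standard unicodedata module; the helper name bidi_classes is made up for illustration, and the Hebrew letter is written as an escape so that the snippet itself displays predictably.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# The dict literal from above, with the Hebrew alef written as \u05d0.
for ch, cls in bidi_classes("{'\u05d0': 1}"):
    print(repr(ch), cls)
```

The alef reports class R (strong right-to-left), while the quotes, colon, and space report neutral or weak classes (ON, CS, WS) and the digit reports EN (European number). As far as I understand the bidirectional algorithm, those neutral and weak characters take their direction from their neighbours, which is why the run from the Hebrew letter through the 1 can end up displayed in reverse.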

How do you deal with this? You can't stick control characters into your source code to fix the display without breaking the code. Writing everything in hex escapes trades one kind of unreadability for another. Even if you resign yourself to naming everything with characters from the Basic Latin block and sticking all Hebrew strings in localization files, it's hard to avoid mixing right-to-left text with left-to-right.

JSON or CSV with Hebrew in it will be garbled. If those localization files you shoved your strings into were supposed to be human-readable, well, they're probably not. What do you do?

1 Answer

AFAIK, this is mostly relevant when you use non-ASCII letters in identifiers (and perhaps comments) in your code.

If you discipline yourself to avoid that, e.g. if your code uses English-looking identifiers, keywords, and comments, this is much less of an issue (and every software developer should be able to read English documentation and code). Then internationalization & localization of your application happen only in messages, notably literal strings.

You could then use a message catalog. For example, in C on POSIX, you would use gettext(3) and friends. The message catalog contains all the localized / internationalized variants of each message. If your application is only for Hebrew users (and that is not a big market), have Hebrew only in literal strings.

To be more specific, the hello world application would contain

#include <libintl.h>
#include <stdio.h>

void say_hello(const char *towhom) {
  printf(gettext("hello %s"), towhom);
}

and your application would select its locale at startup by calling setlocale(3) with appropriate arguments (and typically bindtextdomain(3) and textdomain(3) so that gettext can find the catalog).

See locale(7). Adapt all this to Python and your operating system. Many cross-platform frameworks (e.g. Qt) have extensive support for internationalization & localization.
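In Python, a rough equivalent of the C snippet could use the standard gettext module. This is only a sketch: the domain name "myapp" and the "locale" directory are placeholders I made up, and with fallback=True the code simply returns the English string when no Hebrew catalog is installed.

```python
import gettext

# "myapp" and "locale" are placeholder names for this sketch.
# fallback=True gives a pass-through translator when no catalog is found,
# so the code still runs without any compiled .mo files.
translation = gettext.translation(
    "myapp", localedir="locale", languages=["he"], fallback=True
)
_ = translation.gettext


def say_hello(towhom):
    # The English string is the lookup key; a Hebrew catalog would
    # supply the translated format string.
    return _("hello %s") % towhom


print(say_hello("world"))
```

The source file stays pure ASCII; the Hebrew text lives only in the catalog, which is edited with translation tools rather than a code editor.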

Of course there is the delicate issue of displaying Unicode strings. Most serious display and GUI libraries and toolkits (Qt, GTK, ...) are able to deal with mixed-language strings (e.g. displaying something containing Hebrew and English and Russian and Chinese).

For a broader view, read the Wikipedia page on internationalization and localization of software.

A JSON file can be valid while containing only ASCII characters: any other character (which would appear only inside JSON strings) can be written as an escape such as \u05d0 (instead of א) in the string.
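For instance, Python's standard json module writes such escapes by default. A small sketch, with the sample dict taken from the question:

```python
import json

data = {"א": 1}

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)
# ensure_ascii=False keeps the Hebrew letter as-is in the output.
readable = json.dumps(data, ensure_ascii=False)

print(escaped)   # {"\u05d0": 1}
print(readable)

# Both forms decode to the same data.
assert json.loads(escaped) == json.loads(readable) == data
```

The escaped form is immune to bidirectional reordering in any editor, at the cost of not being readable as Hebrew.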

Perhaps you could find a good enough editor and customize it for your needs. I'm sure that you could find some Emacs mode (or else customize one) to cover the particular issue of having Hebrew literal strings in Python (while still having English-looking identifiers and comments).

BTW, I don't know what a Hebrew keyboard looks like, but most keyboard layouts can be configured so that typing ASCII letters (i.e. Latin ones) is faster than typing non-ASCII ones. So even for yourself, it could be better to type English-looking code.

Regarding JSON data, you should be able to configure your editor to show א when a string contains \u05d0 (otherwise use a JSON converter à la jq).

So I believe your real issue is to choose a good editor and configure it well enough, while having Hebrew only inside literal strings. In the rare case where a literal string needs to contain both Hebrew and English, split it into several pieces. I guess that both Emacs and Vim could be configured to fit your needs.
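Splitting a mixed string could look like the sketch below. The constant names are made up for illustration; the idea is just that each Hebrew literal sits alone on its own line, between quotes, so that nothing else on that line gets pulled into the right-to-left run.

```python
# Each piece is a separate literal; the names are hypothetical.
GREETING_HE = "שלום"
NAME_EN = "world"

# Combine the pieces in logical (reading) order.
message = GREETING_HE + " " + NAME_EN
print(message)
```

How the combined message is displayed is then left to the terminal or GUI toolkit.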