39

I'm not sure what to do with the following:

We take data from an external tool within our own tool. This data is written in Dutch. We are writing our Java code in English. Should we then translate this Dutch to English or keep it Dutch? For example, we have 2 departments: Bouw (Construction in English) & Onderhoud (Maintenance in English).

Would it then be logical to create:

public enum Department { BOUW, ONDERHOUD }

or:

public enum Department { CONSTRUCTION, MAINTENANCE }

or even:

public enum Afdeling { BOUW, ONDERHOUD }

(afdeling is Department in Dutch)

Martijn
Jelle

7 Answers

61

English is a lingua franca/lowest common denominator for a reason. Even if the reason is conceptually as weak as "Everybody does it", that's still a rather important reason.

Going against common practice means that you have to understand Dutch to make sense of the data structures in your software. There's nothing wrong with Dutch, but the probability that any given engineer who'll have to interact with the code base speaks it is still lower than that for English.

Therefore, unless you're a Dutch-only shop and never plan to expand internationally, it's almost always a good idea to keep your codebase monolingual and to write it in the language most commonly used for code, which is English.

Note: This advice applies to program code only. User data should definitely not be translated, but processed "as is". Even if you have a customer "Goldstein", clearly you should not store their name as "golden stone".

The trouble is that there is a continuum of terms between "user-supplied, don't touch" and "code fragment, use English at all times". Customer names are very near the former end of the spectrum, Java variables near the latter end. Constants for enum values are slightly farther away, particularly if they denote well-known, unique external entities (like your departments). If everyone in your organisation uses the Dutch terms for the departments, you don't plan on confronting anyone with the code base who doesn't, and the set of existing departments changes rarely, then using the accepted names of the department may make more sense for enum constants than for local variables. I still wouldn't do it, though.

Robert Harvey
Kilian Foth
34

In this scenario, I would leave the enum values in Dutch:

public enum Department { BOUW, ONDERHOUD }

The reason is that the logic using these constants will be matching against data that is also in Dutch. For example, if the input is "bouw", the comparison code might look like:

if (Department.BOUW == Department.valueOf(input.toUpperCase()))

I find it easier to debug when the values match (even if I don't know what the values mean). Translation just adds a mental hoop I, as a developer, should not have to jump through to prove correctness.

Nevertheless, you can just comment the code if it helps others understand the context of the data:

public enum Department { 
    BOUW, /* build */
    ONDERHOUD /* maintenance */
}
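
As a minimal sketch of the matching described above: when the constants carry the same names as the incoming data, the conversion can lean on the built-in valueOf and needs no translation table. The helper class DepartmentInput and its parse method are hypothetical names used for illustration only, and the sketch assumes the external tool sends the bare department name, possibly with different casing or surrounding whitespace.

```java
import java.util.Locale;
import java.util.Optional;

enum Department {
    BOUW,      // construction
    ONDERHOUD  // maintenance
}

// Hypothetical helper, not part of the original answer.
final class DepartmentInput {
    private DepartmentInput() {}

    /** Maps raw external text such as "bouw" onto the matching constant, if there is one. */
    static Optional<Department> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            // Locale.ROOT keeps the upper-casing independent of the JVM's default locale.
            return Optional.of(Department.valueOf(raw.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException unknownName) {
            // valueOf throws when no constant carries that name.
            return Optional.empty();
        }
    }
}
```

Returning an Optional rather than throwing is only one possible choice; it forces the caller to decide what an unknown department means.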
bishop
14

Avoid translation where possible, because every translation is additional effort and may introduce bugs.

The key contribution of "Domain Driven Design" to modern software engineering is the concept of a Ubiquitous Language, which is a single language used by all stakeholders of a project. According to DDD, translation should not occur within a team (which includes domain experts, even if present only by proxy of a specification document), but only between teams (further reading: "Domain Driven Design" by Eric Evans, in particular the chapters about Ubiquitous Language and strategic design).

That is, if your business experts (or your specification document) speak Dutch, use their (Dutch) terminology when expressing business concerns in source code. Do not needlessly translate into English, because doing so creates an artificial impediment for communication between business experts and programmers, which takes time and can (through ambiguous or bad translation) cause bugs.

If, in contrast, your business experts can talk about their business in both English and Dutch, you are in the fortunate situation of being able to pick the project's ubiquitous language, and there are valid reasons for preferring English (such as "internationally understandable and more likely to be used by standards"), but doing so does not mean that coders should translate what the business people are talking about. Instead, the business people should switch languages.

Having a ubiquitous language is particularly important if requirements are complex and must be implemented precisely; if you're just doing CRUD, the language you use internally matters less.

Personal anecdote: I was in a project where we exposed some business services as a SOAP endpoint. The business was entirely specified in German, and unlikely to be reused as is in English, because it was about legal matters specific to a particular jurisdiction. Nevertheless, some ivory tower architects mandated that the SOAP interface be English to promote future reuse. This translation occurred ad hoc, with little coordination among developers, let alone a shared glossary, resulting in the same business term having several names in the web service contract, and some business terms having the same name in the web service contract. Oh, and of course some names were used on either side of the divide, but with different meanings!

If you choose to translate anyway, please standardize the translation in a glossary, add compliance with that glossary to your definition of done, and check it in your reviews. Don't be as careless as we have been.

meriton
9

The correct solution is to not hard-code the departments at all:

ArrayList<String> departments = (... load them from a configuration file ...)

Or, if you absolutely need a department type:

class Department { String name; Department(String name) { this.name = name; } ... }
HashMap<String, Department> departments = (... generate from configuration file ...)

If you find the need to test against specific departments in your code, you have to ask more generically what is special about that department, and accept configuring that department as having that property. For example, if one department has weekly payroll, and that's what the code cares about, there should be a WEEKLY_PAYROLL property that can be attached to any department by the configuration.
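
A rough sketch of what that could look like, under assumptions that are not in the answer itself: the configuration is a Java properties text with one hypothetical key per department of the form "<name>.weeklyPayroll", and the class names Department and DepartmentConfig are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// A department is plain data: its name comes from configuration, not from code.
final class Department {
    private final String name;
    private final boolean weeklyPayroll;

    Department(String name, boolean weeklyPayroll) {
        this.name = name;
        this.weeklyPayroll = weeklyPayroll;
    }

    String name() { return name; }

    boolean hasWeeklyPayroll() { return weeklyPayroll; }
}

// Hypothetical loader; the key format "<name>.weeklyPayroll" is an assumption.
final class DepartmentConfig {
    private static final String WEEKLY_PAYROLL_SUFFIX = ".weeklyPayroll";

    private DepartmentConfig() {}

    static Map<String, Department> load(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // TreeMap keeps the departments in a predictable (alphabetical) order.
        Map<String, Department> departments = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.endsWith(WEEKLY_PAYROLL_SUFFIX)) {
                String name = key.substring(0, key.length() - WEEKLY_PAYROLL_SUFFIX.length());
                boolean weekly = Boolean.parseBoolean(props.getProperty(key));
                departments.put(name, new Department(name, weekly));
            }
        }
        return departments;
    }
}
```

Code that cares about payroll would then ask departments.get(name).hasWeeklyPayroll() and never mention "Bouw" or "Onderhoud" by name, so the Dutch terms stay in the configuration data.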

3

For anyone wondering: we chose the first option, mainly because we think that you should not make up terms for the sake of translating. However, in case an international developer ever works on the project, we've added some documentation to explain this:

/** The possible departments of a project, given in the Dutch language. */
public enum Department { BOUW, ONDERHOUD }
Jelle
2

If you are worried about having a string representation to show the user, just define a descriptions array inside your enum and expose a method.
E.g., Department.BUILD.getDescription() will return "BOUW":

public enum Department { 
    BUILD,
    MAINTENANCE;

    private String[] descriptions = new String[] {
        "BOUW",
        "ONDERHOUD"
    };

    public String getDescription() {
        return descriptions[ordinal()];
    }
}

I know you chose otherwise, but I'm leaving this here in case the Google vortex throws people here by accident.

EDIT: As noted by Pokechu22 you can use enum constructors and private properties like this:

public enum Department {
    BUILD("BOUW"),
    MAINTENANCE("ONDERHOUD");

    private final String description;

    private Department(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}

which will also achieve that effect.

SparK
0

Certain invariants of your code are expected to hold. One of those invariants is that a program will not behave differently when an identifier is renamed. In this case in particular, when you have an enum, and you rename any member of that enum, and update all the uses of that member, you would not expect your code to start functioning differently.

Parsing is the process of reading data and deriving data structures from it. When you take the external data, read it, and create instances of your enum, you are parsing the data. That parsing process is the only part of your program responsible for maintaining the relation between the representation of the data as you receive it and the shape and naming of the members of your datatypes.

As such, it shouldn't matter what names you assign to the members of the enum. That they happen to coincide with strings used in the data you read is coincidental.

When you design your code to model the domain, names of members shouldn't be related to the serialization format of the data. They should neither be the Dutch terms, nor should they be translations of the Dutch terms, but they should be what you decide fits the domain model best.

The parser then translates between the data format and your domain model. That's the last of the influence the data format should have on your code.
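
One way to picture such a parser, as a sketch only: an explicit lookup table holds the external spellings, so the enum members can be renamed freely without changing behaviour. The class name DepartmentParser, the constant names, and the assumption that the external tool sends "bouw" and "onderhoud" (in any casing) are all illustrative. Map.of requires Java 9 or later.

```java
import java.util.Locale;
import java.util.Map;

// Domain model: names chosen for the code base, independent of the data format.
enum Department { CONSTRUCTION, MAINTENANCE }

// Hypothetical parser; the only place that knows the external tool's spellings.
final class DepartmentParser {
    private static final Map<String, Department> BY_EXTERNAL_NAME = Map.of(
            "bouw", Department.CONSTRUCTION,
            "onderhoud", Department.MAINTENANCE);

    private DepartmentParser() {}

    static Department parse(String externalName) {
        String key = externalName.trim().toLowerCase(Locale.ROOT);
        Department department = BY_EXTERNAL_NAME.get(key);
        if (department == null) {
            throw new IllegalArgumentException("Unknown department: " + externalName);
        }
        return department;
    }
}
```

Renaming CONSTRUCTION to anything else touches only identifiers; the strings in the table, and therefore the program's behaviour, stay the same.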

Martijn