243

As an experienced software developer, I have learned to avoid magic strings.

My problem is that it is such a long time since I have used them, I've forgotten most of the reasons why. As a result, I'm having trouble explaining why they're a problem to my less experienced colleagues.

What objective reasons are there for avoiding them? What problems do they cause?

Kramii
  • 14,199
  • 5
  • 46
  • 64

7 Answers7

273
  1. In a language that compiles, a magic string's value is not checked at compile time. If the string must match a particular pattern, you have to run the program to guarantee it fits that pattern. If you used something such as an enum, the value is at least valid at compile-time, even if it might be the wrong value.

  2. If a magic string is being written in multiple places you have to change all of them without any safety (such as compile-time error). This can be countered by only declaring it in one place and reusing the variable, though.

  3. Typos can become serious bugs. If you have a function:

    func(string foo) {
        if (foo == "bar") {
            // do something
        }
    }
    

    and someone accidentally types:

    func("barr");
    

    This is worse the rarer or more complex the string is, especially if you have programmers that are unfamiliar with the project's native language.

  4. Magic strings are rarely self-documenting. If you see one string, that tells you nothing of what else the string could / should be. You will probably have to look into the implementation to be sure you've picked the right string.

    That sort of implementation is leaky, needing either external documentation or access to the code to understand what should be written, especially since it has to be character-perfect (as in point 3).

  5. Short of "find string" functions in IDEs, there are a small number of tools that support the pattern.

  6. You may coincidentally use the same magic string in two places, when really they are different things, so if you did a Find & Replace, and changed both, one of them could break while the other worked.

103

The summit of what the other answers have grasped at, is not that "magic values" are bad, but that they ought to be:

  1. defined recognisably as constants;
  2. defined only once within their entire domain of use (if architecturally possible);
  3. defined together if they form a set of constants that are somehow related;
  4. defined at an appropriate level of generality in the application in which they are used; and
  5. defined in such a way as to limit their use in inappropriate contexts (e.g. amenable to type checking).

What typically distinguishes acceptable "constants" from "magic values" is some violation of one or more of these rules.

Used well, constants simply allow us to express certain axioms of our code.

Which brings me to a final point, that an excessive use of constants (and therefore an excessive number of assumptions or constraints expressed in terms of values), even if it otherwise complies with the criteria above (but especially if it deviates from them), may imply that the solution being devised is not sufficiently general or well-structured (and therefore we're not really talking about the pros and cons of constants anymore, but about the pros and cons of well-structured code).

High-level languages have constructs for patterns in lower-level languages which would have to employ constants. The same patterns can also be used in the higher-level language, but ought not to be.

But that may be an expert judgment based on an impression of all the circumstances and what a solution ought to look like, and exactly how that judgment will be justified will depend heavily on the context. Indeed it may not be justifiable in terms of any general principle, except to assert "I am old enough to have already seen this kind of work, with which I am familiar, done better"!

EDIT: having accepted one edit, rejected another, and having now performed my own edit, may I now consider the formatting and punctuation style of my list of rules to be settled once and for all haha!

Steve
  • 12,325
  • 2
  • 19
  • 35
38
  • They are hard to track.
  • Changing all may require changing multiple files in possibly multiple projects (hard to maintain).
  • Sometimes it's hard to tell what their purpose is just by looking at their value.
  • No reuse.
igorc
  • 451
29

Real life example: I am working with a third party system wherein "entities" are stored with "fields". Basically an EAV system. As it is fairly easy to add another field, you get access to one by using the field's name as string:

Field nameField = myEntity.GetField("ProductName");

(note the magic string "ProductName")

This can lead to several problems:

  • I need to refer to external documentation to know that "ProductName" even exists and its exact spelling
  • Plus I need to refer to that doc to see what the datatype of that field is.
  • Typos in this magic string will not get caught until this line of code is executed.
  • When someone decides to rename this field on the server (difficult while preventing dataloss, but not impossible), then I cannot easily search through my code to see where I should adjust this name.

So my solution for this was to generate constants for these names, organised by entity type. So now I can use:

Field nameField = myEntity.GetField(Model.Product.ProductName);

It is still a string constant and compiles to the exact same binary, but has several advantages:

  • After I have typed "Model.", my IDE shows just the available entity types, so I can select "Product" easily.
  • Then my IDE supplies just the fieldnames that are available for this type of entity, also selectable.
  • Auto-generated documentation shows what the meaning of this field is plus the datatype that is used to store its values.
  • Starting from the constant, my IDE can find all places where that exact constant is used (as opposed to its value)
  • Typos will be caught by the compiler. This also applies when a fresh model (possibly after renaming or deleting a field) is used to regenerate the constants.

Next on my list: hide these constants behind generated strongly typed classes - then also the datatype is secured.

14

Magic strings not always bad, so this might the reason you cannot come up with a blanket reason for avoiding them. (By "magic string" I assume you mean string literal as part of an expression, and not defined as a constant.)

In some particular cases, magic strings should be avoided:

  • The same string appears multiple times in code. This means you could have a spelling error one of the places. And it will be a hassle of the string changes. Turn the string into a constant, and you will avoid this issue.
  • The string may change independent of the code where it appears. Eg. if the string is text displayed to the end user it will likely change independent of any logic change. Separating such string into a separate module (or external configuration or database) will make it easier to change independently
  • The meaning of the string is not obvious from the context. In that case introducing a constant will make the code easier to understand.

But in some cases, "magic strings" are fine. Say you have a simple parser:

switch (token.Text) {
  case "+":
    return a + b;
  case "-":
    return a - b;
  //etc.
}

There is really no magic here, and none of the above described problems apply. There would be no benefit IMHO to define string Plus="+" etc. Keep it simple.

JacquesB
  • 61,955
  • 21
  • 135
  • 189
7

To add to existing answers:

Internationalisation (i18n)

If the text to display on screen is hard-coded and buried within layers of functions, you're going to have a very difficult time providing translations of that text into other languages.

Some development environments (e.g. Qt) handle translations by lookup from a base language text string to the translated language. Magic strings can generally survive this - until you decide you want to use the same text elsewhere and get a typo. Even then, it's very hard to find which magic strings need translating when you want to add support for another language.

Some development environments (e.g. MS Visual Studio) take another approach and require all translated strings to be held within a resources database and read back for the current locale by the unique ID of that string. In this case your application with magic strings simply cannot be translated into another language without major rework. Efficient development requires all text strings to be entered into the resources database and given a unique ID when the code is first written, and i18n thereafter is relatively easy. Trying to backfill this after the fact will typically require a very large effort (and yes, I've been there!) so it's much better to do things right in the first place.

Graham
  • 2,062
4

This is not a priority for everyone, but if you ever want to be able to calculate coupling/cohesion metrics on your code in an automated fashion, magic strings make this nearly impossible. A string in one place will refer to a class, method or function in another place, and there is no easy, automatic way to determine that the string is coupled to the class/method/function just by parsing the code. Only the underlying framework (Angular, e.g.) can determine that there is a linkage--and it can only do it at run-time. To obtain the coupling information yourself, your parser would have to know everything about the framework you were using, above and beyond the base language you are coding in.

But again, this is not something a lot of developers care about.