130

I recently encountered a class which provides pretty much every single character as a constant, everything from COMMA to BRACKET_OPEN. Wondering whether this was necessary, I read an "article" which suggests that it may be helpful to pull single-character literals into constants. So, I'm skeptical.

The main appeal of using constants is they minimize maintenance when a change is needed. But when are we going to start using a different symbol than ',' to represent a comma?

The only reason I see for using constants instead of literals is to make the code more readable. But is city + CharacterClass.COMMA + state (for example) really more readable than city + ',' + state?

For me the cons outweigh the pros, mainly that you introduce another class and another import. And I believe in less code where possible. So, I'm wondering what the general consensus is here.

Austin Day

15 Answers

182

Tautology:

It is very clear, if you read the very first sentence of the question, that this question is not about appropriate uses like eliminating magic numbers; it is about terrible, mindless, foolish consistency at best. That is what this answer addresses.

Common sense tells you that const char UPPER_CASE_A = 'A'; or const char A = 'A' does not add anything but maintenance and complexity to your system. const char STATUS_CODE.ARRIVED = 'A' is a different case.
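For contrast, the STATUS_CODE.ARRIVED case might look like this in Java. This is a minimal sketch: the enum name, the DEPARTED and IN_TRANSIT statuses and their codes are hypothetical, invented only to show a single-character constant whose name carries domain meaning.

```java
// Hypothetical shipment statuses: 'A' is the one-character code for ARRIVED,
// so the name says something the bare literal does not.
enum StatusCode {
    ARRIVED('A'),
    DEPARTED('D'),
    IN_TRANSIT('T');

    private final char code;

    StatusCode(char code) {
        this.code = code;
    }

    char code() {
        return code;
    }

    // Look up a status from its one-character code; unknown codes are rejected.
    static StatusCode fromCode(char code) {
        for (StatusCode status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown status code: " + code);
    }
}
```

Here ARRIVED could be assigned a different code without its name becoming a lie, which a constant like UPPER_CASE_A can never offer.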

Constants are supposed to represent things that are immutable at runtime, but may need to be modified in the future at compile time. When would const char A = correctly equal anything other than A?

If you see public static final char COLON = ':' in Java code, find whoever wrote that and break their keyboards. If the representation for COLON ever changes from : you will have a maintenance nightmare.

Obfuscation:

What happens when someone changes it to COLON = '-' because where they are using it needs a - instead everywhere? Are you going to write unit tests that basically say assertThat(':' == COLON) for every single const reference to make sure they do not get changed? Only to have someone fix the test when they change them?

If someone actually argues that public static final String EMPTY_STRING = ""; is useful and beneficial, you have just gauged their knowledge and can safely ignore them on everything else.

Having every printable character available in a named version just demonstrates that whoever did it is not qualified to be writing code unsupervised.

Cohesion:

It also artificially lowers cohesion, because it moves things away from the things that use them and are related to them.

In computer programming, cohesion refers to the degree to which the elements of a module belong together. Thus, cohesion measures the strength of relationship between pieces of functionality within a given module. For example, in highly cohesive systems functionality is strongly related.

Coupling:

It also couples lots of unrelated classes together because they all end up referencing files that are not really related to what they do.

Tight coupling is when a group of classes are highly dependent on one another. This scenario arises when a class assumes too many responsibilities, or when one concern is spread over many classes rather than having its own class.

If you used a better name like DELIMITER = ',' you would still have the same problem, because the name is generic and carries no semantics. Reassigning the value does no more to help an impact analysis than searching and replacing the literal ','. What if some code uses it and needs the , while some other code uses it but now needs ;? You still have to look at every use manually and change them.

In the Wild:

I recently refactored a 1,000,000+ LOC application that was 18 years old. It had things like public static final String COMMA = SPACE + "," + SPACE;. That is in no way better than just inlining " , " where it is needed.

If you want to argue readability, you need to learn how to configure your IDE to display whitespace characters so you can see them, or whatever; readability is just an extremely lazy reason to introduce entropy into a system.

It also had , defined multiple times, with multiple misspellings of the word COMMA, in multiple packages and classes, with references to all the variations intermixed in the code. It was nothing short of a nightmare to try to fix something without breaking something completely unrelated.

The same went for the alphabet: there were multiple UPPER_CASE_A, A, UPPER_A and A_UPPER constants that most of the time were equal to A, but in some cases were not. That was true for almost every character, but not all characters.

And from the edit histories, it did not appear that a single one of these was ever edited or changed over the 18 years, for what should now be an obvious reason: any change would break way too many things that were untraceable. So you end up with new variable names pointing to the same thing, which can never be changed for the same reason.

In no sane reality can you argue that this practice does anything but start out at maximum entropy.

I refactored all this mess out and inlined all the tautologies, and the new college hires were much more productive because they did not have to hunt through multiple levels of indirection to find what these const references actually pointed to; the names were not a reliable guide to what the constants contained.

150

The main appeal of using constants is they minimize maintenance when a change is needed.

ABSOLUTELY NOT. This is not at all the reason to use constants because constants do not change by definition. If a constant ever changes then it was not a constant, was it?

The appeal of using constants has nothing whatsoever to do with change management and everything to do with making programs amenable to being written, understood and maintained by people. If I want to know everywhere in my program where a colon is used as a URL separator, then I can know that very easily if I have the discipline to define a constant URLSeparator, and cannot know that easily at all if I have to grep for : and get every single place in the code where : is used to indicate a base class, or a ?: operator, or whatever.

I thoroughly disagree with the other answers which state that this is a pointless waste of time. Named constants add meaning to a program, and those semantics can be used by both humans and machines to understand a program more deeply and maintain it more effectively.

The trick here is not to eschew constants, but rather to name them with their semantic properties rather than their syntactical properties. What is the constant being used for? Don't call it Comma unless the business domain of your program is typography, English language parsing, or the like. Call it ListSeparator or some such thing, to make the semantics of the thing clear.
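A minimal Java sketch of the idea, assuming two hypothetical roles for the same glyph: a URL scheme separator and an hh:mm time separator. Both constants hold ':', yet a search for URL_SCHEME_SEPARATOR finds only the URL code, and either value could change without touching the other.

```java
import java.util.Locale;

// Hypothetical constants named for their role, not their glyph.
final class Separators {
    private Separators() {}

    // Separates the scheme from the rest of a URL, as in "https://...".
    static final char URL_SCHEME_SEPARATOR = ':';

    // Separates hours from minutes in a displayed time such as "09:05".
    static final char TIME_FIELD_SEPARATOR = ':';

    static String scheme(String url) {
        int index = url.indexOf(URL_SCHEME_SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("No scheme in: " + url);
        }
        return url.substring(0, index);
    }

    static String formatTime(int hours, int minutes) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%02d%c%02d", hours, TIME_FIELD_SEPARATOR, minutes);
    }
}
```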

Eric Lippert
62

No, that is dumb.

What is not necessarily dumb is pulling things like that into named labels for localization reasons. For example, the thousands delimiter is a comma in America (1,000,000), but not a comma in other locales. Pulling that into a named label (with an appropriate, non-comma name) allows the programmer to ignore/abstract those details.
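In Java, for example, the locale can supply these separators directly instead of a hand-written constant. This sketch uses the standard java.text.DecimalFormatSymbols and NumberFormat classes; the wrapper class and method names are only illustrative.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

// Ask the locale for its separators instead of hard-coding ',' or '.'.
final class LocaleSeparators {
    private LocaleSeparators() {}

    static char groupingSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    static char decimalSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
    }

    // Format a whole number with the locale's own digit grouping.
    static String formatWholeNumber(long value, Locale locale) {
        return NumberFormat.getIntegerInstance(locale).format(value);
    }
}
```

With this, formatWholeNumber(1000000L, Locale.US) gives "1,000,000" while Locale.GERMANY gives "1.000.000", and no constant named COMMA is involved.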

But making a constant because "magic strings are bad" is just cargo culting.

Telastyn
29

There are a few characters that can be ambiguous or are used for several different purposes. For example, we use '-' as a hyphen, a minus sign, or even a dash. You could make separate names as:

static const wchar_t HYPHEN = L'-';
static const wchar_t MINUS = L'-';
static const wchar_t EM_DASH = L'-';

Later, you could choose to modify your code to disambiguate by redefining them as:

static const wchar_t HYPHEN = L'-';
static const wchar_t MINUS = L'\u2212';
static const wchar_t EM_DASH = L'\u2014';

That might be a reason why you'd consider defining constants for certain single characters. However, the number of characters that are ambiguous in this manner is small. At most, it seems you'd do it only for those. I'd also argue that you could wait until you actually have a need to distinguish the ambiguous characters before you factor the code in this manner.

As typographical conventions can vary by language and region, you're probably better off loading such ambiguous punctuation from a translation table.
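A Java sketch of the same idea, assuming the constants have already been disambiguated (U+2212 is the MINUS SIGN code point, U+2014 the EM DASH). The class name and the displaySigned helper are hypothetical.

```java
// Hypothetical constants that share a glyph today but mean different things.
final class Dashes {
    private Dashes() {}

    static final char HYPHEN = '-';        // U+002D HYPHEN-MINUS
    static final char MINUS = '\u2212';    // U+2212 MINUS SIGN
    static final char EM_DASH = '\u2014';  // U+2014 EM DASH

    // Render an integer for display, using the typographic minus sign.
    static String displaySigned(int value) {
        // Widen to long first so Integer.MIN_VALUE has a positive magnitude.
        String digits = Long.toString(Math.abs((long) value));
        return value < 0 ? MINUS + digits : digits;
    }
}
```

Because only the display code refers to MINUS, switching it from a hyphen to U+2212 leaves hyphenated words and parsing code untouched.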

22

A constant must add meaning.

Defining COMMA to be a comma doesn't add meaning, because we know that a comma is a comma. Instead we destroy meaning, because now COMMA might actually not be a comma anymore.

If you use a comma for a purpose and want to use a named constant, name it after its purpose. Example:

  • city + CharacterClass.COMMA + state = bad
  • city + CITY_STATE_DELIMITER + state = good

Use functions for formatting

I personally prefer FormatCityState(city, state) and don't care about how the body of that function looks as long as it's short and passes the test cases.
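One way such a function might look in Java (camelCase formatCityState rather than FormatCityState, per Java convention). The ", " separator and the class name are assumptions; the point is that the delimiter is private to the one method that needs it.

```java
// Hypothetical formatter: callers never see the delimiter, only the function.
final class Addresses {
    private Addresses() {}

    // Named after its purpose; assumed to be ", " for this sketch.
    private static final String CITY_STATE_DELIMITER = ", ";

    static String formatCityState(String city, String state) {
        return city + CITY_STATE_DELIMITER + state;
    }
}
```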

Peter
17

The idea that a constant COMMA is better than ',' or "," is rather easy to debunk. Sure, there are cases where it makes sense; for example, final String QUOTE = "\""; helps readability considerably by avoiding all the backslashes. But barring language control characters like \ ' and " I haven't found such constants to be very useful.
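For instance, a QUOTE constant keeps CSV quoting readable. This is a sketch assuming RFC 4180-style escaping, where an embedded double quote is written twice; the class name is hypothetical.

```java
// Quote one CSV field: wrap it in double quotes and double any embedded quote.
final class CsvField {
    private CsvField() {}

    static final String QUOTE = "\"";

    static String quote(String field) {
        return QUOTE + field.replace(QUOTE, QUOTE + QUOTE) + QUOTE;
    }
}
```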

Using final String COMMA = "," is not only bad form, it's dangerous! When someone wants to change the separator from "," to ";" they might go change the constants file to COMMA = ";" because it's faster for them to do so and it just works. Except, you know, all the other things that used COMMA now also are semicolons, including things sent to external consumers. So it passes all your tests (because all the marshalling and unmarshalling code was also using COMMA) but external tests will fail.

What is useful is to give them useful names. And yes, sometimes multiple constants will have the same contents but different names. For example final String LIST_SEPARATOR = ",".

So your question is "are single-char constants better than literals?" and the answer is unequivocally no, they aren't. But even better than both of those is a narrowly scoped variable name that explicitly says what its purpose is. Sure, you'll spend a few extra bytes on those extra references (assuming they don't get compiled out on you, which they probably will), but in long-term maintenance, which is where most of the cost of an application is, they are worth the time to make.

corsiKa
4

In addition to all the fine answers here, I'd like to add as food for thought, that good programming is about providing appropriate abstractions that can be built upon by yourself and maybe others, without having to repeat the same code over and over.

Good abstractions make the code easy to use on the one hand, and easy to maintain on the other hand.

I totally agree that DELIMITER=':' in and of itself is a poor abstraction, and only just better than COLON=':' (since the latter is totally impoverished).

A good abstraction involving strings and separators would include, first and foremost, a way to pack one or more individual content items into the string and to unpack them from the packed string, before telling you what the delimiter is. Such an abstraction would be bundled as a concept, in most languages as a class, so that its use would be practically self-documenting: you can search for all places where this class is used and be confident of the programmer's intention regarding the format of the packed strings in each case.

Once such an abstraction is provided, it would be easy to use without ever having to consult what the value of the DELIMITER or COLON is, and, changing the implementation details would generally be limited to the implementation. So, in short, these constants should really be implementation details hidden within an appropriate abstraction.

The main appeal of using constants is they minimize maintenance when a change is needed.

Good abstractions, which are typically compositions of several related capabilities, are better at minimizing maintenance. First, they clearly separate the provider from the consumers. Second, they hide the implementation details and instead provide directly useful functionality. Third, they document at a high level when and where they are being used.
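A minimal sketch of such an abstraction in Java, assuming a hypothetical PackedFields class and a ':' delimiter. Callers only see pack and unpack; the delimiter is a private detail, and items containing it are simply rejected rather than escaped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical packing abstraction; the delimiter never leaks to callers.
final class PackedFields {
    private PackedFields() {}

    private static final String DELIMITER = ":";

    static String pack(List<String> items) {
        for (String item : items) {
            if (item.contains(DELIMITER)) {
                // This sketch does not support escaping, so reject ambiguous input.
                throw new IllegalArgumentException("Item contains delimiter: " + item);
            }
        }
        return String.join(DELIMITER, items);
    }

    static List<String> unpack(String packed) {
        // Limit -1 keeps trailing empty items; for a non-empty list,
        // unpack(pack(items)) returns the original items.
        return Arrays.asList(packed.split(Pattern.quote(DELIMITER), -1));
    }
}
```

Changing the delimiter, or adding escaping later, would then touch only this class.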

Erik Eidt
3

I've done some work writing lexers and parsers and used integer constants to represent terminals. Single-character terminals happened to have the ASCII code as their numeric value for simplicity's sake, but the code could have been something else entirely. So, I'd have a T_COMMA that was assigned the ASCII-code for ',' as its constant value. However, there were also constants for nonterminals which were assigned integers above the ASCII set. From looking at parser generators such as yacc or bison, or parsers written using these tools, I got the impression that's basically how everybody did it.

So, while, like everybody else, I think it's pointless to define constants for the express purpose of using the constants instead of the literals throughout your code, I do think there are edge cases (parsers, say) where you might encounter code riddled with constants such as you describe. Note that in the parser case, the constants aren't just there to represent character literals; they represent entities that might just happen to be character literals.
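A small Java sketch of that token scheme, assuming a hypothetical toy grammar of identifiers, commas and parentheses. The single-character terminals reuse their character codes, while T_IDENT and the nonterminal NT_ARGUMENT_LIST are numbered from 256 so they cannot clash with them. The numbering is illustrative, not the exact scheme of any particular parser generator.

```java
import java.util.ArrayList;
import java.util.List;

final class Tokens {
    private Tokens() {}

    // Single-character terminals: the code happens to be the character itself.
    static final int T_COMMA = ',';
    static final int T_LPAREN = '(';
    static final int T_RPAREN = ')';

    // Multi-character terminals and nonterminals live above the single-byte range.
    static final int T_IDENT = 256;
    static final int NT_ARGUMENT_LIST = 257; // would be used by the parser, not the lexer

    // Tokenize identifiers, commas and parentheses; blanks are skipped.
    static List<Integer> tokenize(String input) {
        List<Integer> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {
                // Consume the whole identifier as one T_IDENT token.
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add(T_IDENT);
                continue;
            }
            switch (c) {
                case ',': tokens.add(T_COMMA); break;
                case '(': tokens.add(T_LPAREN); break;
                case ')': tokens.add(T_RPAREN); break;
                case ' ': break;
                default: throw new IllegalArgumentException("Unexpected character: " + c);
            }
            i++;
        }
        return tokens;
    }
}
```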

I can think of a few more isolated cases where it might make sense to use constants instead of the corresponding literals. For example, you might define NEWLINE to be "\n" on a Unix box, but "\r\n" on a Windows box (or "\r" on a classic Mac). The same goes for parsing files which represent tabular data; you might define FIELDSEPARATOR and RECORDSEPARATOR constants. In these cases, you're actually defining a constant to represent a character that serves a certain function. Still, if you were a novice programmer, maybe you'd name your field separator constant COMMA, not realizing you should have called it FIELDSEPARATOR, and by the time you realized, the code would be in production and you'd be on the next project, so the wrongly named constant would stay in the code for someone to later find and shake his head at.
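In Java, the platform line ending is available from System.lineSeparator(), so a tab-separated writer might name each role like this. The class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical writer for tab-separated data; each constant names a role.
final class TableWriter {
    private TableWriter() {}

    static final String FIELD_SEPARATOR = "\t";
    // Host platform line ending: "\n" on Unix-like systems, "\r\n" on Windows.
    static final String RECORD_SEPARATOR = System.lineSeparator();

    static String write(List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(String.join(FIELD_SEPARATOR, row)).append(RECORD_SEPARATOR);
        }
        return out.toString();
    }
}
```

Whether RECORD_SEPARATOR should follow the host platform or be fixed by the file format is a design decision; for files exchanged between systems, a fixed value is often the safer choice.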

Finally, the practice you describe might make sense in a few cases where you write code to handle data encoded in a specific character encoding, say ISO-8859-1, but expect the encoding to change later on. Of course, in such a case it would make much more sense to use localization or encoding and decoding libraries to handle it, but if for some reason you couldn't use such a library, constants that you'd only have to redefine in a single file, instead of hard-coded literals littered all over your source code, might be a way to go.

As to the article you linked to: I don't think it tries to make a case for replacing character literals with constants. I think it's trying to illustrate a method to use interfaces to pull constants into other parts of your code base. The example constants used to illustrate this are chosen very badly, but I don't think they matter in any way.

Pascal
2

The one time I have seen such constants used effectively is to match an existing API or document. I've seen symbols such as COMMA used because a particular piece of software was directly connected to a parser which used COMMA as a tag in an abstract syntax tree. I've also seen it used to match a formal specification. In formal specifications, you'll sometimes see symbols like COMMA rather than ',' because they want to be as utterly clear as possible.

In both cases, the use of a named symbol like COMMA helps provide cohesiveness to an otherwise disjoint product. That value can often outweigh the cost of overly verbose notations.

Cort Ammon
2

Observe that you are trying to make a list.

So, refactor it as: String makeList(String[] items)

In other words, factor out the logic instead of the data.
Languages might be different in how they represent lists, but commas are always commas (that's a tautology). So if the language changes, changing the comma character won't help you -- but this will.
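A minimal Java version of makeList, assuming English output without a serial comma ("a, b and c"). It illustrates the point: the list convention lives in one method, and no COMMA constant could express the " and " rule.

```java
import java.util.Arrays;

final class Lists {
    private Lists() {}

    // Render items as an English list: "", "a", "a and b", "a, b and c".
    static String makeList(String[] items) {
        if (items.length == 0) {
            return "";
        }
        if (items.length == 1) {
            return items[0];
        }
        String head = String.join(", ", Arrays.copyOfRange(items, 0, items.length - 1));
        return head + " and " + items[items.length - 1];
    }
}
```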

user541686
0

Maybe.

Single-character literals are relatively hard to distinguish. So it can be rather easy to miss the fact that you're adding a period rather than a comma:

city + '.' + state

whereas that's a relatively hard mistake to make with

city + Const.PERIOD + state

Depending on your internationalization and globalization environment, the difference between an ASCII apostrophe and the Windows-1252 open and close apostrophe (or the ASCII double quote and the Windows-1252 open and close double quote) may be significant and is notoriously difficult to visualize looking at code.
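If those look-alike quotes do matter in your application, named constants at least make the difference visible in the source. A hypothetical sketch that normalizes typographic quotes (the Unicode equivalents of the Windows-1252 curly quotes) to ASCII:

```java
// Hypothetical names for characters that look alike on screen but differ.
final class Quotes {
    private Quotes() {}

    static final char APOSTROPHE = '\'';                  // U+0027
    static final char DOUBLE_QUOTE = '"';                 // U+0022
    static final char LEFT_SINGLE_QUOTATION = '\u2018';
    static final char RIGHT_SINGLE_QUOTATION = '\u2019';
    static final char LEFT_DOUBLE_QUOTATION = '\u201C';
    static final char RIGHT_DOUBLE_QUOTATION = '\u201D';

    // Replace typographic quotes with ASCII ones, e.g. before writing data
    // for a consumer that expects plain ASCII punctuation.
    static String toAsciiQuotes(String text) {
        return text.replace(LEFT_SINGLE_QUOTATION, APOSTROPHE)
                .replace(RIGHT_SINGLE_QUOTATION, APOSTROPHE)
                .replace(LEFT_DOUBLE_QUOTATION, DOUBLE_QUOTE)
                .replace(RIGHT_DOUBLE_QUOTATION, DOUBLE_QUOTE);
    }
}
```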

Now, presumably, if mistakenly putting a period rather than a comma was a significant functional issue, you would have an automated test that would find the typo. If your software is generating CSV files, I would expect that your test suite would discover pretty quickly that you had a period between the city and the state. If your software is supposed to run for clients with a variety of internationalization configurations, presumably your test suite will run in each environment and will pick up cases where you have a Microsoft open quote but meant to have an apostrophe.

I could imagine a project where it made more sense to opt for more verbose code that could head off these issues, particularly when you've got older code that doesn't have a comprehensive test suite, even though I probably wouldn't code this way in a greenfield development project. And adding a constant for every punctuation character, rather than just those that are potentially problematic in your particular application, is probably gross overkill.

Justin Cave
0

If this was a class written as part of an application by your fellow developer, this is almost certainly a bad idea. As others already pointed out, it makes sense to define constants such as SEPARATOR = ',' where you can change the value and the constant still makes sense, but much less so to define constants whose name describes just their value.

However, there are at least two cases where it does make sense to declare constants whose name describes exactly their contents and where you cannot change the value without appropriately changing the constant's name:

  • Mathematical or physical constants, e.g. PI = 3.14159. Here, the role of the constant is to act as a mnemonic since the symbolic name PI is much shorter and more readable than the value it represents.
  • Exhaustive lists of symbols in a parser or keys on a keyboard. It might even make sense to have a list of constants with most or all Unicode characters, and this is where your case may fall. Some characters such as A are obvious and clearly recognizable. But can you easily tell А and A apart? The first one is the Cyrillic letter А, while the second is the Latin letter A. They are different letters, represented by different Unicode code points, even though graphically they are almost identical. I'd rather have constants CYRILLIC_CAPITAL_A and LATIN_CAPITAL_A in my code than two almost-identical-looking characters. Of course, this is pointless if you know you will only be working with ASCII characters, which do not include Cyrillic. Likewise, I use the Latin alphabet day to day, so if I were writing a program which needed a Chinese character, I would probably prefer to use a constant rather than insert a character which I don't understand. For someone using Chinese characters day to day, a Chinese character may be obvious but a Latin one may be easier to represent as a named constant. So, as you see, it depends on the context. Still, a library might contain symbolic constants for all characters, since the authors can't know in advance how the library is going to be used and which characters might need constants to improve readability in a specific application.
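In Java, named constants for the two letters might look like this. The class name is hypothetical, while Character.UnicodeScript.of and Character.getName are standard JDK methods (Java 7 and later) that can confirm which letter you actually have.

```java
// Two capital letters that render almost identically but are distinct code points.
final class LookAlikes {
    private LookAlikes() {}

    static final char LATIN_CAPITAL_A = 'A';         // U+0041
    static final char CYRILLIC_CAPITAL_A = '\u0410'; // U+0410

    // Report which Unicode script a character belongs to, e.g. LATIN or CYRILLIC.
    static Character.UnicodeScript scriptOf(char c) {
        return Character.UnicodeScript.of(c);
    }
}
```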

However, such cases are usually handled by system classes or special-purpose libraries and their occurrence in code written by application developers should be very rare unless you are working on some very special project.

0

As a philosophical contrapunctus to the majority opinion, I must state that there are some of us who appreciate the unsophisticated 19th century French peasant programmer and

remembered his monotonous, everlasting lucidity, his stupefyingly sensible views of everything, his colossal contentment with truisms merely because they were true. "Confound it all!" cried Turnbull to himself, "if he is in the asylum, there can't be anyone outside."

G.K. Chesterton, The Ball and The Cross

There is nothing wrong with appreciating the truth and there is nothing wrong with stating the truth, especially when talking to a computer.

If you lie to the computer, it will get you.

Perry Farrar - Germantown, Maryland (from More Programming Pearls)


But, for the most part, I agree with the people who say it's dumb. I'm too young to have learned to program in FORTRAN, but I've heard tell that you could redefine 'A' = 'Q' and come up with all sorts of wonderful cryptograms. You are not doing this.

Beyond the i18n issues brought up before (which are not about redefining the glyph "COMMA" but truly about redefining the glyph of a DECIMAL_POINT), constructing French guillemets or British single quotes to convey meaning to humans is one thing, and those really ought to be variables, not constants. The constant would be AMERICAN_COMMA := ',' and the variable would be comma := AMERICAN_COMMA.

And, if I were using a builder pattern to construct an SQL Query, I would much rather see

sb.append("insert into ")
 .append(table_name)
 .append(" values ")
 .append(" ( ")
 .append(val_1)
 .append(",")
 .append(val_2)
 .append(" ); ")

than anything else, but if you were going to add constants, they would be

INSERT_VALUES_START = " ( "
INSERT_VALUES_END = " ) "
INSERT_VALUES_SEPARATOR = " , "
QUERY_TERMINATOR = ";"

sb.append("insert into ")
 .append(table_name)
 .append(" values ")
 .append(INSERT_VALUES_START)
 .append(val_1)
 .append(INSERT_VALUES_SEPARATOR)
 .append(val_2)
 .append(INSERT_VALUES_END)
 .append(QUERY_TERMINATOR)
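A runnable Java rendering of the constants version, as a sketch only. It assumes the values are already safe SQL literals; real code should bind them as PreparedStatement parameters instead of concatenating them. It also joins any number of values with INSERT_VALUES_SEPARATOR rather than exactly two, and the InsertBuilder name is hypothetical.

```java
import java.util.List;

// Sketch of the semantic-constants approach; not safe for untrusted input.
final class InsertBuilder {
    private InsertBuilder() {}

    static final String INSERT_VALUES_START = " ( ";
    static final String INSERT_VALUES_END = " ) ";
    static final String INSERT_VALUES_SEPARATOR = " , ";
    static final String QUERY_TERMINATOR = ";";

    static String insert(String tableName, List<String> valueLiterals) {
        return new StringBuilder()
                .append("insert into ")
                .append(tableName)
                .append(" values")
                .append(INSERT_VALUES_START)
                .append(String.join(INSERT_VALUES_SEPARATOR, valueLiterals))
                .append(INSERT_VALUES_END)
                .append(QUERY_TERMINATOR)
                .toString();
    }
}
```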


However, if you've ever watched anyone else program (or type), you might notice some interesting quirks. Not all of us are stellar typists. Lots of us got into programming late or were raised with Soviet keyboards (where keys type on you), and we like to cut and paste individual letters instead of trying to find them on the keyboard and/or rely on autocomplete.

Nothing is going to autocomplete a string for you, so if I can get a comma by pressing 'con', alt-space, down, down, down, enter, and get a quote by pressing 'con', alt-space, down, down, enter, I might just do that.


Another thing to remember about string literals is the way they are compiled. In Delphi at least (which is the only language whose stack I've obsessed over), you'll wind up with your literals popped onto the stack of each function. So, lots of literals = lots of function overhead; "," in function_A is not the same bit of memory as "," in function_B. To combat this, there's a "resource string" which can be built and linked in sideways, and this is how they do i18n stuff (killing two birds with one bush). In Python all your string literals are objects, and it might actually seem nice to use utils.constants.COMMA.join(["some","happy","array","strings"]), but it's not a stellar idea for the points repeated over and over on this page.

Peter Turner
-1

Are single-character constants better than literals?

There are a lot of conflations floating around here. Let me see if I can tease them apart.

Constants provide:

  • semantics
  • change, during development
  • indirection

Going down to a single-character name only impacts the semantics. A name should be useful as a comment and clear in context. It should express meaning, not the value. If it can do all that with a single character, fine. If it can't, please don't.

A literal and a constant can both change during development. This is what brings up the magic number issue. Strings can be magic numbers as well.

If semantic meaning exists, and since both are constant, then whether the constant has more value than a literal comes down to indirection.

Indirection can solve any problem, other than too much indirection.

Indirection can solve the magic number problem because it allows you to decide on a value for an idea in one place. Semantically, for that to be worthwhile the name must make what that idea is clear. The name should be about the idea, not the value.

Indirection can be overdone. Some prefer to search and replace literals to make their changes. That's fine so long as 42 is clearly the meaning of life and not mixed together with 42, the atomic number of molybdenum.

Whether you can make useful distinctions like that with a single letter depends largely on context. But I wouldn't make it a habit.

candied_orange
-4

But when are we going to start using a different symbol than ',' to represent a comma?

For localisation.

In English-speaking countries, the symbol separating the whole and fractional parts of a decimal is ".", which we call "decimal point". In many other countries, the symbol is "," and is typically called the equivalent of "comma" in the local language. Similarly, where English-speaking countries use "," to separate groups of three digits in large numbers (such as 1,000,000 for one million), countries that use a comma as a decimal point use a dot (1.000.000).

So there is a case for making DECIMAL_POINT and COMMA constants if you are doing globalisation.

Paul G