
Why do programming languages like Ruby use symbols? I understand that String manipulation is much slower than using a lookup table as well as the idea that Strings are reallocated in memory no matter if it is the same or different as one used previously, but can't interpreters compensate for this? It would seem like an interpreter still has to parse the word that you typed in order to match it to a symbol, so why not just do the same with a string object?

For instance, why doesn't the compiler take:

myHash["myKey"] = ...

and treat it as

myHash[:myKey] = ...

behind the scenes anyway? Even if the key is dynamic, it's an interpreter, so shouldn't it know what the key is going to be before it looks up the value, and still treat the string key as a symbol? For example:

concatMe = "Key"
myHash["my" + concatMe] = ...

How come an interpreter can't still treat this as

myHash[:myKey]

If it knows what

"my" + concatMe

is before it finds the value by key?

AndrewKS

1 Answer


TL;DR: Strings are mutable. Symbols are not. Strings and symbols serve different purposes.
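A minimal sketch of the mutability difference, with hypothetical variable names. It uses String.new so the string is guaranteed to be mutable even where string literals are frozen.

```ruby
# A String can be changed in place, and every reference sees the change.
greeting = String.new("foo")
other_reference = greeting
greeting << "bar"                      # mutates the same object
string_mutated = (other_reference == "foobar")

# A Symbol is always frozen and has no in-place mutators such as <<.
symbol_frozen = :foo.frozen?
symbol_can_append = :foo.respond_to?(:<<)

puts "string mutated through alias: #{string_mutated}"
puts "symbol frozen: #{symbol_frozen}, symbol responds to <<: #{symbol_can_append}"
```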

an interpreter still has to parse the word that you typed in order to match it to a symbol

:foo == "foo" could be determined by interning the string or by turning the symbol into a string. Either way, if the interpreter interned every string it saw, it would have to do a lot of extra work whenever one of those strings was mutated, which is a poor tradeoff. It would also be unable to garbage collect those strings (interned symbols were never collected before Ruby 2.2), so memory use would keep growing. Interning all strings to symbols would perform far worse than the current behavior.
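Interning is something you can already ask for explicitly with String#to_sym. The sketch below reuses the question's dynamic key (renamed to concat_me and dynamic_key, which are illustrative names) to show that the symbol is a separate, fixed object that later string mutation does not touch.

```ruby
concat_me = "Key"
dynamic_key = "my" + concat_me         # a new String built at run time

interned = dynamic_key.to_sym          # explicit interning: finds or creates :myKey
same_symbol = interned.equal?(:myKey)  # identical object, not just equal

# Mutating the string afterwards leaves the symbol unchanged.
dynamic_key << "2"
symbol_unchanged = (interned == :myKey)

# A symbol and a string with the same characters are still not equal.
symbol_differs_from_string = (:myKey != "myKey")

puts "same symbol object: #{same_symbol}"
puts "symbol unchanged after mutation: #{symbol_unchanged}"
puts "symbol != string: #{symbol_differs_from_string}"
```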

By default, Ruby does not pool strings. You can tell this pretty easily by creating a large number of copies of the same string and profiling the interpreter's memory usage. However, such implementation details are very low on the list of things you should consider when deciding whether to use a string or a symbol.
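Instead of memory profiling, a quicker check is to compare object identity. This is a swapped-in technique rather than the measurement described above, and the variable names are hypothetical: equal strings created separately are distinct objects, while every use of a given symbol refers to one object.

```ruby
# Keep the strings in an array so they stay alive while we compare ids.
strings = Array.new(3) { String.new("myKey") }
symbols = Array.new(3) { :myKey }

distinct_string_objects = strings.map(&:object_id).uniq.size
distinct_symbol_objects = symbols.map(&:object_id).uniq.size

puts "distinct String objects: #{distinct_string_objects}"
puts "distinct Symbol objects: #{distinct_symbol_objects}"
```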

I understand that String manipulation is much slower than using a lookup table

What does "much slower" mean to you? Are sub-microsecond timings "much slower"? Because that's what we're talking about. Use strings and symbols where appropriate, not based on some imagined performance concern with no real-world impact except in pathological cases.
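If you want to see the scale of the difference on your own machine, a rough timing sketch like the following may help. The iteration count and helper names are arbitrary assumptions, the numbers vary by Ruby version and hardware, and each figure includes the overhead of calling the lambda, so treat it as an order-of-magnitude check only.

```ruby
iterations = 200_000
string_hash = { "myKey" => 1 }
symbol_hash = { myKey: 1 }

# Returns the average wall-clock seconds per call of the given callable.
measure = lambda do |work|
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { work.call }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) / iterations
end

string_time = measure.call(-> { string_hash["myKey"] })
symbol_time = measure.call(-> { symbol_hash[:myKey] })

puts format("string key: %.1f ns per lookup", string_time * 1e9)
puts format("symbol key: %.1f ns per lookup", symbol_time * 1e9)
```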

as well as the idea that Strings are reallocated in memory no matter if it is the same or different as one used previously, but can't interpreters compensate for this?

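Yes, and strings are also garbage collected when no longer referenced. Symbols historically were never garbage collected (since Ruby 2.2, symbols created dynamically, for example with String#to_sym, can be collected). It's a tradeoff.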

In some languages (such as Erlang, which uses 'atoms'), strings are actually just lists of characters (or integers). In these languages, interning all strings into symbols internally would be even more cost-prohibitive.

Rein Henrichs