Understanding how symbols differ from strings
One of the most useful but misunderstood aspects of Ruby is the difference between symbols and strings. One reason is that certain Ruby methods that deal with symbols will still accept strings, or perform string-like operations on a symbol. Another reason is the popularity of Rails and its pervasive use of ActiveSupport::HashWithIndifferentAccess, which allows you to use either a string or a symbol to access the same data. However, symbols and strings are very different internally and serve completely different purposes. Because Ruby is focused on programmer happiness and productivity, it will often automatically convert a string to a symbol if it needs a symbol, or a symbol to a string if it needs a string.
A string in Ruby is a series of characters or bytes, useful for storing text or binary data. Unless the string is frozen, you can append to it, modify existing characters in it, or replace it with a different string.
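As a minimal sketch of the string operations just described (the variable names are illustrative only, and String.new is used so the example behaves the same if frozen string literals are enabled):

```ruby
# An unfrozen string can be changed in place.
text = String.new("hello")
text << " world"          # append to it
text[0] = "H"             # modify an existing character
after_edits = text.dup    # snapshot: "Hello world"
text.replace("goodbye")   # replace the contents with a different string

# A frozen string rejects any in-place modification.
frozen = "fixed".freeze
begin
  frozen << "!"
rescue FrozenError => e
  error_class = e.class   # FrozenError
end
```

Symbols offer no comparable in-place operations, which is part of why they are suited to identifiers and not to text.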
A symbol in Ruby is a number with an attached identifier that is a series of characters or bytes. Symbols in Ruby are an object wrapper for an internal type that Ruby calls ID, which is an integer type. When you use a symbol in Ruby code, Ruby looks up the number associated with that identifier. The reason for having an ID type internally is that it is much faster for computers to deal with integers instead of a series of characters or bytes. Ruby uses ID values to reference local variables, instance variables, class variables, constants, and method names.
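The internal ID is not directly visible from Ruby code, but one observable consequence can be sketched as follows, assuming nothing beyond core Ruby: every occurrence of a symbol with the same name is the same object, while strings with the same content are separate objects.

```ruby
# Two uses of the same symbol name refer to one and the same object.
a = :add
b = :add
same_symbol = a.equal?(b)      # true

# Two strings with the same content are distinct objects.
s1 = String.new("add")
s2 = String.new("add")
same_string = s1.equal?(s2)    # false
same_content = (s1 == s2)      # true

# Converting a string to a symbol finds the existing symbol.
converted_is_same = "add".to_sym.equal?(:add)  # true
```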
Say you run Ruby code as follows:
foo.add(bar)
Ruby will parse this code, and for foo, add, and bar, it will look up whether it already has an ID associated with the identifier. If it already has an ID, it will use it; otherwise, it will create a new ID value and associate it with the identifier. This happens during parsing and the ID values are hardcoded into the VM instructions.
Say you run Ruby code as follows:
method = :add
foo.send(method, bar)
Ruby will parse this code, and for method, add, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, or create a new ID value to associate with the identifier if it does not exist. This approach is slightly slower as Ruby will create a local variable and there is additional indirection as send has to look up the method to call dynamically. However, there are no calls at runtime to look up an ID value.
Say you run Ruby code as follows:
method = "add"
foo.send(method, bar)
Ruby will parse this code, and for method, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, creating the ID if it doesn't exist. During parsing, though, Ruby does not create an ID value for add because it is a string and not a symbol. When send is called at runtime, method is a string value, and send needs a symbol. So, Ruby will dynamically look up whether there is an ID associated with the add identifier, raising a NoMethodError if it does not exist. This ID lookup will happen every time the send method is called, making this code even slower.
So, while it looks like symbols and strings are interchangeable as the method argument to send, this is only because Ruby tries to be friendly to the programmer and accept either. The send method needs to work with an ID, and it is better for performance to use a symbol, which is Ruby's representation of an ID, as opposed to a string, which Ruby must perform substantial work on to convert to an ID.
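The following sketch shows the behavior described above from the caller's side. The Accumulator class is hypothetical and exists only to give send something to call; the example demonstrates that both argument types work, not how large the performance difference is.

```ruby
# Hypothetical class used only for illustration.
class Accumulator
  attr_reader :total

  def initialize
    @total = 0
  end

  def add(amount)
    @total += amount
    self
  end
end

foo = Accumulator.new
foo.send(:add, 2)     # symbol: already Ruby's representation of an ID
foo.send("add", 3)    # string: accepted, but must be resolved to an ID at runtime
total = foo.total     # 5

# A name with no matching method raises NoMethodError.
begin
  foo.send("no_such_method", 1)
rescue NoMethodError => e
  error_class = e.class
end
```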
This not only affects Kernel#send but also affects most similar methods where identifiers are passed dynamically, such as Module#define_method, Kernel#instance_variable_get, and Module#const_get. The general principle when using these methods in Ruby code is always to pass symbols to them, since it results in better performance.
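As a brief sketch of that principle, the following passes symbols to each of the methods named above. The Point class and its sum method are hypothetical names chosen for the example; the final line shows that a string is also accepted.

```ruby
# Hypothetical class used only for illustration.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  # Module#define_method with a symbol name.
  define_method(:sum) { @x + @y }
end

point = Point.new(1, 2)
sum = point.sum                                 # 3

# instance_variable_get with a symbol, including the leading @.
x = point.instance_variable_get(:@x)            # 1

# Module#const_get with a symbol.
klass = Object.const_get(:Point)                # Point

# Strings are accepted too, at the cost of a conversion.
x_via_string = point.instance_variable_get("@x")  # 1
```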
The previous examples show that when Ruby needs a symbol, it will often accept a string and convert it for the programmer's convenience. This allows strings to be treated as symbols in certain cases. There are opposite cases, where Ruby allows symbols to be treated as strings for the programmer's convenience.
For example, while symbols represent integers attached to a series of characters or bytes, Ruby allows you to perform operations on symbols such as <, >, and <=>, as if they were strings, where the result does not depend on the symbol's integer value, but on the string value of the name attached to the symbol. Again, this is Ruby doing so for the programmer's convenience. For example, consider the following line of code:
object.methods.sort
This results in a list sorted by the name of the method, since that is the most useful for the programmer. In this case, Ruby needs to operate on the string value of the symbol, which has similar performance issues as when Ruby needs to convert a string to a symbol internally.
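A small sketch with a hand-written array of symbols (standing in for a real method list) shows the comparison following the names and not any internal integer:

```ruby
names = [:size, :add, :map]

sorted = names.sort          # [:add, :map, :size], ordered by name
less = :add < :map           # true, because "add" sorts before "map"
compared = :size <=> :map    # 1, because "size" sorts after "map"

# Sorting by the explicit string value gives the same order.
sorted_by_string = names.sort_by(&:to_s)
```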
There are many other methods on Symbol that operate on the internal string associated with the symbol. Some methods, such as downcase, upcase, and capitalize, return a symbol by internally operating on the string associated with the symbol, and then converting the resulting value back to a symbol. For example, symbol.downcase basically does symbol.to_s.downcase.to_sym. Other methods, such as [], size, and match, operate on the string associated with the symbol, such as symbol.size being shorthand for symbol.to_s.size.
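These equivalences can be checked directly. The sketch below uses an arbitrary symbol, :Hello, and compares each Symbol method with its spelled-out string form:

```ruby
sym = :Hello

# Methods that return a symbol after operating on the name.
down = sym.downcase            # :hello
up = sym.upcase                # :HELLO
cap = :hello.capitalize        # :Hello
same_as_long_form = (down == sym.to_s.downcase.to_sym)  # true

# Methods that return string-based results directly.
length = sym.size              # 5, same as sym.to_s.size
first = sym[0]                 # "H"
middle = sym[1, 3]             # "ell"
matched = sym.match(/l+/)[0]   # "ll"
```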
In all of these cases, it is possible to determine what Ruby natively wants. If Ruby needs an internal identifier, it will natively want a symbol, and only accept a string by converting it. If Ruby needs to operate on text, it will natively want a string, and only accept a symbol by converting it.
So, how does the difference between a symbol and string affect your code? The general principle is to be like Ruby, and use symbols when you need an identifier in your code, and strings when you need text or data. For example, if you need to accept a configuration value that can only be one of three options, it's probably best to use a symbol:
def switch(value)
  case value
  when :foo
    # foo
  when :bar
    # bar
  when :baz
    # baz
  end
end
However, if you are dealing with text or data, you should accept a string and not a symbol:
def append2(value)
  value.gsub(/foo/, "bar")
end
You should consider whether you want to be as flexible as many Ruby core methods, and automatically convert a string to a symbol or vice versa. If you are internally treating symbols and strings differently, you should definitely not perform automatic conversion. However, if you are only dealing with one of the types, then you have to decide how to handle it. Automatically converting the type is worse for performance, and results in less flexible internals, since you need to keep supporting both types for backward compatibility. Not automatically converting the type is better for performance, and results in more flexible internals, since you are not obligated to support both types. However, it means that users of your code will probably get errors if they pass in a type that is not expected. Therefore, it is important to understand the trade-off inherent in the decision of whether to convert both types. If you aren't sure which trade-off is better, start by not automatically converting, since you can always add automatic conversion later if needed.
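To make the trade-off concrete, here is one possible sketch of the two approaches. The names MODES, strict_mode, and lenient_mode are hypothetical and not taken from any library; the strict version accepts only symbols, and the lenient version is the kind of conversion that could be layered on later if users need it.

```ruby
# Hypothetical set of allowed configuration values.
MODES = [:fast, :safe].freeze

# No automatic conversion: only a symbol is accepted.
def strict_mode(mode)
  raise ArgumentError, "expected a Symbol, got #{mode.inspect}" unless mode.is_a?(Symbol)
  raise ArgumentError, "unknown mode: #{mode.inspect}" unless MODES.include?(mode)
  mode
end

# Automatic conversion: strings are converted, then validated the same way.
def lenient_mode(mode)
  strict_mode(mode.to_sym)
end

strict_result = strict_mode(:fast)      # :fast
lenient_result = lenient_mode("safe")   # :safe

begin
  strict_mode("fast")
rescue ArgumentError => e
  strict_error = e.class                # ArgumentError
end
```

Starting with the strict version keeps the internals free to assume a single type, while the lenient wrapper can be added without changing the strict method.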
In this section, you learned the important difference between symbols and strings, and when it is best to use each. In the next section, you'll learn how best to use Ruby's core collection classes.