Family Ties part 3: It's All In a Name
This is the third of a series of articles on what I’ve learned about Erlang (and Elixir) from writing Erl2ex, an Erlang-to-Elixir transpiler. This week we take a look at the differences between the naming conventions in Erlang and Elixir, and how that affects the languages and their interoperability.
Conventional names
Every language has rules and conventions around names. These establish both a way to distinguish different kinds of constructs, and a common vocabulary for communicating intent. For example, Ruby distinguishes variable scopes (local vs object member vs global) using sigils, and also has rules around capitalization.
Ruby, like many languages, also has a normative style around how names are constructed. Variable and method names are generally in lower_snake_case (unlike, for example, Java, which uses lowerCamelCase). It is also considered good style to use longer, descriptive names (unlike, for example, Go, which tends to recommend shorter names).
Erlang and Elixir both have their own rules and conventions, and they are a bit different from each other.
Because of these differences, it is necessary to modify some names when translating code between Erlang and Elixir. For example, all variable names need their capitalization changed. This, in turn, occasionally causes collisions; when lower-casing a variable name, you could make it identical to a function name. To avoid the resulting awkwardness, Erl2ex sometimes renames things when it converts code from one language to another.
Erlang and Elixir do, however, share some rules. The underscore variable name “_” is considered “anonymous” in both languages: during pattern matching it denotes that a value exists but places no other constraint on it and does not bind it. Also, variable names beginning with an underscore are allowed to be unused in both languages.
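As a small illustration of the shared underscore rules on the Elixir side, here is a minimal sketch. The module and function names are hypothetical and exist only for this example.

```elixir
defmodule UnderscoreDemo do
  # `_` matches any value in that position without binding it.
  def first({value, _}), do: value

  # A name starting with an underscore may go unused without a compiler warning;
  # it mainly documents what is being skipped.
  def second({_skipped, value}), do: value
end

IO.inspect(UnderscoreDemo.first({:a, :b}))   # prints :a
IO.inspect(UnderscoreDemo.second({:a, :b}))  # prints :b
```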
Atom smashing in Erlang and Elixir
In Erlang, names for almost every construct—modules, functions, records, attributes, indeed just about everything except for variables—are atoms. This is a convenient feature that makes the syntax fairly uniform. It also provides a common way, across most of its language constructs, to break out of the normal naming conventions when necessary.
Atoms (and thus the names of most Erlang constructs) are represented as “bare” words in the syntax if they begin with a lower-case letter and include only letters, numbers, underscores, and at signs. However, you can create arbitrary atoms by enclosing the text in single quotes; for example, 'hello world!' is a single atom.
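This suggests that other language constructs, such as functions, can also be named with arbitrary characters. And indeed, although it is not common in Erlang code, it is possible: a function can be defined under a quoted name such as 'my-function' and called by writing that same quoted atom.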
Elixir, of course, can also include arbitrary characters in an atom via quoting:
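A minimal sketch of atom quoting in Elixir follows. The atom values are arbitrary examples chosen for illustration.

```elixir
# A bare atom: a colon followed by an ordinary identifier.
simple = :hello

# A quoted atom: a colon followed by a double-quoted string,
# which may contain spaces and punctuation.
quoted = :"hello, world!"

IO.inspect(is_atom(simple))          # prints true
IO.inspect(is_atom(quoted))          # prints true
IO.inspect(Atom.to_string(quoted))   # prints "hello, world!"
```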
However, function names appear different from atoms in Elixir syntax; functions are bare words, whereas atoms are preceded by a colon. So Elixir can’t use Erlang’s trick to create unusual names. Nevertheless, it is still possible to do so via some metaprogramming, as we will see below.
Modules and aliases
Erlang module names are atoms, and typically are short words beginning with a lower-case letter. When developing in Elixir, you will often use Erlang libraries by referring to the module as an atom. Here is an example using the Erlang “math” module:
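A short sketch of calling the Erlang “math” module from Elixir, with the module referenced as a plain atom. The particular function and argument are just an example.

```elixir
# The Erlang module `math` is referenced in Elixir as the atom :math.
IO.inspect(is_atom(:math))    # prints true
IO.inspect(:math.sqrt(16))    # prints 4.0
```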
When you create an Elixir module, you give it a capitalized name. This is more than just a convention, however. Capitalization actually has a meaning in Elixir: all capitalized names are aliases for atoms. So, for example:
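To make the idea of a capitalized name standing in for an atom concrete, here is a minimal sketch that gives an alias an explicit value. The names `AliasDemo` and `MyMath` are hypothetical.

```elixir
defmodule AliasDemo do
  # Within this module, the capitalized name MyMath stands for the atom :math.
  alias :math, as: MyMath

  # This call is therefore the same as :math.sqrt(x).
  def root(x), do: MyMath.sqrt(x)

  # Returns the atom that the alias expands to.
  def target, do: MyMath
end

IO.inspect(AliasDemo.target() == :math)  # prints true
IO.inspect(AliasDemo.root(9))            # prints 3.0
```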
But if you don’t specify a value for an alias, it has a default value, which is an atom with the alias name prefixed by “Elixir.” (including the period).
So when you define a module called ChatServer.MyChannel, the name of your module is actually the atom :"Elixir.ChatServer.MyChannel". This becomes important if you want to call Elixir code from Erlang. You can use Erlang’s atom quoting to reference an Elixir module:
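From Erlang, the module would be written as the single-quoted atom 'Elixir.ChatServer.MyChannel'. The sketch below shows the same identity from the Elixir side: the alias and the prefixed atom are one and the same value. The `greet` function is a hypothetical placeholder added only so there is something to call.

```elixir
defmodule ChatServer.MyChannel do
  # Hypothetical function, present only for this illustration.
  def greet, do: "hello from MyChannel"
end

# The alias expands to an atom carrying the "Elixir." prefix.
IO.inspect(ChatServer.MyChannel == :"Elixir.ChatServer.MyChannel")  # prints true
IO.inspect(Atom.to_string(ChatServer.MyChannel))  # prints "Elixir.ChatServer.MyChannel"

# Calling through the quoted atom reaches the same module.
IO.inspect(:"Elixir.ChatServer.MyChannel".greet())  # prints "hello from MyChannel"
```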
Of course, the usual usage of Elixir aliases is to “shorten” a module name. If you have a long, nested module, aliasing lets you use the last “segment” in the name on its own:
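A minimal sketch of this shortening, using hypothetical module names:

```elixir
defmodule ChatServer.Channels.Lobby do
  def name, do: "lobby"
end

defmodule ChatServer.Client do
  # Without `as:`, the alias takes the last segment of the name, so
  # Lobby now refers to ChatServer.Channels.Lobby inside this module.
  alias ChatServer.Channels.Lobby

  def lobby_name, do: Lobby.name()
end

IO.inspect(ChatServer.Client.lobby_name())  # prints "lobby"
```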
Although aliases are normally used for module names, they are simply atoms, and so could conceivably be used for other purposes. I haven’t come across any other good use for them, though. It does seem to be a one-off feature.
Breaking the rules in Elixir
We’ve seen how Erlang, by quoting atoms, can break some of the normal naming rules for constructs such as modules and functions. Is it possible to do so in Elixir?
Well, first, there’s nothing stopping us from using an arbitrary atom as the name of an Elixir module. Capitalized aliases are a convention in Elixir, and they’re good practice because they keep Elixir code separated in a different namespace (i.e. with the “Elixir.” prefix) from Erlang code. But it is by no means a requirement. This is perfectly legal:
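A sketch of a module named with a plain atom rather than an alias. The module and function names are hypothetical.

```elixir
# The module name is the bare atom :my_erlang_style_module,
# with no "Elixir." prefix added.
defmodule :my_erlang_style_module do
  def hello, do: :world
end

IO.inspect(:my_erlang_style_module.hello())  # prints :world
```

From Erlang, such a module could then be called like any native one, as my_erlang_style_module:hello().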
This is generally not recommended for Elixir code, but you might do it if you were writing a module in Elixir that was meant to look like an Erlang library and meant to be called from Erlang.
What about functions?
Let’s suppose we were writing a string interpolation library and wanted to create functions called “quote” and “unquote”. (This is probably a bad idea because those terms mean something specific to Elixir metaprogramming. But suppose we really wanted to do it, perhaps because we were porting an Erlang library that exports functions of those names.)
It turns out that defining a function called “unquote” is tougher than you might expect. Suppose we began with the following:
Attempting to compile that module will yield the following bizarre error:
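The sketch below is one way to observe the failure without aborting a script: the offending definition is kept in a string and compiled with `Code.compile_string/1`, and the resulting exception is caught. The function body and the `CompileProbe` helper are assumptions made for this illustration, and the exact error text varies between Elixir versions, so the sketch prints whatever message the compiler produces rather than asserting a particular one.

```elixir
defmodule CompileProbe do
  # The definition under test, held as source text so that this file itself compiles.
  @source """
  defmodule StringQuoter do
    def unquote(str), do: str
  end
  """

  # Returns :compiled on success, or {:error, message} if compilation raises.
  def try_compile do
    try do
      Code.compile_string(@source)
      :compiled
    rescue
      error -> {:error, Exception.message(error)}
    end
  end
end

IO.inspect(CompileProbe.try_compile())
```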
What happened? That certainly looks like valid Elixir code. Well, it turns out that the “def” macro actually puts its arguments through a quoting cycle, and allows you to unquote things inside the function definition. So when defining the function, instead of treating “unquote” as the name of the function to define, Elixir is actually trying to evaluate the expression unquote(str) at compile time, which it can’t do because the name “str” isn’t defined.
Because of that feature of the “def” macro, you can’t define a function called “unquote” directly. However, the very feature that creates the problem can also solve it. The “def” macro evaluates any unquoted constructs, so to create a function named “unquote”, just create an expression that will evaluate to that name. The following module does compile correctly:
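A minimal sketch of such a module follows. The function body, which strips surrounding double quotes, is a hypothetical placeholder; only the shape of the definition matters here.

```elixir
defmodule StringQuoter do
  # unquote(:unquote) evaluates to the atom :unquote at compile time,
  # and `def` then uses that atom as the name of the function.
  # The body is a placeholder chosen for this illustration.
  def unquote(:unquote)(str), do: String.trim(str, "\"")
end

# Called via apply/3, passing the function name as an atom.
IO.inspect(apply(StringQuoter, :unquote, ["\"hello\""]))  # prints "hello"
```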
How does that work? In the Elixir AST, the function name is just an atom. So we create an expression that evaluates to that atom. When the “def” macro runs at compile time, it sees the expression unquote(:unquote), evaluates the argument to the unquote function (which is just the atom :unquote) and inserts it into the AST.
We can use this to generate any arbitrary function names, including names with otherwise illegal characters. Also, since this effectively evaluates expressions at compile time, we can also use this to generate functions dynamically. Here’s an example that generates 100 functions with names like “1-doubled”, normally not a legal function name:
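One way this generation might look is sketched below. The module name `Doubler` and the use of a `for` comprehension are assumptions; the function names follow the “1-doubled” pattern described above.

```elixir
defmodule Doubler do
  # The comprehension runs while the module is being compiled,
  # defining one zero-arity function per value of n.
  for n <- 1..100 do
    # The name is built as a quoted atom such as :"1-doubled";
    # the body is computed here, at compile time.
    def unquote(:"#{n}-doubled")(), do: unquote(n * 2)
  end
end

IO.inspect(function_exported?(Doubler, :"1-doubled", 0))    # prints true
IO.inspect(function_exported?(Doubler, :"100-doubled", 0))  # prints true
```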
In the above code, remember that the expression n * 2 is evaluated during unquoting, which takes place at compile time, not runtime. So the actual body of each function just consists of a constant integer, not an expression. That is, the above is equivalent to:
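A hand-written sketch of that equivalent form, showing only the first three functions. The module name is hypothetical, and each name still needs the unquote trick because it is not a legal bare identifier.

```elixir
defmodule DoublerByHand do
  # Each body is a literal integer; nothing is computed when the function is called.
  def unquote(:"1-doubled")(), do: 2
  def unquote(:"2-doubled")(), do: 4
  def unquote(:"3-doubled")(), do: 6
  # The pattern continues in the same way up to "100-doubled".
end

IO.inspect(apply(DoublerByHand, :"3-doubled", []))  # prints 6
```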
How do you call such a function? Again, Erlang’s technique won’t work in Elixir. You can’t just “quote” the function name in Elixir syntax. However, you can use Kernel.apply/3 to do the trick:
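A sketch of such a call. The `Doubler` module here is a hypothetical stand-in that is defined inline so the example runs on its own.

```elixir
defmodule Doubler do
  # Generates functions named "1-doubled" through "100-doubled".
  for n <- 1..100 do
    def unquote(:"#{n}-doubled")(), do: unquote(n * 2)
  end
end

# apply/3 takes the module, the function name as an atom, and a list of arguments.
IO.inspect(apply(Doubler, :"21-doubled", []))  # prints 42
```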
Perhaps less obviously, if you want to call the StringQuoter.unquote/1 function directly from iex, you can name it directly:
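A sketch of the direct call, written here as top-level script code on the assumption that it is evaluated the same way as input typed at the iex prompt. The function body is a hypothetical placeholder.

```elixir
defmodule StringQuoter do
  # Placeholder body: strips surrounding double quotes.
  def unquote(:unquote)(str), do: String.trim(str, "\"")
end

# Outside of any function definition there is no quoting in effect,
# so this is an ordinary remote call to the function named unquote.
IO.inspect(StringQuoter.unquote("\"hello\""))
```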
However, if you want to call StringQuoter.unquote/1 from another function, you have to use Kernel.apply/3. This is because, again, whenever you have the “unquote” function in a function definition, the “def” macro tries to evaluate it at compile time.
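A minimal sketch of calling it from inside another function. The `Greeter` module, its function name, and the body of `StringQuoter.unquote/1` are all hypothetical.

```elixir
defmodule StringQuoter do
  # Placeholder body: strips surrounding double quotes.
  def unquote(:unquote)(str), do: String.trim(str, "\"")
end

defmodule Greeter do
  # Writing StringQuoter.unquote(text) here would be picked up by `def`
  # as an unquote fragment, so the call goes through apply/3 instead.
  def bare_name(text), do: apply(StringQuoter, :unquote, [text])
end

IO.inspect(Greeter.bare_name("\"Joe\""))  # prints "Joe"
```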
Where to go from here
These observations came from actual issues I encountered while writing and testing Erl2ex. Some real-life Erlang modules I used as test cases included functions with strange names that could not normally be defined in Elixir. So Erl2ex detects such cases and falls back to the metaprogramming trick described above.
I have to give Erlang a lot of credit here. The way atoms are represented in Erlang syntax and utilized to name other structures makes cases like these much more straightforward than in Elixir. On the other hand, Elixir’s metaprogramming capability is extremely powerful, as we will see later when we look at the preprocessor. But at the least, it is important to understand the differences.
For more insight into techniques for compile-time evaluation, I again recommend Metaprogramming Elixir by Chris McCord.
Next time, we’ll look at a related topic, scoping, studying how the language structures affect their scoping rules, and observing the implications of Erlang’s single assignment policy. Feel free to browse the index of articles in this series, and stay tuned for more on Erlang and Elixir’s family ties.