Family Ties part 3: It's All In a Name

2016-05-12 familyties

This is the third of a series of articles on what I’ve learned about Erlang (and Elixir) from writing Erl2ex, an Erlang-to-Elixir transpiler. This week we take a look at the differences between the naming conventions in Erlang and Elixir, and how that affects the languages and their interoperability.

One does not simply capitalize a function name

Conventional names

Every language has rules and conventions around names. These establish both a way to distinguish different kinds of constructs and a common vocabulary for communicating intent. For example, Ruby distinguishes variable scopes (local vs object member vs global) using sigils, and also has rules around capitalization.

# Ruby code
foo = "hello"   # Local variable
@foo = "hello"  # Object member variable
@@foo = "hello" # Class variable
$foo = "hello"  # Global variable
Foo = "hello"   # Constant

Ruby, like many languages, also has a normative style around how names are constructed. Variable and method names are generally in lower_snake_case (unlike, for example, Java, which uses lowerCamelCase). It is also considered good style to use longer, descriptive names (unlike, for example, Go, which tends to recommend shorter names).

Erlang and Elixir both have their own rules and conventions, and they are a bit different from each other.

% Erlang variables are capitalized.
Var1 = 3.14.
% Erlang atoms are "bare" if they begin with a lower-case letter
Var2 = anAtom.
% Erlang module names are atoms and generally are lower-cased.
-module(mychannel).
% Erlang function names are also atoms.
foo() -> bar.

# Elixir variables are lower-cased
var1 = 3.14
# Elixir atoms begin with a sigil.
var2 = :anAtom
# Elixir modules are capitalized, although we will see later
# what this really means.
defmodule ChatServer.MyChannel do
end
# Elixir function names are lower-cased like variables.
def foo do
  :bar
end

Because of these differences, it is necessary to modify some names when translating code between Erlang and Elixir. For example, all variable names need their capitalization changed. This, in turn, occasionally causes collisions; when lower-casing a variable name, you could make it identical to a function name. To avoid the resulting awkwardness, Erl2ex sometimes renames things when it converts code from one language to another.
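To make the collision concrete, here is a rough sketch of the kind of renaming involved. It is not Erl2ex's actual algorithm; the module RenameSketch, its functions, and the "_var" suffix are all hypothetical and chosen only for illustration.

```elixir
# Elixir code
defmodule RenameSketch do
  # Hypothetical helper: lower-case the first letter of an Erlang variable
  # name, then append a suffix until the result no longer collides with a
  # name that is already taken (for example, a function name).
  def to_elixir_var(erlang_var, taken_names) do
    {first, rest} = String.split_at(erlang_var, 1)
    avoid_collision(String.downcase(first) <> rest, taken_names)
  end

  defp avoid_collision(name, taken_names) do
    if MapSet.member?(taken_names, name) do
      avoid_collision(name <> "_var", taken_names)
    else
      name
    end
  end
end

RenameSketch.to_elixir_var("Var1", MapSet.new())        # Returns "var1"
RenameSketch.to_elixir_var("Foo", MapSet.new(["foo"]))  # Returns "foo_var"
```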

Erlang and Elixir do, however, share some rules in common. The underscore variable name “_” is considered “anonymous” in both languages in that it denotes existence of a value but no other constraints on it during pattern matching. Also, variable names beginning with underscore are allowed to be unused in both languages.
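On the Elixir side, the two underscore rules look like this. This is a minimal sketch; the module and function names are made up for the example.

```elixir
# Elixir code
defmodule UnderscoreDemo do
  # "_" matches any value and binds nothing.
  def first({a, _}), do: a
  # "_ignored" is a named variable, but it is allowed to go unused.
  def second({_ignored, b}), do: b
end

UnderscoreDemo.first({1, 2})   # Returns 1
UnderscoreDemo.second({1, 2})  # Returns 2
```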

Atom smashing in Erlang and Elixir

In Erlang, names for almost every construct—modules, functions, records, attributes, indeed just about everything except for variables—are atoms. This is a convenient feature that makes the syntax fairly uniform. It also provides a common way, across most of its language constructs, to break out of the normal naming conventions when necessary.

Atoms (and thus the names of most Erlang constructs) are represented as “bare” words in the syntax if they begin with a lower-case letter and include only letters, numbers, and underscores. However, you can create arbitrary atoms by quoting:

% Erlang code
% A "bare" atom
Atom1 = anAtom.
% By quoting, you can include arbitrary characters in an atom.
Atom2 = 'E=m*c^2 !'.

This suggests that other language constructs, such as functions, can also be named with arbitrary characters. And indeed, although it is not common in Erlang code, it is possible:

% Erlang code
% Defining a function with an otherwise "illegal" name
'E=m*c^2'() -> "Einstein was a nerd".
% Calling such a function
foo() -> 'E=m*c^2'().

Elixir, of course, can also include arbitrary characters in an atom via quoting:

# Elixir code
atom1 = :anAtom
atom2 = :"E=m*c^2 !"

However, function names appear different from atoms in Elixir syntax; functions are bare words, whereas atoms are preceded by a colon. So Elixir can’t use Erlang’s trick to create unusual names. Nevertheless, it is still possible to do via some metaprogramming, as we will see below.

Modules and aliases

Erlang module names are atoms, and typically are short words beginning with a lower-case letter. When developing in Elixir, you will often use Erlang libraries by referring to the module as an atom. Here is an example using the Erlang “math” module:

# Elixir code
def circle_area(radius) do
  :math.pi() * radius * radius
end

When you create an Elixir module, you give it a capitalized name. This is more than just a convention, however. Capitalization actually has a meaning in Elixir: all capitalized names are aliases for atoms. So, for example:

# Elixir code
alias :value, as: MyAlias  # Define MyAlias to be :value
MyAlias == :value          # Returns true

But if you don’t specify a value for an alias, it has a default value, which is an atom with the alias name prefixed by “Elixir.” (including the period).

# Elixir code
UndefinedAlias == :"Elixir.UndefinedAlias"  # Returns true

So when you define a module called ChatServer.MyChannel, the name of your module is actually the atom :"Elixir.ChatServer.MyChannel". This becomes important if you want to call Elixir code from Erlang. You can use Erlang’s atom quoting to reference an Elixir module:

% Erlang code
'Elixir.ChatServer.MyChannel':my_func(1, 2, 3).
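You can check the same equivalence from the Elixir side. The sketch below only treats the alias as an atom, so ChatServer.MyChannel does not need to exist as a module; the variable name mod is just for illustration.

```elixir
# Elixir code
is_atom(ChatServer.MyChannel)         # Returns true
Atom.to_string(ChatServer.MyChannel)  # Returns "Elixir.ChatServer.MyChannel"
# A quoted atom with the prefix is the same value as the alias,
# so it can stand in for a real module such as String.
mod = :"Elixir.String"
mod == String                         # Returns true
mod.upcase("hi")                      # Returns "HI"
```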

Of course, the usual usage of Elixir aliases is to “shorten” a module name. If you have a long, nested module, aliasing lets you use the last “segment” in the name on its own:

# Elixir code
alias My.Module.Path.To.ThisModule  # Implies "as: ThisModule"
ThisModule.my_function()

Although aliases are normally used for module names, they are simply atoms, and so could conceivably be used for other purposes. I haven’t come across any other good use for them, though. It does seem to be a one-off feature.

Breaking the rules in Elixir

We’ve seen how Erlang, by quoting atoms, can break some of the normal naming rules for constructs such as modules and functions. Is it possible to do so in Elixir?

Well, first, there’s nothing stopping us from using an arbitrary atom as the name of an Elixir module. Capitalized aliases are a convention in Elixir, and they’re good practice because they keep Elixir code separated in a different namespace (i.e. with the “Elixir.” prefix) from Erlang code. But it is by no means a requirement. This is perfectly legal:

# Elixir code
defmodule :non_elixirish_module do
  def foo() do
    "Hi!"
  end
end
:non_elixirish_module.foo()

This is generally not recommended for Elixir code, but you might do it if you were writing a module in Elixir that was meant to look like an Erlang library and meant to be called from Erlang.

What about functions?

Let’s suppose we were writing a string interpolation library and wanted to create functions called “quote” and “unquote”. (This is probably a bad idea because those terms mean something specific to Elixir metaprogramming. But suppose we really wanted to do it, perhaps because we were porting an Erlang library that exports functions of those names.)

It turns out that defining a function called “unquote” is tougher than you might expect. Suppose we began with the following:

# Elixir code
defmodule StringQuoter do
  def unquote(str) do   # Compile error here
    str |> String.replace("''", "'")
  end
end

Attempting to compile that module will yield the following bizarre error:

== Compilation error on file string_quoter.ex ==
** (CompileError) string_quoter.ex:3: undefined function str/0
    (elixir) expanding macro: Kernel.def/2
    string_quoter.ex:3: StringQuoter (module)

What happened? That certainly looks like valid Elixir code. Well, it turns out that the “def” macro actually puts its arguments through a quoting cycle, and allows you to unquote things inside the function definition. So when defining the function, instead of treating “unquote” as the name of the function to define, Elixir is actually trying to evaluate the expression unquote(str) at compile time, which it can’t do because the name “str” isn’t defined.
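The same mechanism is easier to see with a harmless example. In this sketch (the module and variable names are invented), the variable greeting exists only while the module body is being compiled, yet “def” can unquote it into the function body:

```elixir
# Elixir code
defmodule CompileTimeDemo do
  greeting = "hello"   # Exists only while the module is compiled
  def greet() do
    unquote(greeting)  # Replaced by the literal "hello" at compile time
  end
end

CompileTimeDemo.greet()  # Returns "hello"
```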

Because of that feature of the “def” macro, you can’t define a function called “unquote” directly. However, the very feature that creates the problem can also solve it. The “def” macro evaluates any unquoted constructs, so to create a function named “unquote”, just create an expression that will evaluate to that name. The following module does compile correctly:

# Elixir code
defmodule StringQuoter do
  def unquote(:unquote)(str) do  # Now this works
    str |> String.replace("''", "'")
  end
end

How does that work? In the Elixir AST, the function name is just an atom. So we create an expression that evaluates to that atom. When the “def” macro runs at compile time, it sees the expression unquote(:unquote), evaluates the argument to unquote (which is just the atom :unquote), and inserts it into the AST.
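You can inspect this yourself with quote, which returns the AST of an expression. This is a minimal sketch: the names foo and bar are placeholders, and the metadata in the middle of each tuple is ignored because it can vary.

```elixir
# Elixir code
# A call is a three-element tuple whose first element is the name as an atom.
{:foo, _, [1]} = quote(do: foo(1))
# An unquoted expression in the name position is replaced by its value.
ast = quote(do: unquote(:bar)(1))
{:bar, _, [1]} = ast
```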

We can use this to generate any arbitrary function names, including names with otherwise illegal characters. Also, since this effectively evaluates expressions at compile time, we can also use this to generate functions dynamically. Here’s an example that generates 100 functions with names like “1-doubled”, normally not a legal function name:

# Elixir code
defmodule DiabolicalModule do
  1..100 |> Enum.each(fn n ->
    def unquote(:"#{n}-doubled")() do
      unquote(n * 2)
    end
  end)
end

In the above code, remember that the expression n * 2 is evaluated during unquoting, which takes place at compile time, not runtime. So the actual body of each function just consists of a constant integer, not an expression. That is, the above is equivalent to:

# Elixir code
defmodule DiabolicalModule do
  def unquote(:"1-doubled")() do
    2
  end
  def unquote(:"2-doubled")() do
    4
  end
  # etc...
end
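To confirm that the functions really exist under those names, you can ask a compiled module for its public functions. This sketch uses a smaller, separately named module (SmallDiabolicalModule, my own invention) and a for comprehension in place of Enum.each, which I would expect to behave the same way here:

```elixir
# Elixir code
defmodule SmallDiabolicalModule do
  for n <- 1..3 do
    def unquote(:"#{n}-doubled")(), do: unquote(n * 2)
  end
end

# __info__(:functions) lists each public function as {name, arity}.
functions = Enum.sort(SmallDiabolicalModule.__info__(:functions))
# functions is [{:"1-doubled", 0}, {:"2-doubled", 0}, {:"3-doubled", 0}]
```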

How do you call such a function? Again, Erlang’s technique won’t work in Elixir. You can’t just “quote” the function name in a bare local call. However, you can use Kernel.apply/3 to do the trick:

# Elixir code
apply(DiabolicalModule, :"3-doubled", [])  # returns 6
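For remote calls specifically, Elixir also accepts a quoted function name after the dot, which avoids apply/3; bare local calls have no such form. A minimal sketch, using a throwaway module (QuotedCallDemo, a hypothetical name) so that it stands on its own:

```elixir
# Elixir code
defmodule QuotedCallDemo do
  def unquote(:"3-doubled")(), do: 6
end

via_apply = apply(QuotedCallDemo, :"3-doubled", [])  # Returns 6
via_quoted_call = QuotedCallDemo."3-doubled"()       # Returns 6
```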

Perhaps less obviously, if you want to call the StringQuoter.unquote/1 function from iex, you can simply name it:

# Elixir code
iex(1)> StringQuoter.unquote("hi")
"hi"

However, if you want to call StringQuoter.unquote/1 from another function, you have to use Kernel.apply/3. This is because, again, whenever you have the “unquote” function in a function definition, the “def” macro tries to evaluate it at compile time.

# Elixir code
defmodule StringQuoter do
  def unquote(:unquote)(str) do
    str |> String.replace("''", "'")
  end
  def bad() do
    unquote("Eat at Joe''s")  # Doesn't work: evaluated at compile time
  end
  def good() do
    apply(__MODULE__, :unquote, ["Eat at Joe''s"])  # Works correctly
  end
end

Where to go from here

These observations came from actual issues I encountered while writing and testing Erl2ex. Some real-life Erlang modules I used as test cases included functions with strange names that could not normally be defined in Elixir. So Erl2ex detects such cases and falls back to the metaprogramming trick described above.

I have to give Erlang a lot of credit here. The way atoms are represented in Erlang syntax, and used to name other structures, makes cases like these much more straightforward than in Elixir. On the other hand, Elixir’s metaprogramming capability is extremely powerful, as we will see later when we look at the preprocessor. But at the least, it is important to understand the differences.

For more insight into techniques for compile-time evaluation, I again recommend Metaprogramming Elixir by Chris McCord.

Next time, we’ll look at a related topic, scoping, studying how the language structures affect their scoping rules, and observing the implications of Erlang’s single assignment policy. Feel free to browse the index of articles in this series, and stay tuned for more on Erlang and Elixir’s family ties.