Family Ties part 7: Lost and Found

2016-06-06 familyties

This is the seventh of a series of articles on what I’ve learned about Erlang (and Elixir) from writing Erl2ex, an Erlang-to-Elixir transpiler.

This week we study some “missing features”: things available in Erlang but missing or hard to find in Elixir. Some seem to be deliberate omissions of features considered problematic or not useful, while others might be oversights. We’ll also explore some techniques for working around these omissions.

They'll never short-circuit our operators

Non-short-circuit operators

Erlang includes two versions of its boolean operators. The and and or operators fully evaluate both of their operands, even if the right-hand side is not strictly needed. For example:

1> true or false.
true
2> true or throw(rhs).
** exception throw: rhs
3> false and true.
false
4> false and throw(rhs).
** exception throw: rhs

On the other hand, the andalso and orelse operators are short-circuiting; they do not evaluate the right hand side operand if the left operand already determines the outcome.

1> true orelse false.
true
2> true orelse throw(rhs).
true
3> false andalso true.
false
4> false andalso throw(rhs).
false

Elixir also provides two sets of boolean operators. However, both sets are short-circuiting. (The only difference is in whether they allow operands that are not strictly boolean values.)

iex(1)> true or throw(:rhs)
true
iex(2)> true || throw(:rhs)
true
iex(3)> false and throw(:rhs)
false
iex(4)> false && throw(:rhs)
false

So Elixir has analogues for Erlang’s andalso and orelse operators, but not for and and or. If we wanted the non-short-circuiting behavior, are we stuck? Not quite. They don’t appear as such in Erlang’s documentation, but there exist functions :erlang.and/2 and :erlang.or/2 that you can call to get the non-short-circuit behavior.

iex(1)> :erlang.or(true, false)
true
iex(2)> :erlang.or(true, throw(:rhs))
** (throw) :rhs
iex(3)> :erlang.and(false, true)
false
iex(4)> :erlang.and(false, throw(:rhs))
** (throw) :rhs
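
The shell session above shows the difference through an exception. As a minimal sketch of the same point using side effects instead, the module below traces which operands actually run. The module name StrictBoolDemo and its helpers are hypothetical names invented for this illustration; only :erlang.and/2 and Elixir's and operator come from the discussion above.

```elixir
defmodule StrictBoolDemo do
  # Hypothetical helper: notes that it ran by messaging the calling
  # process, then returns `value` unchanged.
  def traced(tag, value) do
    send(self(), {:evaluated, tag})
    value
  end

  # Non-short-circuit: both arguments are evaluated before the call.
  def strict_and, do: :erlang.and(traced(:left, false), traced(:right, true))

  # Short-circuit: the right operand is skipped because the left is false.
  def lazy_and, do: traced(:left, false) and traced(:right, true)

  # Drains the {:evaluated, tag} messages currently in the mailbox,
  # returning the tags in the order they were sent.
  def evaluated(acc \\ []) do
    receive do
      {:evaluated, tag} -> evaluated([tag | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end
```

Calling StrictBoolDemo.strict_and() and then StrictBoolDemo.evaluated() should report both :left and :right, whereas the same sequence with lazy_and() should report only :left.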

These functions appear to be the actual implementations behind Erlang’s operators of the same name. Elixir developers may know that Elixir’s operators are generally implemented by functions (or macros) located in Kernel or Kernel.SpecialForms. Erlang seems to do something similar; the erlang module contains functions (not documented, but present) corresponding to most of Erlang’s operators, such as :erlang.xor/2, :erlang.not/1, :erlang.div/2, and even :erlang.+/2. These can be called as such, or captured, from either Erlang or Elixir (although Elixir code would normally use the Elixir versions in the Kernel module).

% Calling Erlang operators as functions
1> erlang:'not'(true).
false
2> erlang:'+'(1, 2).
3
3> lists:foldl(fun erlang:'+'/2, 0, [1, 2, 3]).
6

# Calling Erlang operators from Elixir
iex(1)> :erlang.not(true)
false
iex(2)> :erlang.+(1, 2)
3
iex(3)> Enum.reduce([1, 2, 3], 0, &:erlang.+/2)
6

Interestingly, the andalso and orelse operators are an exception. They are not available to be called as functions from Erlang, probably because Erlang function call semantics don’t support short-circuiting. But they can be called as such from Elixir, probably because the latter implements them as macros.

% There are no functions in Erlang for short-circuit operators
1> erlang:'andalso'(false, true).
** exception error: undefined function erlang:'andalso'/2
2> erlang:'orelse'(true, false).
** exception error: undefined function erlang:'orelse'/2

# Elixir can call those "Erlang" operators as macros, and they
# exhibit the proper short circuit behavior.
iex(1)> :erlang.andalso(false, throw(:rhs))
false
iex(2)> :erlang.orelse(true, throw(:rhs))
true

I can’t think of a reason to actually do that in Elixir (as opposed to just using Elixir’s operators), but the capability seems to be there.

Map update operators

Elixir provides a convenient syntax for updating keys in a map. It requires that the key already exist; you cannot add a new key using Elixir’s map update syntax.

iex(1)> map1 = %{a: 1, b: 2}
%{a: 1, b: 2}
iex(2)> map2 = %{map1 | a: 3}
%{a: 3, b: 2}
iex(3)> map3 = %{map2 | c: 4}
** (KeyError) key :c not found in: %{a: 3, b: 2}

Erlang also provides a syntax for updating maps. Here is the Erlang equivalent of the above code.

1> Map1 = #{a => 1, b => 2}.
#{a => 1,b => 2}
2> Map2 = Map1#{a := 3}.
#{a => 3,b => 2}
3> Map3 = Map2#{c := 4}.
** exception error: {badkey,c}

If you’re an Elixir developer and look closely at the Erlang code, you might notice something a bit peculiar. Erlang seems to use different operators for creating and updating the map. Adding keys to a new map was done with =>, whereas updating keys in an existing map was done with :=.

Indeed, Erlang has two separate operators for map manipulation. The former, =>, may either create a new key or update an existing key. The latter, :=, may only update an existing key; it raises an error if the key is not already present in the map (just as Elixir’s map update syntax requires that the key already be present).

As a result, Erlang’s map update syntax actually supports an operation that Elixir’s does not: adding a new key to an existing map, using the => operator:

1> Map1 = #{a => 1, b => 2}.
#{a => 1,b => 2}
2> Map2 = Map1#{c => 4}.
#{a => 1,b => 2,c => 4}

If you want to perform this operation in Elixir, you must use Map.put/3 or Map.merge/2.

iex(1)> map1 = %{a: 1, b: 2}
%{a: 1, b: 2}
iex(2)> map2 = Map.put(map1, :c, 4)
%{a: 1, b: 2, c: 4}
iex(3)> map3 = Map.merge(map2, %{d: 5, e: 6})
%{a: 1, b: 2, c: 4, d: 5, e: 6}
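
To line the two Erlang operators up against function calls, here is a small sketch. It assumes a version of Elixir recent enough to include Map.replace!/3, which newer releases provide alongside Map.put/3. The module and function names (MapOps, put_like_arrow, update_like_exact) are made up for this illustration.

```elixir
defmodule MapOps do
  # Like Erlang's `Map#{Key => Value}`: adds the key or overwrites it.
  def put_like_arrow(map, key, value), do: Map.put(map, key, value)

  # Like Erlang's `Map#{Key := Value}`: the key must already exist,
  # otherwise Map.replace!/3 raises a KeyError.
  def update_like_exact(map, key, value), do: Map.replace!(map, key, value)
end
```

With this sketch, MapOps.put_like_arrow(%{a: 1}, :c, 4) adds the new key, MapOps.update_like_exact(%{a: 1}, :a, 3) overwrites the existing one, and MapOps.update_like_exact(%{a: 1}, :c, 4) raises a KeyError, much like the %{map | c: 4} syntax does.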

Generalized comprehensions

Comprehensions are powerful and very expressive. It took a while for me, with a Java/Ruby background, to internalize them into my vocabulary, and I still find it easy to fall into the trap of abusing them. But, used well, they greatly simplify some tasks that might otherwise take many lines of code.

Elixir comprehensions have three basic parts: generators, filters, and a collectable. Of the three, filters are optional but at least one generator and a collectable are required. (The collectable might be implicit; it’s a list if not otherwise specified.)

Here’s a simple Elixir comprehension with one generator, implicitly collecting into a list.

iex(1)> for x <- [1, 2, 3], do: x * 2
[2, 4, 6]

And the Erlang equivalent:

1> [X * 2 || X <- [1, 2, 3]].
[2,4,6]

Elixir comprehensions, of course, have some features not available in Erlang comprehensions. The collector is always a list or a bitstring in Erlang, but Elixir can use anything that implements the Collectable protocol, such as a set or map. Similarly, an Erlang generator always gets its data from a list or bitstring, while Elixir can also generate from anything that implements Enumerable, such as a stream. So Elixir effectively generalizes comprehensions, making them powerful tools for data transformation.

Here’s an example from the Elixir getting started guide. The comprehension reads lines from stdin as a stream, upcases them, and writes the results to stdout via the same stream. It works because Elixir’s IO.Stream implements both protocols, allowing it to serve as both a generator and a collector.

iex(1)> stream = IO.stream(:stdio, :line)
%IO.Stream{device: :standard_io, line_or_bytes: :line, raw: false}
iex(2)> for line <- stream, into: stream, do: String.upcase(line)
Hello Elixir
HELLO ELIXIR
Another line
ANOTHER LINE
^C^C
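
For a smaller, non-interactive illustration of the same generalization, the sketch below generates from a map and from a lazy stream, and collects into a map and a MapSet. The module and function names are hypothetical; the point is only that neither end of the comprehension needs to be a list.

```elixir
defmodule ComprehensionDemo do
  # Generator: a map (Enumerable, yields {key, value} tuples).
  # Collector: a map given via `into:` (Collectable).
  def double_values(map) do
    for {key, value} <- map, into: %{}, do: {key, value * 2}
  end

  # Generator: a lazy Stream of squares.
  # Filter: keep only the even ones.
  # Collector: a MapSet.
  def even_squares(limit) do
    stream = Stream.map(1..limit, &(&1 * &1))
    for n <- stream, rem(n, 2) == 0, into: MapSet.new(), do: n
  end
end
```

Here ComprehensionDemo.double_values(%{a: 1, b: 2}) yields %{a: 2, b: 4}, and ComprehensionDemo.even_squares(4) yields a set containing 4 and 16.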

As an aside, the existence of such a feature highlights an interesting property of Elixir. It is an integrated platform in which the language itself and the standard library are interdependent and cannot function without each other. Elixir comprehensions, a language feature, depend for their internal implementation on library modules such as Enum that handle enumerables and collectables. Indeed, with so many language constructs actually implemented as macros, Elixir blurs the very distinction between what is “language” versus “library”.

Back to comprehensions. Erlang, it turns out, offers a different generalization of comprehensions. While Elixir comprehensions require that the first clause be a generator, Erlang has no such restriction. For example, Erlang comprehensions could include filters only:

% Erlang comprehension with a filter but no generator.
singleton_or_empty(Value, Condition) -> [Value || Condition].
% Example runs:
1> singleton_or_empty(1, true).
[1]
2> singleton_or_empty(2, false).
[]

Attempting to translate this “directly” to Elixir won’t work.

# Elixir disallows comprehensions with a filter but no generator.
def singleton_or_empty_broken(value, condition) do
  for condition, do: value  # Compile error here
end

Although, interestingly, a completely degenerate comprehension with no filters or generators at all is legal in Elixir:

# A degenerate comprehension in Elixir.
iex(1)> for do: 1
[1]

Erl2ex handles Erlang comprehensions that begin with a filter by pulling the leading filters out into a conditional. It would translate the above singleton_or_empty function into the following valid Elixir code:

# A working Elixir translation.
def singleton_or_empty(value, condition) do
  if condition do
    for do: value
  else
    []
  end
end
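
If you are writing the Elixir by hand rather than transpiling, another possible workaround (not the one Erl2ex uses) is to satisfy Elixir's "generator first" rule with a throwaway single-element generator and keep the condition as an ordinary filter. The module name FilterOnly is hypothetical. One caveat to keep in mind: an Elixir filter accepts any truthy value, whereas an Erlang filter must evaluate to a boolean, so the two are not strictly equivalent for non-boolean conditions.

```elixir
defmodule FilterOnly do
  # The generator `_ <- [nil]` runs the body at most once; the filter
  # decides whether that single iteration produces an element.
  def singleton_or_empty(value, condition) do
    for _ <- [nil], condition, do: value
  end
end
```

With this definition, FilterOnly.singleton_or_empty(1, true) returns [1] and FilterOnly.singleton_or_empty(2, false) returns [].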

Honestly, the usefulness of such a construct is limited, which is probably why Elixir drops support for it. But in the event the use case does come up (and I have encountered examples in the wild), the Erlang form is a convenient shorthand that avoids a verbose conditional.

Dynamic function references

Here’s one more example of an Erlang feature that is missing in Elixir.

Elixir’s “capture” operator (the ampersand) provides a way to create a first-class anonymous function that references a named function. For example, here’s a way to take the sine of each element in a list, by capturing the :math.sin/1 function and passing it to Enum.map/2.

iex(1)> [1, 2, 3] |> Enum.map(&:math.sin/1)
[0.8414709848078965, 0.9092974268256817, 0.1411200080598672]

The capture operator above uses three “parameters”—the module name, the function name, and the arity—to identify the function to capture. The same thing can be done in Erlang using the “fun” construct:

1> lists:map(fun math:sin/1, [1, 2, 3]).
[0.8414709848078965,0.9092974268256817,0.1411200080598672]

Suppose we wanted to generalize this to any math function. Erlang allows any of the three parts (module, name, arity) to be passed as variables. For example:

% Erlang lets you specify the function name as a variable
map_math(Funcname, List) ->
    lists:map(fun math:Funcname/1, List).
% Example usage:
1> map_math(sin, [1, 2, 3]).
[0.8414709848078965,0.9092974268256817,0.1411200080598672]
2> map_math(sqrt, [1, 2, 3]).
[1.0,1.4142135623730951,1.7320508075688772]

Unfortunately, Elixir doesn’t support this. More specifically, Elixir allows the module to be a variable, but the function name and arity must be literal values. I suspect this is because the syntax doesn’t have a good way to distinguish variables from literal function names:

# A direct Elixir translation does not work.
def map_math_broken(funcname, list) do
  Enum.map(list, &:math.funcname/1)  # Doesn't work, tries to treat
                                     # funcname as a literal name
end

If you need to specify a function name or arity with a variable in Elixir, use :erlang.make_fun/3, which is the implementation behind Erlang’s “fun” operator. Like the other operator implementation functions that we looked at earlier, make_fun is available but not well documented. Here’s an Elixir implementation that does work:

# Use :erlang.make_fun to pass variables to a capture.
def map_math(funcname, list) do
  Enum.map(list, :erlang.make_fun(:math, funcname, 1))
end
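
Newer Elixir releases also ship a documented function for this, Function.capture/3, so a sketch that avoids the undocumented BIF might look like the following. This assumes an Elixir version that includes the Function module; the MathMap module name is hypothetical. A second variant dispatches with apply/3 inside an anonymous function, which works without any capture at all.

```elixir
defmodule MathMap do
  # Function.capture/3 builds the same kind of external function
  # reference as &:math.sin/1, but from runtime values.
  def map_math(funcname, list) when is_atom(funcname) do
    Enum.map(list, Function.capture(:math, funcname, 1))
  end

  # Alternative: skip the capture and dispatch dynamically with apply/3.
  def map_math_apply(funcname, list) when is_atom(funcname) do
    Enum.map(list, fn x -> apply(:math, funcname, [x]) end)
  end
end
```

Both MathMap.map_math(:sqrt, [1, 4, 9]) and MathMap.map_math_apply(:sqrt, [1, 4, 9]) return [1.0, 2.0, 3.0].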

Where to go from here

We’ve now seen a number of small differences between Erlang and Elixir. We’ve seen some of the workarounds that are implemented in Erl2ex to translate these corner cases properly into Elixir. However, it is still worth re-emphasizing that the similarities between the two languages are far more numerous than the differences. In many cases, you simply need to dig just a bit deeper to find the commonality.

For example, the first draft of this article had a section on “missing” options in Elixir’s binary literal syntax. Erlang provides a way to specify signedness, endianness, and other properties of binary values. Initially, I couldn’t find equivalents for some of these options in Elixir’s getting started guide, leading me to wonder if they were supported in Elixir at all. It was pointed out to me that Elixir actually does support, and document, all the binary literal options that Erlang supports. But you need to go beyond the getting started guide, and read the reference pages to learn how.
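
As a quick sketch of what those options look like in Elixir, the functions below decode the same two bytes with different signedness and endianness modifiers. The module and function names are hypothetical; the modifiers themselves (signed, unsigned, little, big, integer, size) are the ones described in the reference documentation for the <<>> special form.

```elixir
defmodule BinaryOptions do
  # Reads two bytes as a signed, little-endian 16-bit integer.
  # Roughly the Erlang pattern <<Value:16/signed-little-integer>>.
  def decode_signed_little(<<value::signed-little-integer-size(16)>>), do: value

  # Reads the same two bytes as an unsigned, big-endian 16-bit integer.
  def decode_unsigned_big(<<value::unsigned-big-integer-size(16)>>), do: value
end
```

Given the bytes <<0xFE, 0xFF>>, decode_signed_little returns -2 and decode_unsigned_big returns 65279.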

In general, remember that most of Elixir’s language forms are nominally “implemented” as Elixir macros, and you need to look up the reference material on those macros to find the full language documentation. In particular, the Kernel and Kernel.SpecialForms modules “define” most of what some of us would otherwise expect to be the “language syntax”. They are very useful reads.

There are still a few important differences that we need to cover, though. Next time, we’ll start looking at a major Erlang feature that is not directly supported in Elixir: the preprocessor. Until then, feel free to browse the index of articles in this series, and stay tuned for more on Erlang and Elixir’s family ties.