Family Ties part 6: Being On Guard

2016-05-31 familyties

This is the sixth of a series of articles on what I’ve learned about Erlang (and Elixir) from writing Erl2ex, an Erlang-to-Elixir transpiler. This week we study guard clauses and learn why Elixir guards look the way they do.

That's not how a guard works

Update (2016-06-11) … Jose Valim pointed out that it is possible to recreate guard sequences in Elixir after all, by using multiple “when” clauses. I updated the relevant section. Thanks for the correction, Jose!

A guarded question

In the Elixir standard library docs, you’ll occasionally come across the following notation:

“Allowed in guard tests.”

If you’ve worked much with Elixir, you probably know what that refers to. Guards, which you can recognize as when clauses in Elixir, can contain only certain types of expressions. You can compare values and test types, using functions that are “allowed in guard tests”:

# This Elixir function includes legal guards
def what_kind?(x) when is_integer(x) and x > 10, do: 'big integer'
def what_kind?(x) when is_float(x) and x > 10.0, do: 'big float'
def what_kind?(x), do: 'something else'

However, most functions, including those you define in your own modules, are not allowed in guards:

# This Elixir function fails to compile because calling
# :math.sin/1 is not allowed in a guard.
def what_kind?(x) when :math.sin(x) > 0, do: 'top of the wave'
def what_kind?(x), do: 'something else'

Why the limitation? Is there no way to allow user-defined functions and other useful constructs in guards?

These are questions that have been asked repeatedly in both the Erlang and Elixir communities. One answer that you may have heard is that guards cannot produce side effects. But why is that important? What is a guard really doing?

To answer these questions, we’ll look a little deeper into how Erlang defines guards. First we’ll examine the structure of Erlang guards in comparison with Elixir guards. We’ll explore differences in how the two languages handle exceptions in guards, as well as differences in the way conditionals and comprehensions treat guards. Finally, we’ll talk about why the limitations are there, and what you can do to get around them. This will be a fairly long article, but there’s a lot of useful information to glean, so let’s get started.

Erlang’s complex guards

Erlang guards have a somewhat more complex structure than what we usually find in Elixir. In Elixir, a guard is simply a boolean expression. But in Erlang, it may comprise more than one expression; in fact, generally, it is actually a list of lists of boolean expressions.

Let’s start with a simple guard in Erlang:

% A simple Erlang guard
'what_kind?'(X) when is_integer(X) -> "integer";

In this function clause, the expression is_integer(X) is called a “guard expression”. But it is only a simple case of a guard. You can add more guard expressions, delimited by commas, and this clause will be selected only if all are true. In other words, it behaves almost as if the expressions were joined into a single expression by Erlang’s “andalso” operator. (I say “almost” because there is a subtle difference that I’ll discuss later.)

For example:

% Multiple guard expressions forming a guard
'what_kind?'(X) when is_integer(X), X > 10 -> "big integer";

The two comma-separated boolean expressions must both pass for this function clause to execute. That is, the above is (almost) equivalent to:

% An (almost) equivalent single guard expression
'what_kind?'(X) when is_integer(X) andalso X > 10 -> "big integer";

Such a list of comma-separated guard expressions is simply called a “guard”. But Erlang doesn’t stop there. You can also add additional guards (i.e. lists of guard expressions), delimited by semicolons. The function clause will be selected if at least one of those guards passes. In other words, multiple semicolon-delimited guards behave almost as if they were joined by the “orelse” operator. Such a list of semicolon-delimited guards is known as a “guard sequence”.

% Multiple guards forming a guard sequence
'what_kind?'(X) when
    is_integer(X), X > 10;
    is_float(X), X > 100.0
  -> "big integer or huge float";

This is (almost) equivalent to:

% The (almost) equivalent single guard expression
'what_kind?'(X) when
    (is_integer(X) andalso X > 10)
    orelse (is_float(X) andalso X > 100.0)
  -> "big integer or huge float";

At first glance, the commas and semicolons may look simply like shorthand for boolean operators. It turns out there is a difference, but before we get into that, we should note that the same restrictions apply to expressions in Erlang guards as to Elixir guards. Only certain operators, and a few blessed built-in functions, are allowed in Erlang guard sequences. Anything else is rejected by the compiler.

% Erlang's compiler rejects calling math:sin in a guard.
'what_kind?'(X) when math:sin(X) > 1 -> "top of the wave";

Again, why the restriction? The reason is as revealing as it is subtle. Guards may look like normal code, but they are not. They are actually something else. Ulf Wiger puts it, I think, very helpfully:

Think of guards as an extension of the pattern matching, i.e. a purely declarative, and quite pervasive, part of the language.

Looking at it this way, it could be argued that the problem is that some parts of the patterns look like normal expressions. ;-)

I think Erlang, with its “complex” guard sequences, helps us to see this more clearly than Elixir guards by themselves. In Elixir, a guard looks just like any other expression. But Erlang guard sequence syntax is different from a “normal” Erlang expression. It can include comma- and semicolon-delimited lists of expressions, which can’t normally appear in other places:

% A guard sequence won't compile as "normal code" in a function;
% the commas and semicolons are not "normal" syntax.
foo(X) ->
    is_integer(X), X > 10;
    is_float(X), X > 100.0.

So Erlang helps us understand that guards are actually something else; they are better considered part of the pattern match rather than a normal expression. And just like you can’t put arbitrary code (like function calls) in a pattern match, because it just doesn’t make sense as part of pattern match syntax, neither can you include arbitrary code in a guard.

Specifically, code that has side effects (which, broadly speaking, means code that could send and receive messages) is not allowed in a guard, as is code that could fail to terminate. (As a corollary, guard evaluation is not Turing-complete.)

And if you think about it, such restrictions make a lot of sense. Suppose you have a function with a number of pattern-matched (and guarded) clauses. If those guards were allowed to have side effects, then you would need to be able to control whether and in what order they execute. Programmers do not tolerate non-deterministic side effects. Erlang would need to specify and lock down those semantics, and that would hamstring the function dispatch. For example, it would prevent the VM from reordering, parallelizing, caching, or omitting guard execution, thus crippling the optimization of the language’s most crucial bottleneck. So the ban on side effects is critical to the viability of guards as a feature.

But let’s come back to the question of why Erlang has guard sequences at all. Why not simply support boolean expressions joined with “ands” and “ors”?

Exceptional guards

Suppose you had a function that took a list but needed to behave differently if it had a length of 2. The built-in length function could be used in a guard for this purpose.

% In Erlang...
guard_test(X) when length(X) == 2 ->
    handle_length_2;
guard_test(X) ->
    handle_another_length.

# The equivalent in Elixir...
def guard_test(x) when length(x) == 2 do
  :handle_length_2
end
def guard_test(x) do
  :handle_another_length
end

This function behaves how we’d expect.

iex(1)> length([1, 2])
2
iex(2)> guard_test([1, 2])
:handle_length_2
iex(3)> guard_test([1, 2, 3])
:handle_another_length

Now, the length function requires a list. What happens if you pass in something else, like a binary?

iex(1)> length("hi")
** (ArgumentError) argument error
    :erlang.length("hi")
iex(2)> guard_test("hi")
:handle_another_length

The length function crashes if you pass in something that is not a list. However, interestingly, it doesn’t crash when used in a guard. That’s because any “error” that is raised in a guard merely causes that test to fail (i.e. evaluate to false). This behavior is the same in both Erlang and Elixir. Guards never crash; they only succeed or fail.

Why? Again, it makes sense if you think about it. A crash is like a side effect. If guards could crash, you’d have to specify and control whether and in what order they execute, so that you know which crash should take place. Again, that would hamstring function dispatch and prevent the VM from optimizing it. So instead, the BEAM turns guards into pure functions by removing the possibility of crashes.
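To see the contrast side by side, here is a minimal sketch. The module and function names (GuardErrorDemo, outside_guard, in_guard) are made up for illustration; the only library call involved is the built-in length/1.

```elixir
defmodule GuardErrorDemo do
  # Hypothetical helper: calls length/1 in a normal function body.
  # A non-list argument raises, so we rescue it to make the crash visible.
  def outside_guard(x) do
    {:ok, length(x)}
  rescue
    ArgumentError -> :raised
  end

  # Hypothetical helper: calls length/1 inside a guard.
  # A non-list argument makes the guard fail instead of raising.
  def in_guard(x) do
    case x do
      _ when length(x) >= 0 -> :guard_passed
      _ -> :guard_failed
    end
  end
end
```

With this module loaded, GuardErrorDemo.outside_guard("hi") should land in the rescue clause, while GuardErrorDemo.in_guard("hi") quietly falls through to the second case clause.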

So far so good. But now let’s extend our function to handle binaries in addition to lists. And again, we want inputs with a length of 2 to follow one code path, and other inputs to follow another. In Erlang, we can extend the guard sequence with another expression to test binary size:

% Handling binaries in Erlang
guard_test2(X) when length(X) == 2; byte_size(X) == 2 ->
    handle_length_2;
guard_test2(X) ->
    handle_another_length.

Remember, a guard sequence passes if at least one of its parts passes. In this case, the first clause will match if a list of length two, OR a binary of two bytes, is passed in. In the latter case, the first expression length(X) == 2 causes an error, but since it is in a guard, it just evaluates to false and moves on to the next expression, byte_size(X) == 2, which succeeds. Let’s test it:

iex(1)> guard_test2([1, 2])
:handle_length_2
iex(2)> guard_test2("hi")
:handle_length_2
iex(3)> guard_test2("longer")
:handle_another_length

Remember how we said that a guard sequence (i.e. the semicolon delimiter) is almost but not quite the same as using the “orelse” operator in Erlang? Now we’ll see why. In the above example, when you pass in a binary, the length call throws an error, causing that guard expression to fail, but the rest of the expressions in the guard sequence still have a chance to execute. However, if you use “orelse”, then both the length and byte_size calls are in a single expression:

% Failed attempt to handle binaries in Erlang
broken_guard_test2(X) when length(X) == 2 orelse byte_size(X) == 2 ->
    handle_length_2;
broken_guard_test2(X) ->
    handle_another_length.

Now if you pass in a binary, the length function throws an error, which causes the entire expression to fail.

iex(1)> broken_guard_test2("hi")
:handle_another_length

In other words, Erlang’s guard sequences let you “insulate” individual expressions from errors thrown by other expressions.

What about Elixir? As in Erlang, a crash in a guard will cause the entire guard to fail. So attempting to check either list length or binary length using the “or” operator behaves the same as when we tried to use Erlang’s “orelse”.

# Failed attempt to handle binaries in Elixir
def broken_guard_test2(x) when length(x) == 2 or byte_size(x) == 2 do
  :handle_length_2
end
def broken_guard_test2(x) do
  :handle_another_length
end

Again, if you pass in a binary, the length function throws an error, causing the entire expression to fail:

iex(1)> broken_guard_test2("hi")
:handle_another_length

So in Elixir, are we stuck? Not quite. It doesn’t seem to be well-documented, but Jose helpfully pointed out that if you supply multiple “when” clauses, Elixir will treat it as a guard sequence, just as if you used semicolon delimiters in Erlang.

# The correct way to handle binaries in Elixir.
# Create a guard sequence using multiple "when" clauses
def guard_test2(x) when length(x) == 2 when byte_size(x) == 2 do
  :handle_length_2
end
def guard_test2(x) do
  :handle_another_length
end

Now it works! If the first guard fails with an error, the second will still run, and return the result we want:

iex(1)> guard_test2("hi")
:handle_length_2

Alternatively, you can test the types of inputs in your guards. Indeed, perhaps it’s good coding practice to be explicit rather than depending on the exception-suppressing behavior of guards.

# In guards, check types explicitly for more clarity:
def guard_test2(x) when
    (is_list(x) and length(x) == 2) or (is_binary(x) and byte_size(x) == 2) do
  :handle_length_2
end
def guard_test2(x) do
  :handle_another_length
end

But that isn’t the only way guards are treated differently between Elixir and Erlang.

The meaning of “if”

The Guard behind the statement

Those of us who came from imperative and object-oriented languages tend to have a very concrete expectation around how conditionals, such as the “if” statement, should behave. In Elixir, because of pattern matching, the “if” macro is not used as frequently as in other languages, but it still feels very familiar and follows the same traditional behavior.

It can be surprising, then, how different Erlang’s “if” statement is. To start off, we might know that, while Elixir’s version is a binary conditional (choosing one of two branches based on a boolean result), Erlang’s version is an n-ary conditional that may depend on any number of expressions.

# Elixir's if statement is binary:
def describe_length1(a) do
  if length(a) == 0 do
    "zero length"
  else
    "nonzero length"
  end
end

% Erlang's if statement is n-ary:
describe_length2(A) ->
  if
    length(A) == 0 ->
      "zero length";
    length(A) == 1 ->
      "unit length";
    true ->
      "larger length"
  end.

At first glance, this looks like it simply corresponds to Elixir’s cond statement:

# Elixir's cond statement is n-ary:
def describe_length3(a) do
  cond do
    length(a) == 0 ->
      "zero length"
    length(a) == 1 ->
      "unit length"
    true ->
      "larger length"
  end
end

However, there’s an important difference: Elixir’s cond supports arbitrary expressions, but Erlang’s if actually uses guard sequences. This means:

Expressions in Erlang’s if statement are limited to those that can appear in guards. In Elixir’s cond, you can include arbitrary expressions with side effects.
You can include multiple comma- and semicolon-delimited expressions (i.e. guard sequences) in Erlang’s if statement. Elixir’s cond requires you to use the “and” and “or” operators for this purpose.
Erlang’s if statement exhibits the exception-suppressing behavior of guards. Elixir’s cond does not.

So Erlang would allow us to expand this function to support binaries:

% Erlang's if statement exhibits guard behavior:
describe_length4(A) ->
  if
    length(A) == 0; byte_size(A) == 0 ->
      "zero length";
    length(A) == 1; byte_size(A) == 1 ->
      "unit length";
    true ->
      "larger length"
  end.

As we saw earlier, guards suppress exceptions, and furthermore, the expressions in each guard sequence are insulated from errors thrown by the others. So we could pass a binary into the above function, and it would work as expected. Reproducing this functionality using Elixir’s cond would be more difficult. We would need to check the type explicitly because otherwise exceptions would get thrown normally.
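For comparison, here is a rough sketch of what the cond version might look like with explicit type checks. The module name DescribeLength is hypothetical; the function mirrors describe_length4 from the Erlang example above. It relies on the fact that “and” short-circuits in a normal expression, so length/1 is never called on a binary.

```elixir
defmodule DescribeLength do
  # cond evaluates normal expressions, so errors are NOT suppressed.
  # We check the type first so that length/1 and byte_size/1 are only
  # called on arguments they accept.
  def describe_length4(a) do
    cond do
      (is_list(a) and length(a) == 0) or (is_binary(a) and byte_size(a) == 0) ->
        "zero length"

      (is_list(a) and length(a) == 1) or (is_binary(a) and byte_size(a) == 1) ->
        "unit length"

      true ->
        "larger length"
    end
  end
end
```

It works, but the type checks that Erlang’s guard sequence gave us for free now have to be spelled out by hand.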

Finally, it is also important to remember the difference between guards and Elixir expressions with respect to truthiness. In a guard, the atom :true is considered true, and all other values are considered false. In an Elixir expression, :nil and :false are considered false, and all other values are considered true.

% Erlang's if statement uses guard truthiness.
1> if true -> yes; true -> no end.
yes
2> if false -> yes; true -> no end.
no
3> if tootrue -> yes; true -> no end.
no
4> if 1 -> yes; true -> no end.
no

# Elixir's conditionals use Elixir expression truthiness.
iex(1)> if true, do: :yes, else: :no
:yes
iex(2)> if false, do: :yes, else: :no
:no
iex(3)> if :tootrue, do: :yes, else: :no
:yes
iex(4)> if 1, do: :yes, else: :no
:yes
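Elixir guards follow the guard rules too, not the expression rules. Here is a small sketch that puts the same value through a guard and through a normal if. The module and function names (Truthiness, in_guard, in_expression) are invented for this illustration.

```elixir
defmodule Truthiness do
  # In a guard, only the atom true passes; every other value fails.
  def in_guard(x) when x, do: :yes
  def in_guard(_x), do: :no

  # In a normal expression, everything except nil and false is truthy.
  def in_expression(x) do
    if x, do: :yes, else: :no
  end
end
```

So Truthiness.in_guard(1) and Truthiness.in_guard(:tootrue) should both return :no, even though the same values return :yes from Truthiness.in_expression/1.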

Overall, whereas Elixir’s if and cond macros behave very much like “traditional” conditionals, Erlang’s if statement is best understood through its connection with guards. It is a way to invoke guards and their peculiar semantics without needing to create a new function. On the Elixir side, if you want to achieve that same goal—specifically invoking guard semantics—you might consider the Kernel.match?/2 macro.
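As a minimal sketch of that idea, the following module uses match?/2 with a guard, and also a case expression whose clauses carry guard sequences written as multiple “when” clauses. The module and function names (GuardSemantics, length_two?, describe_length) are hypothetical, and this is only one possible way to arrange it.

```elixir
defmodule GuardSemantics do
  # match?/2 accepts a pattern with a guard. If the guard raises
  # (e.g. length/1 on a binary), the match simply returns false.
  def length_two?(value) do
    match?(x when length(x) == 2, value)
  end

  # A case expression with guard sequences, roughly mirroring the
  # Erlang describe_length4 example. Each "when" is tried independently.
  def describe_length(value) do
    case value do
      x when length(x) == 0 when byte_size(x) == 0 -> "zero length"
      x when length(x) == 1 when byte_size(x) == 1 -> "unit length"
      _ -> "larger length"
    end
  end
end
```

Here GuardSemantics.length_two?("hi") should return false rather than raising, and GuardSemantics.describe_length("a") should return "unit length" because the byte_size guard still gets its chance after the length guard fails.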

But wait… there’s more!

Comprehending guards

Both Elixir and Erlang support comprehensions. These are very useful, highly expressive ways to generate and filter collections of data. But the implementations in the two languages are subtly different in several ways. One of those is, you guessed it, the use of guards.

In Elixir, the “filters” in a comprehension are arbitrary expressions. They may produce side effects and throw exceptions. In Erlang, they may be arbitrary expressions or they may be guards. This can get confusing, so let’s look at an example.

This Erlang function takes a list of lists as input. It returns the elements that are of unit length, and discards the rest. A comprehension makes this very simple:

% Erlang function that returns the unit length lists.
erl_filter_unit_length(List) ->
  [Elem || Elem <- List, length(Elem) == 1].

You could write this similarly in Elixir:

def ex_filter_unit_length(list) do
  for elem <- list, length(elem) == 1, do: elem
end

These two functions behave similarly if you give them the expected list of lists:

iex(1)> erl_filter_unit_length([ [1], [2, 3], [4] ])
[[1], [4]]
iex(2)> ex_filter_unit_length([ [1], [2, 3], [4] ])
[[1], [4]]

However, suppose one of the elements is not a list. As we’ve seen, the built-in length function throws an error if given a non-list argument. Here are the consequences:

iex(1)> erl_filter_unit_length([ [1], [2, 3], :non_list ])
[[1]]
iex(2)> ex_filter_unit_length([ [1], [2, 3], :non_list ])
** (ArgumentError) argument error
             :erlang.length(:non_list)

The Elixir version crashes because our comprehension tries to evaluate length/1 on :non_list, which throws an error. However, in the Erlang comprehension, the filter is treated as a guard. The error is suppressed, and simply causes that filter to return false.

Does this mean filters in Erlang comprehensions are simply guards, just like the filters in Erlang’s if statement? Not quite. It turns out that when Erlang compiles a comprehension, it analyzes each filter, treating it as a guard if it is a valid guard expression, or otherwise treating it as a normal expression. So, recalling our Erlang comprehension:

% Erlang function that returns the unit length lists.
erl_filter_unit_length(List) ->
  [Elem || Elem <- List, length(Elem) == 1].

Currently, that length check is a valid guard, and so Erlang compiles it into a guard. But we can force it to be a normal expression by introducing syntax that is not allowed in a guard, such as a local function call.

% Same as the original, except the filter is no longer a guard.
one() -> 1.
erl_filter_unit_length_nonguard(List) ->
  [Elem || Elem <- List, length(Elem) == one()].

When we execute that new function, it now behaves like our Elixir version. The expression is no longer a guard, so it no longer suppresses the error.

iex(1)> erl_filter_unit_length([ [1], [2, 3], :non_list ])
[[1]]
iex(2)> erl_filter_unit_length_nonguard([ [1], [2, 3], :non_list ])
** (ArgumentError) argument error
             :erlang.length(:non_list)

So in Erlang, a comprehension may or may not include guards, depending on what kind of code is present. But in Elixir, you have no choice: comprehension expressions are always normal expressions rather than guards. Personally, in this case, I prefer Elixir’s approach. It’s consistent, and offers less chance of confusion.
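If you do want guard semantics in an Elixir comprehension, one option is to attach a “when” clause to the generator pattern instead of writing a filter. Elements that fail the pattern or its guard are skipped. The sketch below assumes a hypothetical module name, UnitLength; the function mirrors the earlier ex_filter_unit_length.

```elixir
defmodule UnitLength do
  # The guard is part of the generator pattern, so an element that makes
  # length/1 raise is silently skipped rather than crashing the comprehension.
  def filter_unit_length(list) do
    for elem when length(elem) == 1 <- list, do: elem
  end
end
```

With this version, UnitLength.filter_unit_length([[1], [2, 3], :non_list]) should return [[1]], matching the behavior of the Erlang comprehension, and the choice to use a guard is explicit in the syntax.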

Enhancing guards

Let’s wrap up by briefly discussing another Elixir technique for dealing with guards. We’ve seen that the compiler conservatively limits what can appear in a guard, whitelisting a small set of known safe functions and disallowing everything else, because it wants to guarantee the lack of side effects. In particular, you cannot define your own functions and call them from a guard. This makes it difficult to write complex guards (which, one could argue, is the point). However, in some cases, you can get around this in Elixir by using a macro instead of a function.

Suppose we were writing a function, such as “zip”, which wants to behave specially if the lists passed as arguments are the same length. We could write a function to test two such lists:

# Simple function that tests whether lists are the same length
def same_length(list1, list2) do
  length(list1) == length(list2)
end

Unfortunately, we cannot use our own functions in a guard.

# This clause fails to compile because we can't use the custom
# function same_length in a guard.
def zip(list1, list2) when same_length(list1, list2) do
  # logic for same length
end
def zip(list1, list2) do
  # logic for different lengths
end

Instead, let’s rewrite same_length as a macro.

# Simple macro that tests whether lists are the same length
defmacro same_length_macro(list1, list2) do
  quote do
    length(unquote(list1)) == length(unquote(list2))
  end
end

What’s the difference here? Instead of defining a function, we’ve created a macro that looks like a function but expands at compile time. Effectively, the body of the macro, length(list1) == length(list2), gets inlined at the call point. This means:

# This clause, with the macro in the guard...
def zip(list1, list2) when same_length_macro(list1, list2) do
  # logic for same length
end
# ...gets expanded at compile time so it's equivalent to this:
def zip(list1, list2) when length(list1) == length(list2) do
  # logic for same length
end

Now the code is legal for a guard, and thus it compiles successfully.

Of course, there are still limitations. You cannot include arbitrary code in such a macro; it must still expand to code that is allowed in a guard clause. Recursion is still out, as are sends and receives. But this technique can sometimes be used to make commonly-used guard expressions more readable.
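Putting the pieces together, here is a self-contained sketch. The module names (ListGuards, Zipper) and the tagged return values are assumptions made for this example; the macro has to be defined in a module that is compiled before the one that uses it in a guard.

```elixir
defmodule ListGuards do
  # Expands to a plain comparison of two length/1 calls,
  # which is legal inside a guard.
  defmacro same_length(list1, list2) do
    quote do
      length(unquote(list1)) == length(unquote(list2))
    end
  end
end

defmodule Zipper do
  # Importing makes the macro available for use in guards below.
  import ListGuards, only: [same_length: 2]

  def zip(list1, list2) when same_length(list1, list2) do
    {:same_length, Enum.zip(list1, list2)}
  end

  def zip(list1, list2) do
    {:different_length, Enum.zip(list1, list2)}
  end
end
```

Calling Zipper.zip([1, 2], [:a, :b]) should take the first clause, and Zipper.zip([1, 2], [:a]) the second. Elixir 1.6 and later also provide Kernel.defguard/1, which is meant for exactly this kind of definition.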

In fact, this technique is used in several places in the Elixir standard library to define custom “functions” that are allowed in guard clauses. An example is Kernel.is_nil/1, which could be written trivially as a function, but is implemented as a macro specifically so it can appear in guards. So if you’re looking through the standard library documentation and come across macros that look like they should be simple functions, check for that no-longer-so-mysterious comment, “Allowed in guard tests”. It might give you a clue as to why the standard library looks the way it does.

Where to go from here

When I was first learning Elixir, guards seemed to be confusingly at odds with much of the rest of the language. It was only through learning Erlang that it became clearer what a guard actually is and how it should be used. I found the relevant chapter in the book Learn You Some Erlang For Great Good to be particularly helpful on this topic.

Elixir by itself obscures some of the distinctive properties of guards because they look nearly indistinguishable from “normal” expressions. To its credit, Elixir drops some features that make guards confusing, such as the option to use them in comprehension filters. However, it is still useful to study how they work in both languages, to avoid being caught by surprise in one of the many corner cases. Incidentally, treatment of guards remains a major source of headaches (and bugs) in Erl2ex, because of the lack of Elixir support for some of those cases.

Next time, we’ll sample a few “missing features” present in Erlang but not supported in Elixir, and look at some workarounds. Feel free to browse the index of articles in this series, and stay tuned for more on Erlang and Elixir’s family ties.