This is the ninth of a series of articles on what I’ve learned about Erlang (and Elixir) from writing Erl2ex, an Erlang-to-Elixir transpiler. This week we continue coverage of the Erlang preprocessor, discussing how preprocessor features can be reproduced using Elixir.
Revisiting the Erlang preprocessor
Last time we began looking at the Erlang preprocessor, which is used extensively in Erlang code but does not have an equivalent in Elixir. We saw how the preprocessor “hijacks” the module attribute syntax to enable simple metaprogramming features. We also suggested that Elixir macros might be able to reproduce the same capabilities in a more integrated way.
This week we’ll take a deeper look at the question of whether Elixir macros can replace the Erlang preprocessor. At a basic level, it seems the main use case for Erlang preprocessor macros can easily be replicated. A preprocessor macro is simply a function-like construct that generates syntax to be injected into the source code, replacing any parameters inline. Here’s a simple example that we looked at last time:
Elixir macros perform the same basic function. They are functions that run at compile time, generating syntax (or more precisely, AST) to be injected into the parsed Elixir code, unquoting and replacing parameters. Here’s an Elixir example that mirrors the above Erlang macro:
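As a minimal sketch of that mirroring, assume a hypothetical Erlang macro `-define(DOUBLE(X), (X) * 2).` The module and macro names below are illustrative choices of mine, not taken from Erl2ex output. An Elixir counterpart might look something like this:

```elixir
defmodule SimpleMacros do
  # Assumed Erlang original, for illustration only:
  #   -define(DOUBLE(X), (X) * 2).
  defmacro double(x) do
    quote do
      # The argument arrives as AST and is spliced in as a single node,
      # so no extra parentheses are needed to protect precedence.
      unquote(x) * 2
    end
  end
end

defmodule SimpleMacrosDemo do
  # Macros from another module must be required before use.
  require SimpleMacros

  def run(n), do: SimpleMacros.double(n + 1)
end
```

Calling `SimpleMacrosDemo.run(4)` expands to `(4 + 1) * 2`. Because the substitution happens on AST rather than tokens, the argument keeps its grouping without the defensive parentheses the Erlang version needs.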
For simple cases, this pattern should hold up and allow us to reproduce Erlang preprocessor constructs in fairly idiomatic Elixir. However, we also need to note a few differences between Erlang and Elixir that will make edge cases more difficult.
- The Erlang preprocessor operates on text syntax (or more specifically, tokens) whereas Elixir macros operate on AST. This means Elixir macros will work only in cases where the Elixir parser has already run and produced normalized output. This will exclude a few Elixir edge cases as we shall see below.
- Erlang preprocessor macros may be undefined and redefined, whereas Elixir macros may not. If you need to set and track state during compilation in Elixir, it is better to use module attributes.
- Elixir macros are hygienic, whereas Erlang preprocessor macros are not. This generally makes Elixir macros self-contained and easier to reason about, but it also makes certain tasks simpler in Erlang.
- The preprocessor supports sharing definitions through including header files, whereas Elixir macros are not designed for this use case. However, we will investigate alternate ways to share definitions below.
Let’s take a closer look at these differences.
Metaprogramming in Erlang and Elixir
Erlang provides two metaprogramming mechanisms. First, the preprocessor allows certain transformations to be done between tokenization and parsing, as we have seen. Second, Erlang parse transforms allow manipulation of Erlang AST after parsing.
I won’t go into too much detail on parse transforms. They are generally not well documented, and their use is even mildly discouraged. However, they are pretty straightforward, if rudimentary. A parse transform is a module that exports the function parse_transform/2. That function is called at a specific point in the compilation pipeline, after parsing, and is passed a list of forms (as Erlang ASTs), along with a set of options. It is expected to return a list of possibly modified forms. Then, when you are writing another Erlang module, you can tell it to apply the parse transform via a compiler directive (in the form of a module attribute).
The most common example of an Erlang parse transform can be found in eunit, the standard unit test framework. The eunit_striptests module provides a parse transform that removes functions whose names end with a well-known “test” suffix. Then the eunit.hrl include file can be configured to apply the striptests parse transform.
Contrast Erlang’s two mechanisms—preprocessor macros and parse transforms—with, for example, Ruby. Ruby’s metaprogramming support arises from the property that Ruby has no distinct compilation step. The functionality you would normally associate with a compiler, such as defining classes, is all done at runtime. You do not compile Ruby code; you execute a Ruby script that defines classes.
Elixir falls in between these two extremes. Like Erlang, Elixir has a separate compile phase, and its metaprogramming capabilities involve performing operations during that phase. However, like Ruby (which itself ultimately derives its model from Lisp), Elixir’s metaprogramming model can (mostly) be thought of as “executing” an Elixir source file as a “script” that defines modules and functions.
Such a “Lisp-like” or “Ruby-like” view of Elixir can help us understand both the strengths and the quirks of Elixir macros. They provide tremendous power and flexibility unmatched by Erlang’s more primitive parse transform mechanism. However, they are not as straightforward as a text-oriented preprocessor, and making sense of the edge cases requires a deep understanding of Elixir’s compiler.
Elixir macros and the compiler
Consider the following Erlang code. It defines a simple macro whose value is just an atom, and then attempts to use that macro in a function and a module attribute.
Now if we compile this and attempt to retrieve the value of “attr_with_macro”, we get the value of the macro, as expected.
Remember that Erlang’s preprocessor just does a simple token substitution. Parsing has not run yet, so the preprocessor doesn’t care where macros are substituted. You can invoke a macro pretty much anywhere.
If we attempt to reproduce this in Elixir, however, we get a surprise.
Attempting to compile the above fails with the error undefined function simple_macro/0. Elixir let us invoke the macro from within a function definition but not in a module attribute. However, interestingly enough, this limitation is present only for macros defined in the same module; if you require a different module, this works:
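A minimal sketch of the cross-module case follows. The module names `MacroDefs` and `UsesMacro` are hypothetical, and the macro simply expands to the atom `:hello`:

```elixir
defmodule MacroDefs do
  # A trivial macro whose expansion is just an atom.
  defmacro simple_macro, do: :hello
end

defmodule UsesMacro do
  # MacroDefs is already compiled by the time this module body runs,
  # so its macros can be expanded anywhere, including in an attribute.
  require MacroDefs

  @attr_with_macro MacroDefs.simple_macro()

  def from_attribute, do: @attr_with_macro
  def from_function, do: MacroDefs.simple_macro()
end
```

Both `UsesMacro.from_attribute()` and `UsesMacro.from_function()` return `:hello`. Moving the `defmacro` into `UsesMacro` itself and referencing it from the attribute is the arrangement that fails to compile.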
I don’t claim to understand the nuances of why this happens. It appears that the compiler allows macros to be expanded in module attributes, but that evaluation of module attributes happens early in the compilation pipeline, before the module’s macros themselves are defined. Thus, you can only expand macros from another module that has already been compiled and required.
This applies to anything defined as a module attribute, including, significantly, type specifications. Type specifications are, as a result, difficult to metaprogram in Elixir. In general, because Elixir macros run at compile time, they are dependent on the compiler context, and their place within the compilation pipeline.
Erlang, of course, doesn’t suffer from this complication because preprocessing takes place strictly before the compiler begins working.
Erlang macros, like C preprocessor macros, can be undefined and redefined at will, and can be queried in order to implement conditional compilation.
Elixir macros, in contrast, are properties of the module. The compiler will not let you “undefine” or redefine an Elixir macro. It turns out you can query whether a macro has been defined (at compile time) using the Module.defines?/2 function, but this is rarely done. Elixir macros are not intended to manage compile-time state.
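For the curious, here is a small sketch of such a query. The module and macro names are made up for illustration. The results are captured in module attributes because `Module.defines?/2` can only be called while the module is still being compiled:

```elixir
defmodule DefinesCheck do
  defmacro my_macro, do: :ok

  # Evaluated in the module body, i.e. at compile time, after the
  # definition above has been recorded.
  @has_my_macro Module.defines?(__MODULE__, {:my_macro, 0})
  @has_other Module.defines?(__MODULE__, {:other_macro, 0})

  # Expose the compile-time answers at runtime.
  def has_my_macro?, do: @has_my_macro
  def has_other?, do: @has_other
end
```

Here `DefinesCheck.has_my_macro?()` returns `true` and `DefinesCheck.has_other?()` returns `false`.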
But last time we already discussed a mechanism well-suited for managing compile-time state: Elixir’s module attributes, or, as we preferred to call them, compiler variables.
So the properties of Erlang’s preprocessor macros can be reproduced by several different Elixir constructs. Erl2ex, in fact, uses combinations of Elixir macros and compiler variables to translate Erlang preprocessor code.
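To illustrate the compiler-variable side of this, here is a rough sketch of how a module attribute can stand in for a macro that is redefined and then tested for conditional compilation. This is my own simplified example with hypothetical names, not actual Erl2ex output:

```elixir
defmodule CompileState do
  # Comparable in spirit to -define(LEVEL, debug).
  @level :debug
  def first, do: @level

  # Comparable in spirit to -undef(LEVEL). followed by -define(LEVEL, info).
  @level :info
  def second, do: @level

  # Conditional compilation: only one of these clauses is ever defined,
  # depending on the attribute's value at this point in the module body.
  if @level == :debug do
    def verbose?, do: true
  else
    def verbose?, do: false
  end
end
```

Each function captures the attribute value that was current when the function was defined, so `first/0` returns `:debug`, `second/0` returns `:info`, and `verbose?/0` returns `false`.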
One of the important features of Elixir macros is hygiene, which protects variables in the context from being clobbered by variables defined in the macro. To understand this, let’s take a look at a simple Erlang macro.
Normally you would just write a function for this, but suppose for some reason you wanted to use a macro. Because a preprocessor macro does only text substitution, you may encounter a problem if you pass an expression in as the argument:
The argument is expanded inline multiple times, and so is evaluated redundantly. To get around this, you might modify your macro like this:
That’s better in that it evaluates the expensive expression only once. However, the expansion now introduces an extra variable. Suppose we already used the variable name “X” in the substitution…
This compiles but doesn’t execute correctly if X != Y because it would attempt to rebind X to different values. Thus, in order to use this macro, you need to know that it binds the variable “X”, and that you therefore should not use that name elsewhere. Because of this, Erlang macros that create variables often try to name those variables specially, for example by prefixing them with underscores, to reduce the chance of name collisions. (The eunit include file does this.) This, of course, causes other issues because leading underscores are meant in both languages to signal unused variables.
Elixir macros are different. “Inner” variables defined by a macro are actually scoped separately from anything unquoted in its context. Here’s the Elixir equivalent of the above example:
Unlike the Erlang example, that Elixir example works because the “square2” macro is hygienic. Its “x” variables are in a different scope than the “norm” function’s “x” variables so they don’t conflict.
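A sketch along these lines shows the behavior. The names “square2” and “norm” come from the discussion above, but the bodies and the enclosing module name are my own assumptions:

```elixir
defmodule Hygiene do
  # Evaluates its argument once by binding it to a macro-local "x".
  defmacro square2(expr) do
    quote do
      x = unquote(expr)
      x * x
    end
  end

  # This function has its own "x". The macro's "x" lives in a separate
  # scope, so expanding square2(y) does not touch the function's x.
  def norm(x, y) do
    :math.sqrt(square2(x) + square2(y))
  end
end
```

Calling `Hygiene.norm(3, 4)` returns `5.0`, even though the macro binds a variable with the same name as one of the function’s parameters.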
That said, you can disable hygiene by “annotating” a variable bound in a macro with the Kernel.var!/2 macro so it shares the scope with its context. This would make Elixir macros behave more like Erlang preprocessor macros.
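Here is a small sketch of what that looks like, again using hypothetical names. The macro deliberately rebinds the caller’s “x”:

```elixir
defmodule Unhygienic do
  defmacro set_x(value) do
    quote do
      # var! marks x as belonging to the caller's scope, so this
      # assignment is visible after the macro call returns.
      var!(x) = unquote(value)
    end
  end

  def demo do
    x = 1
    before = x
    set_x(2)
    {before, x}
  end
end
```

`Unhygienic.demo()` returns `{1, 2}`: the macro call has rebound the function’s own “x”, which is the kind of leakage a preprocessor macro exhibits by default.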
One final common application of the preprocessor is to share definitions using a header file. This technique is borrowed from C, in which the “interface” to a set of code is defined in a header “.h” file while the implementation is provided in a corresponding “.c” file. Things like function declarations, type definitions such as structs, and constants are kept in the header file and “shared” with outside code.
Erlang code shares similar information in header files. Types, function specs, record definitions, constants, and macros are often collected in Erlang header “.hrl” files, which are then included in Erlang source using the -include directive. This file inclusion is done by the preprocessor, just as it is with C.
Here’s an example. Consider an Erlang header file with various kinds of definitions:
In Erlang, any module can include this header file to get these definitions.
Elixir has no preprocessor and no mechanism for header file inclusion. So how can you share these definitions in Elixir, and ensure they are consistent across modules? The Elixir strategy is to use another module.
Now any module that needs these definitions can access them via the MySharedDefs module. However, there are a few differences to be aware of.
First, Elixir doesn’t really have a way to share constants. Modules sometimes use module attributes as constants, but these exist only during compilation of that module, and cannot be shared directly with other modules. In the example above, we defined an Elixir “constant” using a function that returns its value, so that it can be shared with other modules.
Second, Erlang file inclusion happens at compile time (or, more precisely, preprocessor time). This means it is possible for such “shared” information to be different if the header file was changed between compiling one module and another. In Elixir, some of this sharing happens at compile time (for example, accessing records and macros), and some happens at runtime (for example, accessing constants defined using functions). Thus, you have to be cognizant of your build process to make sure your “shared” information remains consistent.
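To make the module-based approach concrete, here is a rough sketch. The specific definitions (a timeout constant, a doubling macro, a point record, and an id type) are hypothetical stand-ins for whatever a header file might contain, and `SharedDefsClient` is an illustrative consumer:

```elixir
defmodule MySharedDefs do
  require Record

  # Stand-in for a constant such as -define(TIMEOUT, 5000).
  # Exposed as a function, so it is looked up at runtime.
  def timeout, do: 5000

  # Stand-in for a parameterized macro such as -define(DOUBLE(X), (X) * 2).
  defmacro double(x) do
    quote do: unquote(x) * 2
  end

  # Stand-in for -record(point, {x = 0, y = 0}).
  # Record.defrecord generates macros, so they are expanded at compile time.
  Record.defrecord(:point, x: 0, y: 0)

  # Stand-in for a shared type definition.
  @type id :: non_neg_integer()
end

defmodule SharedDefsClient do
  # Needed for the macro and the record; not for the function or the type.
  require MySharedDefs

  @spec describe(MySharedDefs.id()) :: tuple()
  def describe(id) do
    p = MySharedDefs.point(x: id)
    {MySharedDefs.timeout(), MySharedDefs.double(id), MySharedDefs.point(p, :x)}
  end
end
```

`SharedDefsClient.describe(7)` returns `{5000, 14, 7}`. Note the split: the macro and record accesses are baked into `SharedDefsClient` when it is compiled, while `timeout/0` is resolved when the code runs, which is the consistency concern described above.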
Types and records carry further subtleties of their own, and we will dedicate next week’s article to studying them.
Preprocessors considered harmful?
We’ve now seen a number of pros and cons of the Elixir and Erlang approaches to metaprogramming, and it can naturally bring up the philosophical question of which approach is to be preferred.
In the larger language community, of course, there has been a decided shift away from preprocessors. C and C++ have them, but most languages that see themselves as C’s successors—including Java, D, Go, and Rust—eschew a preprocessor in favor of other mechanisms. (A notable exception is C#, but even its “preprocessor” directives are toned down to limit their use cases, and their implementation does not require a separate preprocessor step.)
So does that mean preprocessors are evil? That question has already been debated ad nauseam over many years, and I won’t rehash the arguments here. However, I hope the brief study we have done here of the pros and cons of the Elixir alternatives has suggested that the answer might not be simple. Erlang has a C-style preprocessor. Elixir has dropped it in favor of compile-time macros. In many cases, Elixir macros are cleaner and more powerful, but, as we have seen, they can also introduce increased complexity. There are trade-offs, and which tool is to be preferred may depend on the application.
The most important take-away is that Erlang macros and Elixir macros, while apparently solving many of the same use cases, are completely different in fundamental ways. The Erlang preprocessor, like the C preprocessor, is technically a wholly separate language with its own semantics and tools, and it should be treated as such. Elixir macros are a core part of Elixir as a language, and need to be understood in the context of the operation of the Elixir compiler itself.
So perhaps the “curse” of the preprocessor isn’t so much that the preprocessor itself is a curse, but rather that the differences between Erlang’s preprocessor and Elixir’s macros make the two languages, which are otherwise so similar to each other, quite different to work with in practice. Siblings they may be, but like many siblings, these have taken fundamentally different paths.
Where to go from here
This article has only touched on a few of the nuances of Elixir macros. The deal with hygiene is explained quite well in the reference documentation for “quote”. And for a fuller treatment of metaprogramming overall, I once again recommend Chris McCord’s book Metaprogramming Elixir. In particular, I’d highlight a piece of advice he gives near the end of the book:
Some of the greatest insights I’ve had with macros have been with irresponsible code that I would never ship to production. There’s no substitute for learning by experimentation. Don’t let the rules you’ve learned throughout this book and the hazards of this chapter scare you from fully exploring Elixir’s macro system. Write irresponsible code, experiment, and have fun. Use the insight you gain to drive design decisions of things you would ship to a production system.
Elixir is still a very young language. Many of its nuances, especially around its metaprogramming model, are not well documented, or even well understood. Much of what I’ve learned so far about Elixir and its relationship with Erlang has come from experimentation, even from doing things that would be downright terrifying in a real application. But the possibility of that exploration is part of what makes this an exciting time for Elixir developers.
Next time, we’ll look at support for records and type specifications in Erlang and Elixir. Until then, feel free to browse the index of articles in this series, and stay tuned for more on Erlang and Elixir’s family ties.