Family Ties part 10: My type of language

2016-08-02 familyties

This is the tenth in a series of articles on what I’ve learned about Erlang (and Elixir) from writing Erl2ex, an Erlang-to-Elixir transpiler. This week we cover records, the type system, and Dialyzer support in both languages.

You killed my records

Reviewing Erlang Records

Early versions of Erlang provided two compound data structures: lists and tuples. Lists are ubiquitous among functional languages, and tuples work well for small static collections of data. However, neither is by itself particularly good for representing collections of fields—data for which you might use a “struct” or “record” in other mainstream languages.

Thus, Erlang introduced its own take on the “record” data type. Here’s a simple example of declaring a record type representing a person with contact information:

% A simple record definition in Erlang
-record(person, {name, phone, address}).

You can then create and interact with record data, referencing the fields by name:

% Create a person
Joe = #person{name="Joe Blow", phone="555-1234", address="123 Main Street"}.

% Access a field
JoeName = Joe#person.name.

It is important to note, however, that the data itself is represented as a tuple. The first element is the record name (e.g. person in the example above), and subsequent elements are the values of each field. The information that makes it a “record”—the names and default values of the fields, and the mapping between field names and indexes into the tuple—is available at compile time, but thrown away at runtime. For example, at runtime, there is no way to tell a “person” record from an ordinary tuple whose first element happens to be the atom person.

% Create an instance of a "person" record
JoeRecord = #person{name="Joe Blow", phone="555-1234", address="123 Main Street"}.

% Create a tuple identical to the structure of the above record
JoeTuple = {person, "Joe Blow", "555-1234", "123 Main Street"}.

% Evaluates to true
JoeTuple == JoeRecord.
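Elixir sees exactly the same thing. The following minimal sketch is written in Elixir, where this series is headed. It shows that the tuple above is an ordinary 4-tuple. The standard Record.is_record/2 guard can only check that the first element is the expected tag, so it cannot tell a genuine record from a look-alike tuple, or even from a tuple with the wrong number of fields.

```elixir
# Record.is_record/2 is a macro, so the Record module must be required first.
require Record

# The same data as the Erlang JoeTuple above: a plain tuple tagged :person.
joe_tuple = {:person, "Joe Blow", "555-1234", "123 Main Street"}

# At runtime it is simply a 4-tuple.
IO.inspect(is_tuple(joe_tuple))   # true
IO.inspect(tuple_size(joe_tuple)) # 4

# is_record/2 only checks for a tuple whose first element is the tag,
# so even a tuple with no fields at all passes the test.
IO.inspect(Record.is_record(joe_tuple, :person)) # true
IO.inspect(Record.is_record({:person}, :person)) # true
```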

Erlang provides syntax to introspect a record’s fields and structure. For example, #person.name evaluates to the position of the record’s name field in the tuple: 2, because tuple positions are one-based and position 1 holds the record name. These expressions are evaluated at compile time, however, and turned into simple values in the compiled code.

Erlang also creates what appears to be a function record_info/2 that returns field information, but this too is a compile-time artifact and doesn’t actually generate a real function in the compiled module. However, because it is treated specially by the compiler, you cannot define a real function called record_info/2. The compiler will disallow it.

% The record_info/2 pseudo-function returns info about a record.
% This call returns [name, phone, address].
record_info(fields, person).

% Attempting to define your own record_info/2 causes a compiler error.
record_info(A, B) -> {A, B}.

% The compiler does let you define record_info/1 however.
record_info(A) -> {A}.

So far you can probably see some of the strengths and weaknesses of Erlang records. Because records are merely tuples, creating them and looking up fields are fairly efficient at runtime. However, type information is thrown away at runtime, so reflection becomes more difficult.

Death and Resurrection of Records

Elixir chose a different approach to representing collections of fields. The Elixir “struct” uses an underlying dictionary (an Erlang map) to represent this type of data, much like objects or property collections in languages such as Ruby, Python, or JavaScript. Unlike Erlang records, whose field names exist only at compile time, Elixir structs store their field names as map keys, which remain accessible at runtime.
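To make the contrast concrete, here is a minimal sketch using a hypothetical Person struct. The struct is a map carrying an extra :__struct__ key, and its field names can be listed at runtime, which is exactly what an Erlang record cannot offer.

```elixir
# A hypothetical Person struct; defstruct keeps the field names as map keys.
defmodule Person do
  defstruct [:name, :phone, :address]
end

joe = %Person{name: "Joe Blow", phone: "555-1234", address: "123 Main Street"}

# A struct is a map with one extra key, :__struct__, naming its module.
IO.inspect(is_map(joe))    # true
IO.inspect(joe.__struct__) # Person

# The field names survive to runtime, so they can be listed by reflection.
fields = joe |> Map.from_struct() |> Map.keys() |> Enum.sort()
IO.inspect(fields) # [:address, :name, :phone]
```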

The reasons behind Elixir’s preference for structs over records are varied, but a key concern was better support for runtime polymorphism, via pattern matching and Elixir protocols. (For a more detailed summary, see this elixir-lang-talk post by José Valim.)
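As a rough illustration of that polymorphism, the sketch below dispatches on struct type in two ways: through a protocol and through a pattern match. The Person and Company structs and the Describe protocol are hypothetical names invented for this example. Both mechanisms work because each struct carries its type with it at runtime.

```elixir
# Hypothetical structs and protocol, for illustration only.
defmodule Person do
  defstruct [:name, :phone]
end

defmodule Company do
  defstruct [:name, :employees]
end

# Protocol dispatch picks an implementation from the struct's runtime type.
defprotocol Describe do
  def describe(data)
end

defimpl Describe, for: Person do
  def describe(%Person{name: name}), do: "person named #{name}"
end

defimpl Describe, for: Company do
  def describe(%Company{name: name}), do: "company called #{name}"
end

# Pattern matching can also branch on the struct type directly.
kind_of = fn
  %Person{} -> :person
  %Company{} -> :company
end

joe = %Person{name: "Joe Blow", phone: "555-1234"}
acme = %Company{name: "Acme", employees: [joe]}

IO.puts(Describe.describe(joe))  # person named Joe Blow
IO.puts(Describe.describe(acme)) # company called Acme
IO.inspect(kind_of.(acme))       # :company
```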

But records are not dead (yet). Although they are not well-advertised or promoted, and are often even omitted from Elixir guides and introductions, records do exist (in a somewhat abbreviated form) in Elixir. The Record module provides macros for defining records, and for setting and querying record fields.

defmodule Contacts do
  # Need this in order to call macros in the Record module.
  require Record

  # Define a record. This defines the "person" macros used below.
  # (defrecord must be called inside a module; Contacts is just an example name.)
  Record.defrecord :person, [:name, :phone, :address]

  def joe_name do
    # Create a record
    joe_record = person(name: "Joe Blow", phone: "555-1234", address: "123 Main Street")

    # Get a record field
    person(joe_record, :name)
  end
end

Just as Erlang records are a compile-time construct, Elixir records also exist only at compile time. At runtime, they take the form of tuples, and only limited type reflection can be done on them. This sounds like Elixir records are pretty much identical to Erlang records, but paradoxically, it actually means the two are separate and distinct. You can “reproduce” an Erlang record in Elixir enough to share data effectively between Erlang and Elixir code, but in each case, the record definition itself lives only within that language’s compiler, and cannot be shared with the other language.
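Seen from the outside, an Elixir record is the same plain tuple an Erlang record would be. The minimal sketch below uses a hypothetical AddressBook module name. It prints the underlying tuple and uses the generated macros to get a field index, read a field, and produce an updated copy.

```elixir
# Hypothetical module name; the record matches the person record above.
defmodule AddressBook do
  require Record

  # Generates person/0, person/1 and person/2 macros usable in this module.
  Record.defrecord(:person, [:name, :phone, :address])

  def joe, do: person(name: "Joe Blow", phone: "555-1234", address: "123 Main Street")

  # Zero-based index for elem/2: position 0 holds the :person tag.
  def name_index, do: person(:name)

  # Reading and "updating" a field are plain tuple operations underneath.
  def name_of(p), do: person(p, :name)
  def with_phone(p, phone), do: person(p, phone: phone)
end

joe = AddressBook.joe()
IO.inspect(joe) # {:person, "Joe Blow", "555-1234", "123 Main Street"}
IO.inspect(AddressBook.name_index()) # 1
IO.inspect(AddressBook.name_of(joe)) # "Joe Blow"
IO.inspect(AddressBook.with_phone(joe, "555-9999"))
# {:person, "Joe Blow", "555-9999", "123 Main Street"}
```

Note the index. person(:name) returns 1 because it is meant for Elixir’s zero-based elem/2. Erlang’s #person.name returns 2 because element/2 is one-based. Both refer to the same slot in the same tuple.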

One rather odd upshot of this separation between Elixir and Erlang records relates to the record_info/2 function we described earlier. It represents a special case in the compiler, one that provides information about Erlang records, but knows nothing about Elixir records. This means, effectively, you cannot use the function at all in Elixir. Elixir has no access to Erlang record information (which is contained within the Erlang compiler), and the function has no access to Elixir records.

# Define an Elixir record.
require Record
Record.defrecord :person, [:name, :phone, :address]

# The record_info/2 pseudo-function doesn't work with the Elixir record.
# This results in a compiler error: "record person undefined"
record_info(:fields, :person)

Furthermore, because it is a special case in the compiler, Elixir doesn’t let you define your own function called record_info/2 either.

# This function definition will not compile in Elixir.
# Oddly, the Elixir compiler will claim it is already defined.
def record_info(a, b), do: nil

Effectively, the name record_info/2 is completely unusable in Elixir because of the way Erlang special-cases it. That vinyl catalog app you were working on will have to use a different function name.
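If what you actually need is the field list that record_info(fields, ...) would return, the macros generated by defrecord can recover it. This is a workaround I am suggesting, not something from Erl2ex. The Catalog module and album record are hypothetical. When the one-argument record macro is given an existing record, it converts that record to a keyword list, and the keys of that list are the field names in declaration order.

```elixir
# Hypothetical record for that vinyl catalog.
defmodule Catalog do
  require Record

  Record.defrecord(:album, [:title, :artist, :year])

  # Stand-in for record_info(fields, album): album/1 given an existing
  # record returns it as a keyword list, whose keys are the field names.
  def album_fields do
    blank = album()
    Keyword.keys(album(blank))
  end
end

IO.inspect(Catalog.album_fields()) # [:title, :artist, :year]
```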

Interfacing with Erlang Records

If structs are generally preferred over records in Elixir, when would you need Elixir records at all? The documentation for Elixir’s Record module lists two cases:

Working with short, internal data
Interfacing with Erlang records

Because records live only at compile time, they work best in Elixir when they are “internal” to a module and represent small collections of fields that don’t need to be pattern matched or participate in protocol dispatch. Even then, their only real advantage is a saving in performance and memory, and that saving is often insignificant.

However, if you interact with Erlang code that uses records in its public interface, you may still find it convenient to work with that data as records. For this case, Elixir provides useful helper functions for reproducing the appropriate Erlang record structure in Elixir. A record used by public Erlang functions will most likely be defined in an Erlang header (.hrl) file so that client Erlang modules can access the type definition. Elixir cannot compile Erlang headers directly (nor does it have a preprocessor to begin with), but it does provide a way to load record definitions from an Erlang header so you can reproduce them in Elixir.

Here’s some example Elixir code that uses the Erlang HTTP client hackney. The :hackney_url.parse_url/1 function returns a record of type :hackney_url. In order to interpret this record from Elixir, we load the record definition from the Erlang header using the Record.extract/2 function. Then we can pass the result directly to defrecord to reproduce the Erlang record in Elixir:

# UrlExample is just an example module name; defrecord must live in a module.
defmodule UrlExample do
  # Load Elixir record tools at compile time.
  require Record

  # Define the hackney_url record, loading the definition from Erlang.
  # hackney must be a project dependency so that from_lib can find its header.
  Record.defrecord :hackney_url,
    Record.extract(:hackney_url, from_lib: "hackney/include/hackney_lib.hrl")

  # Now we can interpret this record
  def foo do
    url_data = :hackney_url.parse_url("http://www.google.com/")
    hackney_url(url_data, :host)
  end
end
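If hackney is not installed, you can try the same technique with a header that ships with OTP itself. The Record documentation uses this very example. The following sketch reproduces the kernel application’s #file_info{} record and reads a field from the record that :file.read_file_info/1 returns. The FileInfo module name is a hypothetical choice.

```elixir
# Hypothetical module name; the record comes from OTP's kernel headers.
defmodule FileInfo do
  require Record

  # Reproduce Erlang's #file_info{} record from kernel/include/file.hrl.
  Record.defrecord(
    :file_info,
    Record.extract(:file_info, from_lib: "kernel/include/file.hrl")
  )

  # :file.read_file_info/1 returns {:ok, #file_info{}} from Erlang.
  def type_of(path) do
    {:ok, info} = :file.read_file_info(path)
    file_info(info, :type)
  end
end

IO.inspect(FileInfo.type_of(".")) # :directory
```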

I don’t know of a way to move in the other direction, reproducing Elixir records in Erlang. However, we have already argued that records should generally not be used in Elixir interfaces anyway. Instead, use Elixir structs, which can be interpreted as maps from Erlang.
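The following sketch shows what Erlang code would see. It calls Erlang’s maps module from Elixir on a hypothetical Person struct. Because the struct is an ordinary map, the standard map functions work unchanged. Conversely, a map that Erlang code builds with the right __struct__ key matches as a struct in Elixir.

```elixir
# Hypothetical struct for illustration.
defmodule Person do
  defstruct name: nil, phone: nil
end

joe = %Person{name: "Joe Blow", phone: "555-1234"}

# Erlang code sees a plain map, so Erlang's maps module works on it as-is.
IO.inspect(:maps.get(:name, joe))          # "Joe Blow"
IO.inspect(:maps.is_key(:__struct__, joe)) # true
IO.inspect(:maps.get(:__struct__, joe))    # Person (the atom 'Elixir.Person' in Erlang)

# Conversely, Erlang code can produce a struct by building a map that
# includes the __struct__ key, e.g. #{'__struct__' => 'Elixir.Person', ...}.
from_erlang = :maps.from_list([{:__struct__, Person}, {:name, "Ann"}, {:phone, nil}])
IO.inspect(match?(%Person{name: "Ann"}, from_erlang)) # true
```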

Types in Erlang and Elixir

Where we're going we don't need types

As a compile-time construct, records can be thought of as a typing mechanism, and this leads to our final topic. Like Ruby, Lisp, and a number of similar languages, Erlang and Elixir are both dynamically typed. Type checking is performed at runtime (often explicitly) via pattern matching. However, optional static type annotations are supported, and static type inference and checking can be performed using optional tools.

Type annotations between the two languages are nearly identical, modulo minor syntactic differences. Both support the same primitive and compound types, parameterized types, and union types. In fact, when I was writing conversions from Erlang to Elixir type annotations for Erl2ex, I found only a few special cases:

The “string” type in Erlang is actually a list of characters. To prevent confusion with Elixir strings, which are based on binaries, Elixir uses the type char_list() (spelled charlist() in newer releases) instead of “string” for Erlang strings. Actual Elixir strings use the type String.t(), which is simply defined as binary().

Record fields can be annotated with their types in Erlang record definitions. Elixir does not support this syntax. Instead, Elixir typespecs use the record/2 pseudo-macro to define a record type.

% An Erlang record with types
-record(person, {name :: string(), phone :: string() | nil}).
-type type1() :: #person{}.

# The corresponding Elixir record type
require Record
Record.defrecord(:person, [:name, :phone])
@typep type1() :: record(:person, name: char_list(), phone: char_list() | :nil)

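Variables in specs need to have explicit type constraints in Elixir, whereas Erlang does not require them. I’m not sure why this is, but a plausible reason is syntactic. Erlang variables are capitalized, so A in a spec can only be a type variable. In Elixir, a lowercase name such as a could just as well refer to a type a(), so type variables have to be declared in a when clause. Here’s an example: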

% Specifying an Erlang function that must return one of its arguments
-spec foo(A, B) -> A | B.

# The Elixir spec must explicitly provide types for variables. For example:
@spec foo(a, b) :: a | b when a: any(), b: any()
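The three special cases above can be combined into one small module, which compiles and runs as written. This is a sketch, and every name in it is hypothetical. It uses the newer charlist() spelling. For the spec with type variables, it uses `when a: var` instead of `when a: any()`. The var form leaves a variable unconstrained, which is the closest match to the Erlang spec, and both forms are accepted.

```elixir
defmodule TypeExamples do
  require Record

  # 1. Erlang "strings" are charlists; Elixir strings are binaries.
  @spec erlang_greeting(charlist()) :: charlist()
  def erlang_greeting(name) when is_list(name), do: ~c"Hello, " ++ name

  @spec elixir_greeting(String.t()) :: String.t()
  def elixir_greeting(name) when is_binary(name), do: "Hello, " <> name

  # 2. Field types go in a record/2 typespec, not in defrecord itself.
  #    Unset fields default to nil here (an Erlang record would use undefined).
  Record.defrecord(:person, [:name, :phone])
  @type person_rec :: record(:person, name: charlist(), phone: charlist() | nil)

  @spec new_person(charlist()) :: person_rec
  def new_person(name), do: person(name: name)

  # 3. Type variables are declared in a `when` clause; `var` leaves them
  #    unconstrained, like A and B in the Erlang spec.
  @spec pick(a, b, boolean()) :: a | b when a: var, b: var
  def pick(first, _second, true), do: first
  def pick(_first, second, false), do: second
end

IO.puts(TypeExamples.erlang_greeting(~c"Joe")) # Hello, Joe
IO.puts(TypeExamples.elixir_greeting("Joe"))   # Hello, Joe
IO.inspect(TypeExamples.new_person(~c"Joe"))
IO.inspect(TypeExamples.pick(1, :two, false))  # :two
```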

Beyond these items, I haven’t found significant differences in the type systems and annotations between the two languages, further solidifying their close relationship.

Running Dialyzer for Elixir

Of course, one of the chief benefits of having type annotations is the ability to run static type analysis. Erlang provides a tool called Dialyzer for this purpose (along with related analyses such as dead code detection). Since Elixir compiles to the same BEAM bytecode format, Dialyzer can analyze Elixir code as well.

The one catch is that Dialyzer performs full program analysis. It needs access to information about all the program code, including Elixir’s standard library and runtime support. Typically, that part of the analysis is done in a preprocessing step, and the results are cached to disk in a “.plt” (Persistent Lookup Table) file. When you analyze an Elixir app, you reference the preprocessed data in that file.

All this can be done by passing switches to Dialyzer, but I find it convenient to use a tool called dialyxir, which provides mix tasks that handle the details for you.
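The following sketch assumes a current dialyxir 1.x release, whose interface has changed since this was written. With that release, setup is a single dependency in mix.exs. Running mix dialyzer in the project then builds and caches the PLT files on first use and analyzes the project. The excerpt below is a configuration fragment, not a standalone program.

```elixir
# mix.exs (excerpt): dialyxir is a build-time tool, not a runtime dependency.
defp deps do
  [
    {:dialyxir, "~> 1.4", only: [:dev, :test], runtime: false}
  ]
end
```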

Interpreting Dialyzer output and understanding how best to employ optional static typing in Erlang and Elixir is a very large topic which I will not attempt to cover here. There are some techniques and gotchas specific to Elixir that I may try to blog about later.

Where to go from here

Next time, we’ll attempt an interesting project where we’ll put some of what we’ve seen in this series into practice. I won’t spoil it quite yet, but I think it’ll be compelling. Until then, feel free to browse the index of articles in this series, and stay tuned for more on Erlang and Elixir’s family ties.