OCaml Labs compiler hacking


Highlights from recent sessions

24 Jun 2014

Highlights from recent sessions

With the next compiler hacking meeting due to take place in a couple of days it's time for a look back at some results from our last couple of sessions.

The front end

Camel front end
(today I stared a camel in the face by Adam Foster)

The front end (i.e. the parser and type checker) saw a number of enhancements.

Succinct functor syntax

Syntax tweaks are always popular, if often contentious. However, reaching agreement is significantly easier when adding syntax is a simple matter of extending an existing correspondence between two parts of the language. For example, it was clear which syntax to use when adding support for lazy patterns: since patterns generally mirror the syntax for the values they match, patterns for destructing lazy values should use the same lazy keyword as the expressions which construct them.

A second correspondence in OCaml's syntax relates modules and values. Module names and variables are both bound with =; module signatures and types are both ascribed with :; module fields and record fields are both projected with .. The syntax for functors and functions is also similar, but the latter offers a number of shortcuts not available in the module language; you can write

fun x y z -> e

instead of the more prolix equivalent:

fun x -> fun y -> fun z -> e

but multi-argument functors must be written out in full:

functor (X : R) -> functor (Y : S) -> functor (Z : T) -> M

In February's meeting, Thomas wrote a patch that adds an analogue of the shorter syntax to the module language, allowing the repeated functor to be left out:

functor (X : R) (Y : S) (Z : T) -> M

The patch also adds support for a corresponding abbreviation at the module type level. Defining the type of a multi-argument functor currently involves writing a rather clunky sequence of functor abstractions:

module type F = functor (X : R) -> functor (Y : S) -> functor (Z : T) -> U

With Thomas's patch all but the first occurrence of functor disappear:

module type F = functor (X : R) (Y : S) (Z : T) -> U

Since Thomas's patch has been merged into trunk, you can try out the new syntax using the 4.02.0 beta, which is available as a compiler switch in the OPAM repository:

opam switch 4.02.0+trunk

The next step is to find out whether the verbose syntax was a symptom or a cause of the infrequency of higher-order functors in OCaml code. Will we see a surge in the popularity of higher-order modules as the syntax becomes more accommodating?

Integer ranges

David started work on extending OCaml's range patterns, which currently support only characters, to support integer ranges. For example, consider the following code from MLDonkey:

match mdn with
  None       when h >= 0 && h <= 23 -> Some h
| Some false when h > 0 && h <= 11  -> Some h
| Some false when h = 12            -> Some 0
| Some true  when h > 0 && h <= 11  -> Some (h + 12)
| Some true  when h = 12            -> Some 12
| Some _                            -> None
| None                              -> None

Although this is fairly clear, it could be made even clearer if we had a less powerful language for expressing the tests involving h. Since the whole OCaml language is available in the when guard of a case, the reader has to examine the code carefully before concluding that the tests are all simple range checks. Perhaps worse, using guards inhibits the useful checks that the OCaml compiler performs to determine whether patterns are exhaustive or redundant. David's patch makes it possible to rewrite the tests without guards, making the simple nature of the tests on h clear at a glance (and making it possible once again to check exhaustiveness and redundancy):

match mdn, h with
  None      , 0..23
| Some false, 1..11 -> Some h
| Some false, 12    -> Some 0
| Some true , 1..11 -> Some (h + 12)
| Some true , 12    -> Some 12
| _                 -> None

The work on range patterns led to a robust exchange of views about which other types should be supported -- should we support any enumerable type (e.g. variants with nullary constructors)? or perhaps even any ordered type (e.g. floats or strings)? For the moment, there seems to be a much clearer consensus in favour of supporting integer types than there is for generalising range patterns any further.

Extensible variants

Since the compiler hacking group only meets for an evening every couple of months or so, most of the projects we work on are designed so that it's possible to implement them in a few hours. Leo's proposal for extensible variants is a notable exception, predating both the compiler hacking group and OCaml Labs itself.

Extensible variants generalise exceptions: with Leo's patch the exception type exn becomes a particular instance of a class of types that can be defined by the user rather than a special builtin provided by the compiler:

(* Define an extensible variant type *)
type exn = ..

(* Extend the type with a constructor *)
type exn += Not_found

(* Extend the type with another constructor *)
type exn += Invalid_argument of string

Even better, extensible variants come with all the power of regular variant types: they can take type parameters, and even support GADT definitions:

(* Define a parameterised extensible variant type *)
type 'a error = ..

(* Extend the type with a constructor *)
type 'a error = Error of 'a

(* Extend the type with a GADT constructor *)
type 'a error : IntError : int -> int error

On the evening of the last compiler hacking meeting, Leo completed the patch; shortly afterwards it was merged to trunk, ready for inclusion in OCaml 4.02!

Extensible variants are a significant addition to the language, and there's more to them than these simple examples show. A forthcoming post from Leo will describe the new feature in more detail. In the meantime, since they've been merged into the 4.02 release candidate, you can try them out with OPAM:

opam switch 4.02.0+trunk

Lazy record fields

Not everything we work on makes is destined to make it upstream. A few years ago, Alain Frisch described an OCaml extension in use at Lexifi for marking record fields lazy, making it possible to delay the evaluation of initializing expressions without writing the lazy keyword every time a record is constructed. Alain's post was received enthusiastically, and lazy record fields seemed like an obvious candidate for inclusion upstream, so in April's meeting Thomas put together a patch implementing the design. Although the OCaml team decided not to merge the patch, it led to an enlightening discussion with comments from several core developers, including Alain, who described subsequent, less positive, experience with the feature at Lexifi, and Xavier, who explained the rationale underlying the current design.

The back end

Camel back end
(Relief by Hartwig HKD)

The OCaml back end (i.e. the code generation portion of the compiler) also saw a proposed enhancement.

Constant arithmetic optimization

Stephen submitted a patch improving the generated code for functions that perform constant arithmetic on integers.

In OCaml, integers and characters are represented as shifted immediate values, with the least significant bit set to distinguish them from pointers. This makes some arithmetic operations a little more expensive. For example, consider a function that int_of_digits that builds an integer from three character digits:

int_of_digits '3' '4' '5' => 345

We might define int_of_digits as follows:

let int_of_digits a b c = 
  100 * (Char.code a - Char.code '0') + 
   10 * (Char.code b - Char.code '0') +
    1 * (Char.code c - Char.code '0')

Passing the -dcmm flag to ocamlopt shows the results of compiling the function to the C-- intermediate language.

ocamlopt -dcmm int_of_digits.ml

The generated code has the following form (reformatted for readability):

200 * ((a - 96) >> 1) +
 20 * ((b - 96) >> 1) +
  2 * ((c - 96) >> 1) + 1

The right shifts convert the tagged representation into native integers, and the final + 1 converts the result back to a tagged integer.

Stephen's patch floats the arithmetic operations that involve constant operands outwards, eliminating most of the tag-munging code in favour of a final correcting addition:

(a * 100) +
(b * 10) +
 c - 10766

Although these changes are not yet merged, you can easily try them out, thanks to Anil's script that makes compiler pull requests available as OPAM switches:

opam switch 4.03.0+pr17

Standard library and beyond

Camel library
(Literary advocate Dashdondog Jamba, and his mobile library, described in My librarian is a camel)

Our compiler hacking group defines "compiler" rather broadly. As a result people often work on improving the standard library and tools as well as the compiler proper. For example, in recent sessions, David added a small patch to expose the is_inet6_addr function, and Philippe proposed a patch that eliminates unnecessary bounds checking in the buffer module. The last session also saw Raphaƫl and Simon push a number of patches for integrating merlin with the acme editor to OPAM, improving OCaml support in Plan 9.

Next session

The compiler hacking group is open to anyone with an interest in contributing to the OCaml compiler. If you're local to Cambridge, you're welcome to join us at the next session!