Demystifying MLsub — the Simple Essence of Algebraic Subtyping

Note: this web article is an older version of a paper which has now been published as an ICFP Pearl. You can find a preprint of that paper here. It is probably better to read the paper than this post, as the paper contains more material, is more up to date, and is better explained!

Erratum: A previous version of this article contained a buggy implementation of extrude, which looped when extruding cyclic bounds. This was already fixed in the published paper, and the fix has now been ported to this web article.


Algebraic subtyping is a new approach to global type inference in the presence of subtyping. It extends traditional Hindley-Milner type inference while preserving the principal type property — that is, it can always infer the most general type for any given expression. This approach was developed by Stephen Dolan, together with Alan Mycroft, as part of his PhD thesis.

Algebraic subtyping was implemented in the MLsub type inference engine. However, the design of MLsub seems significantly more complex than the simple unification algorithms used for traditional ML languages. MLsub has proven harder to grasp, even for people already familiar with compilers and type systems, such as myself.

Dissatisfied with this state of affairs, I wanted to get to the bottom of the algebraic subtyping approach. What really is special about it, beyond the formalism? What are the simple concepts that hide behind the strange notions of biunification and polar types?

This article is an answer to those questions. I propose an alternative algorithm for algebraic subtyping, called Simple-sub. Simple-sub can be implemented efficiently in under 500 lines of code (including parsing, simplification, and pretty-printing), and I think it is much more familiar-looking and easier to understand than MLsub.

⇨ ⇨ ⇨ You can try Simple-sub online here! ⇦ ⇦ ⇦

This article is meant to be light in formalisms and easy to consume for prospective designers of new type systems and programming languages.

The Y Combinator Explained in Python

The Y combinator is a central concept in lambda calculus, which is the formal foundation of functional languages. Y allows one to define recursive functions without using self-referential definitions.

Most articles I’ve seen dedicated to explaining the Y combinator start by showing you the Y combinator, which in itself is fairly inscrutable, and then try to explain why it works.

I think this is backwards. In this article, we’ll go the other way around: I’ll start by describing, in simple terms, the essence of the Y combinator (that is, how to make recursion work without recursive definitions), and from there I will derive the usual Y combinator.
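To give a taste of the destination, here is a rough sketch of the idea in Scala rather than Python (my own code, not taken from the post): in a typed, call-by-value language, the self-application at the heart of Y needs a recursive wrapper type, and one uses the strict variant of Y, often called the Z combinator.

// A recursive wrapper type, needed to give a type to self-application.
case class Rec[A, B](f: Rec[A, B] => A => B)

// The Z combinator: builds a recursive function out of a non-recursive "step" function.
def Z[A, B](step: (A => B) => (A => B)): A => B = {
  val r: Rec[A, B] = Rec(self => x => step(self.f(self))(x))
  r.f(r)
}

// Factorial, defined without any self-referential definition:
val fact: Int => Int = Z[Int, Int](rec => n => if (n <= 1) 1 else n * rec(n - 1))
assert(fact(5) == 120)

The recursive Rec type is only there to satisfy the type checker; in an untyped language such as Python, the self-application can be written directly.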


What is Type Projection in Scala, and Why is it Unsound?

For a very long time, Scala has had a feature named type projection, which lets one refer to a type member defined in an existing type (i.e., to “project it” or “pick it out”). The syntax is T#A, where T is some type that is known to contain a type member named A.
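For instance (a minimal example of my own, not taken from the article):

// A trait with an abstract type member:
trait Graph {
  type Node
  def nodes: List[Node]
}

// `Graph#Node` projects the `Node` member out of the type `Graph` itself,
// without requiring a specific Graph value (unlike the path-dependent type `g.Node`):
def firstNode(g: Graph): Graph#Node = g.nodes.head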

Type projection has been shown to be unsound, which means that it can be used to subvert the type system and cause crashes at runtime. The feature is consequently being removed from Scala 3, the upcoming major version of the Scala language.

In this article, I describe how type projection works with several examples, and explain why it is unsound.

This post is meant to be understandable by anyone with basic knowledge of typed programming languages.

Scala Pattern Matching Warts and Improvements

Pattern matching is a central feature in most functional programming languages. Indeed, it is the main tool for expressing data flow, and the core of Functional Programming is thinking about data flow rather than control flow, as explained in Haoyi’s excellent blog post.

Scala offers an interesting and powerful take on pattern matching: in Scala, pattern matching is not fundamentally based on algebraic data types (ADTs); rather, it is defined in a way that is compatible with Scala’s encoding of ADTs but is also more general. This allows users to abstract over pattern definitions and to define custom “extractor” objects.
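As a concrete illustration (my own example, not one from the post), any object with a suitable unapply method can be used as a pattern, independently of any ADT definition:

// A custom extractor object: `unapply` decides whether (and how) a value matches.
object Email {
  def unapply(s: String): Option[(String, String)] =
    s.split("@") match {
      case Array(user, domain) => Some((user, domain))
      case _                   => None
    }
}

"user@example.com" match {
  case Email(user, domain) => println(s"user = $user, domain = $domain")
  case _                   => println("not an email address")
}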

Yet, there are many ways in which pattern matching in Scala is currently lacking, or even plainly deficient, compared to other languages. I made a list of such deficiencies below, in the hope that they will eventually be fixed, and separated them into two categories.

Comprehending Monoids with Class

This is a partial transcription of an extended-abstract talk I gave at the TyDe 2018 Type-Driven Development workshop, colocated with ICFP.

The main idea was to revisit an old concept — monoid comprehensions — and to explore what it gives us in the context of a functional programming language with support for type classes, as well as how it compares to the more traditional monad comprehension approach.
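To fix ideas, here is a minimal sketch (my own, not taken from the talk) of the kind of structure involved: a Monoid type class in Scala, and the simplest possible query that folds a collection into a chosen monoid.

// A monoid: an associative binary operation with a neutral element.
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

// The "sum" monoid over integers, provided as a type class instance.
implicit val sumMonoid: Monoid[Int] = new Monoid[Int] {
  def empty = 0
  def combine(x: Int, y: Int) = x + y
}

// Collapse a collection into whichever monoid is in scope for its element type.
def fold[A](xs: List[A])(implicit M: Monoid[A]): A =
  xs.foldLeft(M.empty)(M.combine)

assert(fold(List(1, 2, 3)) == 6)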

In summary, I argue that, for expressing queries over collections of data, monoid comprehension can be more flexible, simpler, more efficient, and safer than its monadic counterpart. I substantiate these points in the rest of this post, and recapitulate them in the conclusion.

In a future follow-up to this post, I will transcribe the rest of my TyDe talk. We will see how to embed and extend the original monoid comprehension calculus (MCC), using the host language’s static type system to enforce some properties of monoid queries at compile time.

A Dual to Iterator, and How to Abuse For Comprehension

Have you ever wondered why there is no efficient and convenient way of merging unrelated collection traversals in Scala’s standard library? For instance, computing the average of a list as its sum divided by its count, but in a single pass…

In this post, I show how to abuse for comprehension and leverage the following syntax to define aggregators, the dual to iterators:

def average = for { s <- sum[Double]; c <- count } yield s/c
val ls = List(1.0, 2.0, 3.0)
assert(average(ls) == 2.0)

// `average` can then be composed within bigger loops modularly:
def avgmax = for {
  avg <- average
  m   <- max[Double]
} yield (avg,m)
// this does a single traversal of `ls`:
assert(avgmax(ls) == (2.0, Some(3.0)))

Throwing Other Things than Exceptions, Continued

Yesterday, I demonstrated how abusing the exception facilities of Java could allow us to conveniently encode tagged union types.

Here is a C++11 program demonstrating another use of exceptions for quickly breaking out of nested closure calls.

Algebraic Data Types in Java 6

Motivation

Java is an opinionated programming language, in that it forces users to embrace Object Oriented Programming. Even after the introduction of lambdas in Java 8, it still suffers from a lack of abstractions commonly found in functional programming languages such as Scala and OCaml.

One of these abstractions is algebraic data types – in particular, tagged unions (also called sum types), which allow the definition of types that support several possible representations. Tagged unions are similar to Java enums, the only difference being that each case can embed its own data.
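To make this concrete, here is what such a tagged union looks like in Scala, where the feature is native (a small example of my own; the post is about encoding this kind of type in Java):

sealed trait Shape                                        // the tagged union itself
case class Circle(radius: Double)          extends Shape  // a case carrying its own data
case class Rectangle(w: Double, h: Double) extends Shape  // another case, with different data
case object Empty                          extends Shape  // a data-less case, much like an enum constant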

Algebraic data types are usually associated with pattern-matching, another feature that programmers used to the functional style miss in Java. However, in this post I will only focus on describing a nice and simple trick that allows us to encode tagged unions in Java (working with versions of Java as old as Java 6 – perhaps even older), supporting exhaustiveness checks and meta cases, but without support for pattern matching.1

  1. Since the introduction of lambda expressions in Java 8, it has become possible to encode some forms of pattern matching, such as in this blog post.