𝐖ell-𝚻yped 𝐑eflections

Pattern matching is a central feature in most functional programming languages. Indeed, it is the main tool for expressing data flow, and the core of Functional Programming is thinking about data-flow rather than control-flow, as explained in Haoyi’s excellent blog post.

Scala offers an interesting and powerful take on pattern matching: in Scala, pattern matching is not fundamentally based on algebraic data types (ADT); rather, it is defined in a way that is compatible with Scala’s encoding of ADTs but is also more general. This allows users to abstract pattern definitions, and to define custom “extractor” objects.

Yet, there are many ways in which pattern matching in Scala is currently lacking, or even plainly deficient, compared to other languages. I made a list of such deficiencies below, in the hope that they will eventually be fixed. I separated them into two categories:

pattern matching warts, which are clearly unjustified limitations and quirks; and
possible improvements to pattern matching, which would make pattern matching in Scala even more useful and powerful.

Jump directly to the second part if you’re more interested in pattern matching features and improvements, and not so much in Scala’s obscure warts.

Pattern Matching Warts

• Type arguments cannot be passed explicitly in patterns

Scala has a powerful form of local bidirectional type inference. Yet, the limitations of type inference (some of which due to fundamental undecidability) mean that one will sometimes have to provide type arguments explicitly. Thankfully, Scala (unlike standard Haskell) has a convenient syntax for passing type arguments explicitly, in square brackets.

Everywhere a type argument is expected in Scala, it is possible to provide it explicitly. The one big, ugly exception is in patterns. One cannot pass type arguments to unapply methods in a pattern, as in:

  case Foo[Int](a,b,c) => ...

This missing capability is sometimes needed, especially in the presence of GADTs and Scala pattern matching limitations, or if the unapply method takes implicits that cannot be inferred given the particular subpatterns used in the match. For a concrete example, see this place where an ugly workaround was used; essentially:

  case foo: Foo[Int] => foo match { case Foo(a,b,c) => ... }

I seem to remember seeing an issue about that wart somewhere, so it’s already something that has been discussed. I hope that it will soon be fixed.

• Implicit arguments cannot be passed explicitly in patterns

This is pretty much the same as the above but for implicits. Everywhere an implicit is expected in Scala, it is possible to provide it explicitly. The exception again is in patterns. So while we’re at it, why not also allow passing implicits explicitly in patterns?

This feature is admittedly a bit more controversial, because it could look confusing to see patterns where information flows out of an argument list, and into another argument list:

  case Foo[Int](a,b,c)(myImplicit) =>  // myImplicit passed as argument, not extracted!

However, if the new proposed syntax for implicits is to be adopted, it could look way more natural and less surprising:

  case Foo[Int](a,b,c) with (myImplicit) =>  // myImplicit clearly an implicit argument

• Type binding in patterns is wildly inconsistent

A Scala feature that is not very well-known is that types can be extracted from patterns, not just values. This is legal and works as expected:

  case foo: Foo[t] => ...
  case (a: Foo[s], b: Foo[t]) => ...

However, this capability is very inconsistently implemented. For example, in the following, types s and t are not interpreted as type pattern variables, but as references to existing types! This is completely inconsistent with the way value pattern variables are treated, and therefore very surprising:

  case (a: s, b: t) =>  // not found: type s; not found: type t

Instead, one would have to write the more cumbersome:

  case Tuple2[s,t](a, b) => ...

…that is, assuming the previous wart was corrected! In current Scala, one has to nest pattern matching expressions, which is very cumbersome and unsatisfactory:

  case t: Tuple2[s,t] => t match { case (a, b) => ... }

An alternative that is almost as ugly is to define a type Id[A] = A and then use:

case (a: Id[s], b: Id[t])

But wait, there’s more! Type bindings have some other tricks to deliver. Assuming a case class Foo[T](x:T), all the following patterns fail in different, incomprehensible ways when used on the left-hand side of a val, while they all work in a full pattern match form:

val _: Foo[t] = Foo(1)
// ^ not found: type t

okay, scalac doesn’t see that it’s a pattern (it thinks it’s a val with a type)

val (_: Foo[t]) = Foo(1)
// ^ type mismatch; found: Foo[t(in value _)]; required: Foo[t(in value _)]
//   type mismatch; found: Int(1); required: t

what?

val Foo(_: Foo[t]) = Foo(Foo(1))
// ^ type t is not a value

huh?

val Foo(a: Foo[t]) = Foo(Foo(1))

it works!!(?!)
now let’s use the extracted t:

Option.empty[t]
// ^ not found: type t

…of course, the extracted t is not visible… that would be too easy!

This is nuts. And it leads us to the other warts about val patterns…

• Value definitions with patterns are inconsistent

The first problem is well-known and already exemplified above: val n: Int = m is a value definition while val (n: Int) = m is a pattern matching. Though it is slightly annoying conceptually, I can live with that inconsistency.

But there’s also that using pattern-matching on the left-hand side of a val never gives any warnings when your pattern is not exhaustive and may fail, even when the same pattern would yield a warning in a normal match expression. I consider that a wart, because it’s an arbitrary difference between two syntactic forms that should be equivalent. If I really want to risk getting a MatchError when defining a val, I should be required to make that explicit with an @unchecked annotation, the same as for match expressions.

This would also solve the classic Scala puzzler below, which compiles without warnings and crashes at runtime with a MatchError:

val 0 = 1

• Local value definitions with patterns are needlessly inefficient

Let us see how the following is desugared:

val (a,b) = (0,1).swap
println(a + b)
// ...

In an ideal world, we would get the following straightforward desugaring:

(0,1).swap match {
  case (a,b) =>
    println(a + b)
    // ...
}

…but that would be too easy again! Here is basically how Scala currently does it:

val x$1 = (0, 1).swap match {
  case (a, b) => (a, b)
}
val a = x$1._1
val b = x$1._2
Predef.println(a + b)
// ...

Yep, Scala creates an entirely useless tuple just for the purpose of taking its components apart on the next line. It may be that the intermediate tuple will be optimized away anyway, but there is no guarantee of that for more complex patterns. It also gives needless optimization work to the compiler, as if the Scala compiler could afford spending extra time fixing desugaring blunders.

There actually is a good reason Scala does that in general: our alternative scheme is not as general, and wouldn’t work for object fields (where the val is not just a local binding), or for lazy val bindings.

Still, there is no excuse for not employing a more streamlined and efficient desugaring for local val bindings, in my opinion.

• A case with a single pattern variable is not the same as a lambda

This one is super confusing even for experienced Scala developers, though it does not . I don’t have a concrete example at hand, but the gist is that these two forms have very slight semantic differences, and will sometimes not be typed the same:

foo { case a => ... }
foo {      a => ... }

Variable a in the alternative with the case will sometimes receive a more precise type, for some reason.

• ClassTag-based type matching is unsound

Did you know that Scala had reified generics? – no, not the kind that .NET has. Scala’s reified generics do not help with performance, they are just used to make types influence runtime semantics in a more principled way… or at least, in theory.

First things first, here is an erroneous pattern matching in Scala, meant to count all the instances of some type S in a list:

scala> def first[S](ls: List[Any]): Option[S] = ls.collectFirst{ case s: S => s }
// ^ warning: abstract type pattern S is unchecked since it is eliminated by erasure

scala> first[String](List(1,"a",2,"b"))
res1: Option[String] = Some(1)  // wrong!

This does not actually work and raises a warning (so it’s okay). It will always collect the first element, because type parameter S is erased and does not exist at runtime, so it cannot be checked. Here is how to reify S and make the definition seemingly work:

scala> import scala.reflect.ClassTag

scala> def first[S:ClassTag](ls: List[Any]): Option[S] = ls.collectFirst{ case s: S => s }

scala> first[String](List(1,"a",2,"b"))
res2: Option[String] = Some(a)  // correct

It does not yield a warning and returns the correct result in this case. This is all well, working as designed. However, a ClassTag[T] only contains a representation of the runtime class of a type T, not its full type. So our definition does not actually work and leads to unsoundness in the type system:

scala> first[List[Int]](List(List("oops"))).get.head + 1
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
  ... 28 elided

The code above should at least raise a warning somewhere. As it stands, ClassTag evidence just completely bypass the type system and lead to unsoundness.

We could either raise a warning in the definition of first or, perhaps more usefully, at the call site: we could make sure to only infer ClassTag[T] evidence when the the inferred class fully describes type T, so that first[List[Int]] would be illegal but first[List[_]] would be legal.

For the record, we can get full type reification in Scala using the much more heavyweight/expensive TypeTag[T] type class (not recommended as it basically pulls Scala’s entire type system into your runtime), but such evidence cannot be used for pattern matching use cases such as the one above.

• GADT pattern matching is thoroughly broken

Pattern matching on GADT just does not really work in Scala, currently. It is very easy to come up with unsound cases.

Thankfully, this is currently being worked on in Dotty. See for example this issue.

• Partial functions duplicate code!

When you define a partial function, such as in ls.collect { case Foo(x) if bar(x) => ... }, Scala will actually duplicate the code of the pattern, generating something like:

ls.collect {
  final class $anonfun extends  {
    final override def applyOrElse[A1, B1](x1: A1, default: A1 => B1) = x1 match {
      case Foo(x) if bar(x) => ...   // first occurrence of duplicated pattern
      case _ => default.apply(x1)
    }
    final def isDefinedAt(x1) = x1 match {
      case Foo(x) if bar(x) => true  // second occurrence of duplicated pattern
      case _ => false
    }
  }
  new $anonfun()
}

This is probably fine when patterns are simple, though I have a hard time justifying the bytecode duplication (and corresponding additional load on the compiler backend) when we could just put the pattern code in an auxiliary final function.

However, this becomes problematic when patterns are macros that expand to potentially a lot of code!

For example, consider the Squid pattern matching code below (Squid is a Scala quasiquote-based type-safe metaprogramming framework):

  case code"Option.empty[$t]" => ...

which corresponds to the following generated code for the pattern:

  val __b__ : TestDSL.Predef.base.type = TestDSL.Predef.base;
  {
    final class $anon {
      type $ExtractedType$ = Option[t]
      type $ExtractedContext$ = Any
      def unapply(_t_ : __b__.SomeCode): Option[TestDSL.Predef.base.CodeType[Typ]] = {
        val _term_ = __b__.wrapExtract(__b__.cacheRep("b1a0d86c-0aca-4eb5-8565-390b86a65fa8", (), ((x$35: Unit) => {
          val _0_OptionSym = __b__.`scala.Option`
          val _1_Option = __b__.staticTypeApp(_0_OptionSym.tsym, scala.List(__b__.typeHole("t")))
          val _2_Option = __b__.staticModule("scala.Option")
          val _3_Option_Sym = __b__.loadTypSymbol("scala.Option$")
          val _4_empty = __b__.loadMtdSymbol(_3_Option_Sym, "empty", scala.None, false)
          __b__.methodApp(_2_Option, _4_empty, scala.List(__b__.typeHole("t")), scala.Nil, _1_Option)
        })))
        __b__.extractRep(_term_, _t_.rep).map(((_maps_0_ ) => {
          val _maps_ = __b__.`internal checkExtract`("""  case code"Option.empty[$t]" => ...
  ^""", _maps_0_)()("t")()
          __b__.`internal CodeType`[Typ](_maps_._2("t"))
        }))
      }
    }
    new $anon()
  }

Duplicating that does not seem like a good idea.

Proposals for Pattern Matching Improvements

• Pattern Variables in Alternatives

It should be possible to bind pattern variable in the branches of pattern alternatives, just like in OCaml.

For example, I want to be able to write:

  case Left(a) | Right(a) => ...

The type given to a in the right-hand side of the case, assuming we are matching an Either[S,T], should obviously be the least upper bound of S and T. In Dotty, that corresponds to S | T. For instance:

def get[S,T]: Either[S,T] => S | T = { case Left(a) | Right(a) => a }

This is a seemingly trivial change, but it would make so many uses of pattern matching in Scala more concise! For one thing, instead of the awkward:

  try ... catch { case er @ (_: E | _: F) => ... }

we could write:

  try ... catch { case er: E | er: F => ... }

but this also really translates to more expressive power. Consider the following OCaml pattern matching code, which I wrote in my last project:

let rec normalize = function
| TUnion(TBot,t) | TUnion(t,TBot)
| TInter(t,TTop) | TInter(TTop,t)
| TUnion((TTop as t),_) | TUnion(_,(TTop as t))
| TInter(_,(TBot as t)) | TInter((TBot as t),_)
(* the above is a single pattern! *)
-> normalize t (* and here's the right-hand side of the match *)
...

In order to write that in Scala, one would have to split the pattern into many cases, and duplicate the right-hand side. When the right-hand side is non-trivial, this can become a problem. Instead, I just want to be able to write:

  case TUnion(TBot,t) | TUnion(t,TBot)
    | TInter(t,TTop) | TInter(TTop,t)
    | TUnion((t @ TTop),_) | TUnion(_,(t @ TTop))
    | TInter(_,(t @ TBot)) | TInter((t @ TBot),_)
    => normalize(t)

As a generalization of this, once we have type-enforced null-safety, we could also say that if a pattern variable appears in only one of two pattern alternatives (something that is illegal in OCaml), in Dotty we would give it type T | Null. Or even something like T | Unit could already be useful, in the absence of null-safety.

• Generalized Pattern Binding

In Scala and Haskell, @ is used to bind a name to a subpattern. However, I don’t see a reason why it couldn’t be used to bind two or more arbitrary subpatterns:

  case FirstPattern(a) @ SecondPattern(b, c) =>  // ... use a, b, c

For example, consider hypothetical character extractors StartingWith and EndingWith for String, which could then be used as:

def isPalindrom: String => String = {
  case "" => true
  case str @ StartingWith(a) @ EndingWith(b) => a == b && isPalindrom(str.tail.init)
}

(Of course, the above is inefficient because it destructures a string inductively, incurring lots of copying. We could make it more efficient using some kind of StringSlice or StringView type, though.)

• Partial Destructuring in Guards

Preamble: boolean blindness of conditional guards.

Have you ever seen or written this pattern?

  case Var(name)
    if boundVariables.contains(name)
    => foo(boundVariables(value))

This is terrible: it queries the boundVariables map twice: once to find out whether the name key is in it, and another to actually access it. In addition, this suffers from “boolean blindness”: the type system does not track whether you made the contains check before doing the access, and a refactoring could easily break that invariant silently.

There is a better function for querying maps in Scala called get, which returns an option in one go. But that method cannot be used here, without refactoring the pattern match expression in a disruptive way: we would have to move the conditional branching out of the conditional guard and into the right-hand side of the case, which means we’d have to modify the other cases in the match too.

A typical example is when calling collect as below:

exprs.collect {
  case Var(name) if bv contains name => Left(bv(name))
  case Const(v) => Rigth(v)
}

which can’t be easily adapted to use get on the map, instead requiring a complete refactoring:

exprs.flatMap { // can't use collect anymore!
  case Var(name) => bv.get(name).map(Left.apply)
  case Const(v)  => Some(Rigth(v))
}

Things would get even messier if we wanted to test something on the extracted value in the guard:

  case Var(name)
    if boundVariables.get(name).exists(value => value > 0)
    => foo(boundVariables(value))

This is just an instance of the general problem that conditional guards in Scala suffer from boolean blindness.

Solution: destructuring pattern guards.

We should allow several conditional guards headed by if, taking either a boolean expression or a destructuring statement with an = sign. For example:

  case Var(name)
    if Some(value) = boundVariables.get(name)
    if value > 0
    => foo(value)

There is no simple way that I know of doing something like the above in current Scala, aside from defining a custom extractor, which is very verbose…

This is very similar to the pattern guards that Haskell added in 2010. It’s also related to Rust’s more restricted if let idiom.

Similarity and unification with partial matching in comprehensions.

There has been a lot of talk about separating total from partial destructuring in for comprehensions. One of the proposed syntax was:

for { x <- eithers; if Left(y) = x; if Right(z) = y } yield z
// equivalent to:
for { if Left(Right(z)) <- eithers } yield z

…instead of the current Scala code below, which is possibly surprising because it hides the fact that the destructuring is partial and actually performs filtering:

for { Left(Right(z)) <- eithers } yield z // filters
// NOT equivalent to:
for { x <- eithers; Left(y) = x; Right(z) = y } yield z // fails with MatchError!

I like the proposed if syntax, and suggest to generalize it to pattern guards.

• Interoperable partial functions, option-returning functions, and extractors

The Extractor.scala project should probably be part of the standard library, as it seems to be something pretty useful and fundamental. It allows converting among A => Option[B], PartialFunction[A, B], and extractor objects. This would be especially useful since the syntax of the latter is so verbose.

There is actually already a SIP for that one!

Note: the following proposals are more “food for thought” than serious proposals, and I did not take time to write good motivations for them…

• Allow information to flow into patterns explicitly

In the previous section, I argued that not being able to spell out implicit arguments in patterns was a wart. Here I’m going to propose something potentially even more controversial: allow normal argument lists in patterns, where arguments flow into the pattern to influence it.

The idea is that, if we’re going to be able to pass more argument lists to a pattern for the implicits, why not allow non-implicit argument lists? We would consider that values flow into the pattern for all argument lists after the first one.

This’d be useful when we want to configure the semantics of a pattern. Weirdly, there is already an instance of information flowing into patterns in current Scala, which is due to string interpolators! Consider:

  case foo"... $a ... ${Left(b)} ..." =>

…which is is equivalent to:

  case scala.StringContext("... ", " ... ", " ...").foo(a,Left(b)) =>

…but the expanded form above is not valid syntax! Why not allow it too, while we’re at it?

This is probably a long shot and would likely be non-trivial to implement, but I’m putting it out there because it would be nice.

Here is an immediate use case, to extract a variable and compare it with another subpattern, as an alternative to extracting two variables and comparing them in a guard:

object equalTo {
  def unapply[A](scrut: A)(compared: A): Boolean =
    scrut == compared
}
// f matches pairs containing the same Int element:
val f: (Int,Int) => Option[Int] = {
  case (n, equalTo()(n)) => Some(n)
  case _ => None
}

• Pattern Blocks

Scala is an expression-oriented language: we allow blocks in expression position, which can be useful to introduce temporary bindings – for example, instead of foo(bar(1) + bar(1)) we can write foo({val b = bar(1); b+b}).

Why not allow blocks in patterns? It could be used, similarly, to introduce temporary bindings. A block in pattern position would return a pattern, instead of an expression.

  case {
    val N = ... // expensive computation
    (0,N)       // the result of this block is the (0,N) pattern
  } => true
  case (1,N)    // N is still visible here!
    => true

It is worth noting that internally, the Scala compiler already represents patterns in a pretty similar way.

Programming (8)

Well-Typed Reflections

Recent Posts

Links

Scala Pattern Matching Warts and Improvements