Pattern matching is a central feature in most functional programming languages. Indeed, it is the main tool for expressing data flow, and the core of Functional Programming is thinking about data-flow rather than control-flow, as explained in Haoyiās excellent blog post.
Scala offers an interesting and powerful take on pattern matching: in Scala, pattern matching is not fundamentally based on algebraic data types (ADT); rather, it is defined in a way that is compatible with Scalaās encoding of ADTs but is also more general. This allows users to abstract pattern definitions, and to define custom āextractorā objects.
Yet, there are many ways in which pattern matching in Scala is currently lacking, or even plainly deficient, compared to other languages. I made a list of such deficiencies below, in the hope that they will eventually be fixed. I separated them into two categories:
-
pattern matching warts, which are clearly unjustified limitations and quirks; and
-
possible improvements to pattern matching, which would make pattern matching in Scala even more useful and powerful.
Jump directly to the second part if youāre more interested in pattern matching features and improvements, and not so much in Scalaās obscure warts.
Pattern Matching Warts
ā¢ Type arguments cannot be passed explicitly in patterns
Scala has a powerful form of local bidirectional type inference. Yet, the limitations of type inference (some of which due to fundamental undecidability) mean that one will sometimes have to provide type arguments explicitly. Thankfully, Scala (unlike standard Haskell) has a convenient syntax for passing type arguments explicitly, in square brackets.
Everywhere a type argument is expected in Scala, it is possible to provide it explicitly. The one big, ugly exception is in patterns. One cannot pass type arguments to unapply
methods in a pattern, as in:
case Foo[Int](a,b,c) => ...
This missing capability is sometimes needed, especially in the presence of GADTs and Scala pattern matching limitations, or if the unapply
method takes implicits that cannot be inferred given the particular subpatterns used in the match. For a concrete example, see this place where an ugly workaround was used; essentially:
case foo: Foo[Int] => foo match { case Foo(a,b,c) => ... }
I seem to remember seeing an issue about that wart somewhere, so itās already something that has been discussed. I hope that it will soon be fixed.
ā¢ Implicit arguments cannot be passed explicitly in patterns
This is pretty much the same as the above but for implicits. Everywhere an implicit is expected in Scala, it is possible to provide it explicitly. The exception again is in patterns. So while weāre at it, why not also allow passing implicits explicitly in patterns?
This feature is admittedly a bit more controversial, because it could look confusing to see patterns where information flows out of an argument list, and into another argument list:
case Foo[Int](a,b,c)(myImplicit) => // myImplicit passed as argument, not extracted!
However, if the new proposed syntax for implicits is to be adopted, it could look way more natural and less surprising:
case Foo[Int](a,b,c) with (myImplicit) => // myImplicit clearly an implicit argument
ā¢ Type binding in patterns is wildly inconsistent
A Scala feature that is not very well-known is that types can be extracted from patterns, not just values. This is legal and works as expected:
case foo: Foo[t] => ...
case (a: Foo[s], b: Foo[t]) => ...
However, this capability is very inconsistently implemented. For example, in the following, types s
and t
are not interpreted as type pattern variables, but as references to existing types! This is completely inconsistent with the way value pattern variables are treated, and therefore very surprising:
case (a: s, b: t) => // not found: type s; not found: type t
Instead, one would have to write the more cumbersome:
case Tuple2[s,t](a, b) => ...
ā¦that is, assuming the previous wart was corrected! In current Scala, one has to nest pattern matching expressions, which is very cumbersome and unsatisfactory:
case t: Tuple2[s,t] => t match { case (a, b) => ... }
An alternative that is almost as ugly is to define a type Id[A] = A
and then use:
case (a: Id[s], b: Id[t])
But wait, thereās more! Type bindings have some other tricks to deliver. Assuming a case class Foo[T](x:T)
, all the following patterns fail in different, incomprehensible ways when used on the left-hand side of a val
, while they all work in a full pattern match form:
val _: Foo[t] = Foo(1)
// ^ not found: type t
okay, scalac doesnāt see that itās a pattern (it thinks itās a
val
with a type)
val (_: Foo[t]) = Foo(1)
// ^ type mismatch; found: Foo[t(in value _)]; required: Foo[t(in value _)]
// type mismatch; found: Int(1); required: t
what?
val Foo(_: Foo[t]) = Foo(Foo(1))
// ^ type t is not a value
huh?
val Foo(a: Foo[t]) = Foo(Foo(1))
it works!!(?!)
now letās use the extractedt
:
Option.empty[t]
// ^ not found: type t
ā¦of course, the extracted
t
is not visibleā¦ that would be too easy!
This is nuts. And it leads us to the other warts about val
patternsā¦
ā¢ Value definitions with patterns are inconsistent
The first problem is well-known and already exemplified above: val n: Int = m
is a value definition while val (n: Int) = m
is a pattern matching. Though it is slightly annoying conceptually, I can live with that inconsistency.
But thereās also that using pattern-matching on the left-hand side of a val
never gives any warnings when your pattern is not exhaustive and may fail, even when the same pattern would yield a warning in a normal match
expression. I consider that a wart, because itās an arbitrary difference between two syntactic forms that should be equivalent. If I really want to risk getting a MatchError
when defining a val
, I should be required to make that explicit with an @unchecked
annotation, the same as for match
expressions.
This would also solve the classic Scala puzzler below, which compiles without warnings and crashes at runtime with a MatchError
:
val 0 = 1
See also this Dotty issue.
ā¢ Local value definitions with patterns are needlessly inefficient
Let us see how the following is desugared:
val (a,b) = (0,1).swap
println(a + b)
// ...
In an ideal world, we would get the following straightforward desugaring:
(0,1).swap match {
case (a,b) =>
println(a + b)
// ...
}
ā¦but that would be too easy again! Here is basically how Scala currently does it:
val x$1 = (0, 1).swap match {
case (a, b) => (a, b)
}
val a = x$1._1
val b = x$1._2
Predef.println(a + b)
// ...
Yep, Scala creates an entirely useless tuple just for the purpose of taking its components apart on the next line. It may be that the intermediate tuple will be optimized away anyway, but there is no guarantee of that for more complex patterns. It also gives needless optimization work to the compiler, as if the Scala compiler could afford spending extra time fixing desugaring blunders.
There actually is a good reason Scala does that in general: our alternative scheme is not as general, and wouldnāt work for object fields (where the val
is not just a local binding), or for lazy val
bindings.
Still, there is no excuse for not employing a more streamlined and efficient desugaring for local val
bindings, in my opinion.
ā¢ A case with a single pattern variable is not the same as a lambda
This one is super confusing even for experienced Scala developers, though it does not . I donāt have a concrete example at hand, but the gist is that these two forms have very slight semantic differences, and will sometimes not be typed the same:
foo { case a => ... }
foo { a => ... }
Variable a
in the alternative with the case
will sometimes receive a more precise type, for some reason.
ā¢ ClassTag-based type matching is unsound
Did you know that Scala had reified generics? ā no, not the kind that .NET has. Scalaās reified generics do not help with performance, they are just used to make types influence runtime semantics in a more principled wayā¦ or at least, in theory.
First things first, here is an erroneous pattern matching in Scala, meant to count all the instances of some type S
in a list:
scala> def first[S](ls: List[Any]): Option[S] = ls.collectFirst{ case s: S => s }
// ^ warning: abstract type pattern S is unchecked since it is eliminated by erasure
scala> first[String](List(1,"a",2,"b"))
res1: Option[String] = Some(1) // wrong!
This does not actually work and raises a warning (so itās okay). It will always collect the first element, because type parameter S
is erased and does not exist at runtime, so it cannot be checked. Here is how to reify S
and make the definition seemingly work:
scala> import scala.reflect.ClassTag
scala> def first[S:ClassTag](ls: List[Any]): Option[S] = ls.collectFirst{ case s: S => s }
scala> first[String](List(1,"a",2,"b"))
res2: Option[String] = Some(a) // correct
It does not yield a warning and returns the correct result in this case. This is all well, working as designed. However, a ClassTag[T]
only contains a representation of the runtime class of a type T
, not its full type. So our definition does not actually work and leads to unsoundness in the type system:
scala> first[List[Int]](List(List("oops"))).get.head + 1
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
... 28 elided
The code above should at least raise a warning somewhere. As it stands, ClassTag
evidence just completely bypass the type system and lead to unsoundness.
We could either raise a warning in the definition of first
or, perhaps more usefully, at the call site: we could make sure to only infer ClassTag[T]
evidence when the the inferred class fully describes type T
, so that first[List[Int]]
would be illegal but first[List[_]]
would be legal.
For the record, we can get full type reification in Scala using the much more heavyweight/expensive TypeTag[T]
type class (not recommended as it basically pulls Scalaās entire type system into your runtime), but such evidence cannot be used for pattern matching use cases such as the one above.
ā¢ GADT pattern matching is thoroughly broken
Pattern matching on GADT just does not really work in Scala, currently. It is very easy to come up with unsound cases.
Thankfully, this is currently being worked on in Dotty. See for example this issue.
ā¢ Partial functions duplicate code!
When you define a partial function, such as in ls.collect { case Foo(x) if bar(x) => ... }
, Scala will actually duplicate the code of the pattern, generating something like:
ls.collect {
final class $anonfun extends {
final override def applyOrElse[A1, B1](x1: A1, default: A1 => B1) = x1 match {
case Foo(x) if bar(x) => ... // first occurrence of duplicated pattern
case _ => default.apply(x1)
}
final def isDefinedAt(x1) = x1 match {
case Foo(x) if bar(x) => true // second occurrence of duplicated pattern
case _ => false
}
}
new $anonfun()
}
This is probably fine when patterns are simple, though I have a hard time justifying the bytecode duplication (and corresponding additional load on the compiler backend) when we could just put the pattern code in an auxiliary final function.
However, this becomes problematic when patterns are macros that expand to potentially a lot of code!
For example, consider the Squid pattern matching code below (Squid is a Scala quasiquote-based type-safe metaprogramming framework):
case code"Option.empty[$t]" => ...
which corresponds to the following generated code for the pattern:
val __b__ : TestDSL.Predef.base.type = TestDSL.Predef.base;
{
final class $anon {
type $ExtractedType$ = Option[t]
type $ExtractedContext$ = Any
def unapply(_t_ : __b__.SomeCode): Option[TestDSL.Predef.base.CodeType[Typ]] = {
val _term_ = __b__.wrapExtract(__b__.cacheRep("b1a0d86c-0aca-4eb5-8565-390b86a65fa8", (), ((x$35: Unit) => {
val _0_OptionSym = __b__.`scala.Option`
val _1_Option = __b__.staticTypeApp(_0_OptionSym.tsym, scala.List(__b__.typeHole("t")))
val _2_Option = __b__.staticModule("scala.Option")
val _3_Option_Sym = __b__.loadTypSymbol("scala.Option$")
val _4_empty = __b__.loadMtdSymbol(_3_Option_Sym, "empty", scala.None, false)
__b__.methodApp(_2_Option, _4_empty, scala.List(__b__.typeHole("t")), scala.Nil, _1_Option)
})))
__b__.extractRep(_term_, _t_.rep).map(((_maps_0_ ) => {
val _maps_ = __b__.`internal checkExtract`(""" case code"Option.empty[$t]" => ...
^""", _maps_0_)()("t")()
__b__.`internal CodeType`[Typ](_maps_._2("t"))
}))
}
}
new $anon()
}
Duplicating that does not seem like a good idea.
Proposals for Pattern Matching Improvements
ā¢ Pattern Variables in Alternatives
It should be possible to bind pattern variable in the branches of pattern alternatives, just like in OCaml.
For example, I want to be able to write:
case Left(a) | Right(a) => ...
The type given to a
in the right-hand side of the case
, assuming we are matching an Either[S,T]
, should obviously be the least upper bound of S
and T
. In Dotty, that corresponds to S | T
. For instance:
def get[S,T]: Either[S,T] => S | T = { case Left(a) | Right(a) => a }
This is a seemingly trivial change, but it would make so many uses of pattern matching in Scala more concise! For one thing, instead of the awkward:
try ... catch { case er @ (_: E | _: F) => ... }
we could write:
try ... catch { case er: E | er: F => ... }
but this also really translates to more expressive power. Consider the following OCaml pattern matching code, which I wrote in my last project:
let rec normalize = function
| TUnion(TBot,t) | TUnion(t,TBot)
| TInter(t,TTop) | TInter(TTop,t)
| TUnion((TTop as t),_) | TUnion(_,(TTop as t))
| TInter(_,(TBot as t)) | TInter((TBot as t),_)
(* the above is a single pattern! *)
-> normalize t (* and here's the right-hand side of the match *)
...
In order to write that in Scala, one would have to split the pattern into many cases, and duplicate the right-hand side. When the right-hand side is non-trivial, this can become a problem. Instead, I just want to be able to write:
case TUnion(TBot,t) | TUnion(t,TBot)
| TInter(t,TTop) | TInter(TTop,t)
| TUnion((t @ TTop),_) | TUnion(_,(t @ TTop))
| TInter(_,(t @ TBot)) | TInter((t @ TBot),_)
=> normalize(t)
As a generalization of this, once we have type-enforced null-safety, we could also say that if a pattern variable appears in only one of two pattern alternatives (something that is illegal in OCaml), in Dotty we would give it type T | Null
. Or even something like T | Unit
could already be useful, in the absence of null-safety.
ā¢ Generalized Pattern Binding
In Scala and Haskell, @
is used to bind a name to a subpattern. However, I donāt see a reason why it couldnāt be used to bind two or more arbitrary subpatterns:
case FirstPattern(a) @ SecondPattern(b, c) => // ... use a, b, c
For example, consider hypothetical character extractors StartingWith
and EndingWith
for String
, which could then be used as:
def isPalindrom: String => String = {
case "" => true
case str @ StartingWith(a) @ EndingWith(b) => a == b && isPalindrom(str.tail.init)
}
(Of course, the above is inefficient because it destructures a string inductively, incurring lots of copying. We could make it more efficient using some kind of StringSlice
or StringView
type, though.)
ā¢ Partial Destructuring in Guards
Preamble: boolean blindness of conditional guards.
Have you ever seen or written this pattern?
case Var(name)
if boundVariables.contains(name)
=> foo(boundVariables(value))
This is terrible: it queries the boundVariables
map twice: once to find out whether the name
key is in it, and another to actually access it. In addition, this suffers from āboolean blindnessā: the type system does not track whether you made the contains
check before doing the access, and a refactoring could easily break that invariant silently.
There is a better function for querying maps in Scala called get
, which returns an option in one go. But that method cannot be used here, without refactoring the pattern match expression in a disruptive way: we would have to move the conditional branching out of the conditional guard and into the right-hand side of the case, which means weād have to modify the other cases in the match too.
A typical example is when calling collect
as below:
exprs.collect {
case Var(name) if bv contains name => Left(bv(name))
case Const(v) => Rigth(v)
}
which canāt be easily adapted to use get
on the map, instead requiring a complete refactoring:
exprs.flatMap { // can't use collect anymore!
case Var(name) => bv.get(name).map(Left.apply)
case Const(v) => Some(Rigth(v))
}
Things would get even messier if we wanted to test something on the extracted value
in the guard:
case Var(name)
if boundVariables.get(name).exists(value => value > 0)
=> foo(boundVariables(value))
This is just an instance of the general problem that conditional guards in Scala suffer from boolean blindness.
Solution: destructuring pattern guards.
We should allow several conditional guards headed by if
, taking either a boolean expression or a destructuring statement with an =
sign. For example:
case Var(name)
if Some(value) = boundVariables.get(name)
if value > 0
=> foo(value)
There is no simple way that I know of doing something like the above in current Scala, aside from defining a custom extractor, which is very verboseā¦
This is very similar to the pattern guards that Haskell added in 2010. Itās also related to Rustās more restricted if let
idiom.
Similarity and unification with partial matching in comprehensions.
There has been a lot of talk about separating total from partial destructuring in for
comprehensions. One of the proposed syntax was:
for { x <- eithers; if Left(y) = x; if Right(z) = y } yield z
// equivalent to:
for { if Left(Right(z)) <- eithers } yield z
ā¦instead of the current Scala code below, which is possibly surprising because it hides the fact that the destructuring is partial and actually performs filtering:
for { Left(Right(z)) <- eithers } yield z // filters
// NOT equivalent to:
for { x <- eithers; Left(y) = x; Right(z) = y } yield z // fails with MatchError!
I like the proposed if
syntax, and suggest to generalize it to pattern guards.
ā¢ Interoperable partial functions, option-returning functions, and extractors
The Extractor.scala project should probably be part of the standard library, as it seems to be something pretty useful and fundamental. It allows converting among A => Option[B]
, PartialFunction[A, B]
, and extractor objects. This would be especially useful since the syntax of the latter is so verbose.
There is actually already a SIP for that one!
Note: the following proposals are more āfood for thoughtā than serious proposals, and I did not take time to write good motivations for themā¦
ā¢ Allow information to flow into patterns explicitly
In the previous section, I argued that not being able to spell out implicit arguments in patterns was a wart. Here Iām going to propose something potentially even more controversial: allow normal argument lists in patterns, where arguments flow into the pattern to influence it.
The idea is that, if weāre going to be able to pass more argument lists to a pattern for the implicits, why not allow non-implicit argument lists? We would consider that values flow into the pattern for all argument lists after the first one.
Thisād be useful when we want to configure the semantics of a pattern. Weirdly, there is already an instance of information flowing into patterns in current Scala, which is due to string interpolators! Consider:
case foo"... $a ... ${Left(b)} ..." =>
ā¦which is is equivalent to:
case scala.StringContext("... ", " ... ", " ...").foo(a,Left(b)) =>
ā¦but the expanded form above is not valid syntax! Why not allow it too, while weāre at it?
ā¢ Refinement of above: allow intra-pattern dependencies
This is probably a long shot and would likely be non-trivial to implement, but Iām putting it out there because it would be nice.
Here is an immediate use case, to extract a variable and compare it with another subpattern, as an alternative to extracting two variables and comparing them in a guard:
object equalTo {
def unapply[A](scrut: A)(compared: A): Boolean =
scrut == compared
}
// f matches pairs containing the same Int element:
val f: (Int,Int) => Option[Int] = {
case (n, equalTo()(n)) => Some(n)
case _ => None
}
ā¢ Pattern Blocks
Scala is an expression-oriented language: we allow blocks in expression position, which can be useful to introduce temporary bindings ā for example, instead of foo(bar(1) + bar(1))
we can write foo({val b = bar(1); b+b})
.
Why not allow blocks in patterns? It could be used, similarly, to introduce temporary bindings. A block in pattern position would return a pattern, instead of an expression.
case {
val N = ... // expensive computation
(0,N) // the result of this block is the (0,N) pattern
} => true
case (1,N) // N is still visible here!
=> true
It is worth noting that internally, the Scala compiler already represents patterns in a pretty similar way.