The Y combinator is a central concept in lambda calculus, which is the formal foundation of functional languages. Y allows one to define recursive functions without using self-referential definitions.
Most articles I’ve seen dedicated to explaining the Y combinator start by showing you the Y combinator, which in itself is fairly inscrutable, and then trying to explain why it works.
I think this is backwards. In this article, we’ll go the other way around: I’ll start by describing, in simple terms, the essence of the Y combinator — or how to make recursion work without recursive definitions, and from there I will derive the usual Y combinator.
Since we’ll be using Python, I’m actually going to describe a slight variation of the Y combinator, called the Z combinator. That’s because Y proper only work in languages with non-strict evaluation (like Haskell, which uses lazy evaluation).
I gave a lecture on recursion and term equivalence in the lambda calculus for the Foundations of Software course at EPFL. This article recounts what I came up with to explain the Z combinator in simple, intuitive terms. It is inspired by the excellent “The Why of Y” writeup by Richard P. Gabriel, available here. I make essentially the same argument here, but slightly simplified and in Python (instead of Scheme).
Recursive definitions in Python
Here is a typical recursive definition, the factorial function:
def fact(x): if x == 0: return 1 else: return fact(x - 1) * x
To see how it works, consider that
fact(2) * 3 which is
fact(1) * 2 * 3 == fact(0) * 1 * 2 * 3 == 1 * 1 * 2 * 3 == 6.
For brevity, we’ll write it in a style that’s a bit more functional:
def fact(x): return 1 if x == 0 else fact(x - 1) * x
And for regularity with the rest of this article, let us rewrite that definition using Python’s syntax for function literals (lambda expressions):
fact = (lambda x: 1 if x == 0 else fact(x - 1) * x)
which works similarly as the
print(fact(3)) # prints 6
Lambda calculus, which is the core of functional languages, only supports function literals and function application. So the definition above is not a valid lambda term — indeed, it involves a recursive binding: the definition of
fact refers to itself!
Note that non-recursive (immutable) local variable bindings are fine, on the other hand: a program of the form
x = value; ... x ..., where
value does not refer to
x, can always be reformulated as
(lambda x: ... x ...)(value), a proper lambda term.
Recursion without direct self reference
Our goal is to rewrite a given recursive function into a “plain” function literal — that is, without using a recursive binding. But how is that possible? After all, for the function to be able to call itself recursively, it will need to refer to itself one way or the other…
The trick is to make the self reference necessary for recursion indirect. Indeed, it out turns to be possible for a non-recursive function
f to refer to itself simply by passing it…
f (itself) in argument!
For instance, consider the function
f below, which prints
"in f" before calling its argument function
g, passing it a dummy lambda:
def f(g): print("in f") g(lambda x: None)
f(lambda h: print("in g")) simply prints
"in f" as expected and then calls the lambda we passed as
g, which in turn prints
Now, let us consider what
f(f) does. Following the same logic as above, this code prints
"in f" twice… in effect, we have made
f call itself “under the guise” of a parameter
Making recursion happen
Now, consider the following variation, which calls its parameter function
g itself as the argument, and returns the result wrapped in a parameterless lambda (to prevent potential infinite loops):
def f(g): print("in f") return lambda: g(g)
"in f" and then returns a parameterless lambda which, when applied… calls
f(f) again! Therefore, calling
"in f" twice, and calling
"in f" four times.
Essentially, the result of
f(f) corresponds to the following explicitly-recursive function:
def ff(): print("in f") return lambda: ff()
So we’ve seen how to make a non-recursive function behave as if it was recursive, by having it take a parameter that represents its self reference (to be filled in later) and applying that parameter to itself…
That is the essence of the Y combinator!
There is not much more than this “trick” to the essence of the Y (or Z) combinator. I argue that the rest is really just plumbing, to abstract the pattern above and make it nicer to use. This plumbing eventually yields the well-known (but inscrutable)
Z combinator, as we shall see next.
But first, let us exercise the trick we’ve just devised to redefine our recursive
fact function… without using a recursive binding.
The factorial function in lambda calculus
Armed with this insight, we can rewrite
fact using a non-recursive function as follows.
First, we define the structure of the recursive definition:
fact_base = (lambda rec, x: 1 if x == 0 else rec(rec, x - 1) * x)
fact_base definition takes two arguments: in addition to the input number
x, it takes a
rec function representing a future recursive reference to
Next, we “tie the recursive knot” by passing
fact_base to itself, and abstract that detail into a self-contained
fact = lambda x: fact_base(fact_base, x)
For instance, consider what the expression
fact(3) expands to. The call is equivalent to
fact_base(fact_base, 3), which expands to
1 if 3 == 0 else fact_base(fact_base, 3 - 1) * 3, which reduces to
fact_base(fact_base, 2) * 3, etc. until it reduces to
1 * 1 * 2 * 3 == 6.
A simplification: currying
Functions with several parameters can be encoded in the lambda calculus via currying, whereby a function taking two parameters is turned into a function that takes the first parameter and returns a function that takes the second parameter.
It is instructive to see how currying simplifies the definition of
# assuming the curried: fact_base = (lambda rec: lambda x: 1 if x == 0 else rec(rec)(x - 1) * x) # we now just write: fact = fact_base(fact_base)
Mission accomplished! This is a valid lambda term, which indirectly implements recursion to compute the factorial of a number.
Are we done? Well, not quite. It is still pretty inconvenient to write recursive functions in this style, so we’ll see next how to make it nicer.
Abstracting the outer self application
Writing recursive functions in the style we saw above gets old very quickly. We’d really like to abstract the boilerplate away into easy-to-use helper functions (combinators).
First, let us define a function that performs the outer self application (i.e.,
fact_base(fact_base)) for us, and let us call that function
mkrec, as it will be used to make rec-ursive functions:
mkrec = lambda f: f(f) fact = mkrec(lambda rec: lambda x: 1 if x == 0 else rec(rec)(x - 1) * x)
The definition of
fact becomes more straightforward (it can now be defined in one go), but recursive calls still look annoyingly noisy: they have to repeat the
rec argument, as in
rec(rec)(x - 1).
Abstracting the inner self application
We’d prefer to simply write
rec(x - 1) instead of
rec(rec)(x - 1) in the definition of
fact itself. Let us make a function for that, and call it
mkrec_nice has to do is to take a function that uses its
rec parameter in the desired style (as in
rec(x - 1)) and adapt it to use the “official” style under the hood (as in
rec(rec)(x - 1)):
mkrec_nice = (lambda g: mkrec(lambda rec: g(lambda y: rec(rec)(y)))) fact = mkrec_nice(lambda rec: lambda x: 1 if x == 0 else rec(x - 1) * x)
To understand why this works, consider what the call to
mkrec_nice expands into:
# starting with: fact = mkrec_nice(g) # where: g = (lambda rec_nice: lambda x: 1 if x == 0 else rec_nice(x - 1) * x) # we have: fact = mkrec(lambda rec: g(lambda y: rec(rec)(y))) # inlining the call to g, wherein we substitute rec_nice with `lambda y: rec(rec)(y)`: fact = mkrec(lambda rec: (lambda x: 1 if x == 0 else (lambda y: rec(rec)(y))(x - 1) * x)) # reducing the corresponding applied lambda, this gives us the original: fact = mkrec(lambda rec: lambda x: 1 if x == 0 else rec(rec)(x - 1) * x)
Now, we’re done! We have created a
mkrec_nice combinator that lets us define recursive functions easily, with minimal boilerplate.
The Z combinator
mkrec_nice is actually “the
Z combinator”! But to view it in its traditional form, we first need to perform some obfuscation…
We’ll start by inlining the definition of
mkrec inside that of
mkrec_nice, which yields:
mkrec_nice = (lambda g: (lambda f: f(f))((lambda rec: g(lambda y: rec(rec)(y)))))
Then, we’ll reduce this inlined function — which amounts to substituting
(lambda rec: g(lambda y: rec(rec)(y))), giving us:
mkrec_nice = (lambda g: ( lambda rec: g(lambda y: rec(rec)(y)) ) ( lambda rec: g(lambda y: rec(rec)(y)) ))
This version of
mkrec_nice correspond directly to the traditional inscrutable presentation of
Z, which is written in lambda calculus as:
Z = λg. (λr. g (λy. r r y)) (λr. g (λy. r r y))
The real Y combinator
The real Y combinator is actually a little simpler than the Z combinator that we say above. This is because in non-strict evaluation strategies such as lazy evaluation, not only functions can be recursive, but also plain values. So we can define a combinator that directly creates recursive values, without having to explicitly handle function calls and argument-passing.
More specifically, in Python syntax, we’d use the following definition instead of
mkrec_lazy = (lambda g: mkrec(lambda rec: g(rec(rec))))
When unrolled, this corresponds to:
mkrec_lazy = (lambda g: ( lambda rec: g(rec(rec)) ) ( lambda rec: g(rec(rec)) ))
which is the traditional Y combinator, or in lambda calculus:
Y = λg. (λr. g (r r)) (λr. g (r r))
However, such a definition cannot actually be used to define
fact in Python: it causes an infinite loop because Python evaluates function call arguments too eagerly.