The Y combinator is a central concept in lambda calculus, which is the formal foundation of functional languages. Y allows one to define recursive functions without using self-referential definitions.
Most articles Iāve seen dedicated to explaining the Y combinator start by showing you the Y combinator, which in itself is fairly inscrutable, and then trying to explain why it works.
I think this is backwards. In this article, weāll go the other way around: Iāll start by describing, in simple terms, the essence of the Y combinator ā or how to make recursion work without recursive definitions, and from there I will derive the usual Y combinator.
Since weāll be using Python, Iām actually going to describe a slight variation of the Y combinator, called the Z combinator. Thatās because Y proper only work in languages with non-strict evaluation (like Haskell, which uses lazy evaluation).
Background
I gave a lecture on recursion and term equivalence in the lambda calculus for the Foundations of Software course at EPFL. This article recounts what I came up with to explain the Z combinator in simple, intuitive terms. It is inspired by the excellent āThe Why of Yā writeup by Richard P. Gabriel, available here. I make essentially the same argument here, but slightly simplified and in Python (instead of Scheme).
Recursive definitions in Python
Here is a typical recursive definition, the factorial function:
def fact(x):
if x == 0: return 1
else: return fact(x - 1) * x
To see how it works, consider that fact(3)
computes fact(2) * 3
which is fact(1) * 2 * 3 == fact(0) * 1 * 2 * 3 == 1 * 1 * 2 * 3 == 6
.
For brevity, weāll write it in a style thatās a bit more functional:
def fact(x):
return 1 if x == 0 else fact(x - 1) * x
And for regularity with the rest of this article, let us rewrite that definition using Pythonās syntax for function literals (lambda expressions):
fact = (lambda x:
1 if x == 0 else fact(x - 1) * x)
which works similarly as the def
above:
print(fact(3)) # prints 6
Problem
Lambda calculus, which is the core of functional languages, only supports function literals and function application. So the definition above is not a valid lambda term ā indeed, it involves a recursive binding: the definition of fact
refers to itself!
Note that non-recursive (immutable) local variable bindings are fine, on the other hand: a program of the form x = value; ... x ...
, where value
does not refer to x
, can always be reformulated as (lambda x: ... x ...)(value)
, a proper lambda term.
Recursion without direct self reference
Our goal is to rewrite a given recursive function into a āplainā function literal ā that is, without using a recursive binding. But how is that possible? After all, for the function to be able to call itself recursively, it will need to refer to itself one way or the otherā¦
The trick is to make the self reference necessary for recursion indirect. Indeed, it out turns to be possible for a non-recursive function f
to refer to itself simply by passing itā¦ f
(itself) in argument!
For instance, consider the function f
below, which prints "in f"
before calling its argument function g
, passing it a dummy lambda:
def f(g):
print("in f")
g(lambda x: None)
The expression f(lambda h: print("in g"))
simply prints "in f"
as expected and then calls the lambda we passed as g
, which in turn prints "in g"
.
Now, let us consider what f(f)
does. Following the same logic as above, this code prints "in f"
twiceā¦ in effect, we have made f
call itself āunder the guiseā of a parameter g
.
Making recursion happen
Now, consider the following variation, which calls its parameter function g
with g
itself as the argument, and returns the result wrapped in a parameterless lambda (to prevent potential infinite loops):
def f(g):
print("in f")
return lambda: g(g)
Calling f(f)
prints "in f"
and then returns a parameterless lambda which, when appliedā¦ calls f(f)
again! Therefore, calling f(f)()
prints "in f"
twice, and calling f(f)()()()
prints "in f"
four times.
Essentially, the result of f(f)
corresponds to the following explicitly-recursive function:
def ff():
print("in f")
return lambda: ff()
So weāve seen how to make a non-recursive function behave as if it was recursive, by having it take a parameter that represents its self reference (to be filled in later) and applying that parameter to itselfā¦
That is the essence of the Y combinator!
There is not much more than this ātrickā to the essence of the Y (or Z) combinator. I argue that the rest is really just plumbing, to abstract the pattern above and make it nicer to use. This plumbing eventually yields the well-known (but inscrutable) Z
combinator, as we shall see next.
But first, let us exercise the trick weāve just devised to redefine our recursive fact
functionā¦ without using a recursive binding.
The factorial function in lambda calculus
Armed with this insight, we can rewrite fact
using a non-recursive function as follows.
First, we define the structure of the recursive definition:
fact_base = (lambda rec, x:
1 if x == 0 else rec(rec, x - 1) * x)
This non-recursive fact_base
definition takes two arguments: in addition to the input number x
, it takes a rec
function representing a future recursive reference to fact_base
itself.
Next, we ātie the recursive knotā by passing fact_base
to itself, and abstract that detail into a self-contained fact
function:
fact = lambda x: fact_base(fact_base, x)
For instance, consider what the expression fact(3)
expands to. The call is equivalent to fact_base(fact_base, 3)
, which expands to 1 if 3 == 0 else fact_base(fact_base, 3 - 1) * 3
, which reduces to fact_base(fact_base, 2) * 3
, etc. until it reduces to 1 * 1 * 2 * 3 == 6
.
A simplification: currying
Functions with several parameters can be encoded in the lambda calculus via currying, whereby a function taking two parameters is turned into a function that takes the first parameter and returns a function that takes the second parameter.
It is instructive to see how currying simplifies the definition of fact
:
# assuming the curried:
fact_base = (lambda rec: lambda x:
1 if x == 0 else rec(rec)(x - 1) * x)
# we now just write:
fact = fact_base(fact_base)
Mission accomplished! This is a valid lambda term, which indirectly implements recursion to compute the factorial of a number.
Are we done? Well, not quite. It is still pretty inconvenient to write recursive functions in this style, so weāll see next how to make it nicer.
Abstracting the outer self application
Writing recursive functions in the style we saw above gets old very quickly. Weād really like to abstract the boilerplate away into easy-to-use helper functions (combinators).
First, let us define a function that performs the outer self application (i.e., fact_base(fact_base)
) for us, and let us call that function mkrec
, as it will be used to make rec-ursive functions:
mkrec = lambda f: f(f)
fact = mkrec(lambda rec: lambda x:
1 if x == 0 else rec(rec)(x - 1) * x)
The definition of fact
becomes more straightforward (it can now be defined in one go), but recursive calls still look annoyingly noisy: they have to repeat the rec
argument, as in rec(rec)(x - 1)
.
Abstracting the inner self application
Weād prefer to simply write rec(x - 1)
instead of rec(rec)(x - 1)
in the definition of fact
itself. Let us make a function for that, and call it mkrec_nice
.
All mkrec_nice
has to do is to take a function that uses its rec
parameter in the desired style (as in rec(x - 1)
) and adapt it to use the āofficialā style under the hood (as in rec(rec)(x - 1)
):
mkrec_nice = (lambda g:
mkrec(lambda rec:
g(lambda y: rec(rec)(y))))
fact = mkrec_nice(lambda rec: lambda x:
1 if x == 0 else rec(x - 1) * x)
To understand why this works, consider what the call to mkrec_nice
expands into:
# starting with:
fact = mkrec_nice(g)
# where:
g = (lambda rec_nice: lambda x:
1 if x == 0 else rec_nice(x - 1) * x)
# we have:
fact = mkrec(lambda rec:
g(lambda y: rec(rec)(y)))
# inlining the call to g, wherein we substitute rec_nice with `lambda y: rec(rec)(y)`:
fact = mkrec(lambda rec:
(lambda x:
1 if x == 0 else (lambda y: rec(rec)(y))(x - 1) * x))
# reducing the corresponding applied lambda, this gives us the original:
fact = mkrec(lambda rec: lambda x:
1 if x == 0 else rec(rec)(x - 1) * x)
Now, weāre done! We have created a mkrec_nice
combinator that lets us define recursive functions easily, with minimal boilerplate.
The Z combinator
Plot twist: mkrec_nice
is actually āthe Z
combinatorā! But to view it in its traditional form, we first need to perform some obfuscationā¦
Weāll start by inlining the definition of mkrec
inside that of mkrec_nice
, which yields:
mkrec_nice = (lambda g:
(lambda f: f(f))((lambda rec: g(lambda y: rec(rec)(y)))))
Then, weāll reduce this inlined function ā which amounts to substituting f
in f(f)
with (lambda rec: g(lambda y: rec(rec)(y)))
, giving us:
mkrec_nice = (lambda g:
( lambda rec: g(lambda y: rec(rec)(y)) )
( lambda rec: g(lambda y: rec(rec)(y)) ))
This version of mkrec_nice
correspond directly to the traditional inscrutable presentation of Z
, which is written in lambda calculus as:
āZā=āĪ»g. (Ī»r. g (Ī»y. r r y)) (Ī»r. g (Ī»y. r r y))
The real Y combinator
The real Y combinator is actually a little simpler than the Z combinator that we say above. This is because in non-strict evaluation strategies such as lazy evaluation, not only functions can be recursive, but also plain values. So we can define a combinator that directly creates recursive values, without having to explicitly handle function calls and argument-passing.
More specifically, in Python syntax, weād use the following definition instead of mkrec_nice
:
mkrec_lazy = (lambda g:
mkrec(lambda rec:
g(rec(rec))))
When unrolled, this corresponds to:
mkrec_lazy = (lambda g:
( lambda rec: g(rec(rec)) )
( lambda rec: g(rec(rec)) ))
which is the traditional Y combinator, or in lambda calculus:
āYā=āĪ»g. (Ī»r. g (r r)) (Ī»r. g (r r))
However, such a definition cannot actually be used to define fact
in Python: it causes an infinite loop because Python evaluates function call arguments too eagerly.