Understanding the Y Combinator

I am now reading Types and Programming Languages, and I’ve been trying to wrap my head around the Y combinator and why it works. This is how I explained it to myself:

We want to define a recursive function, which means the function should be able to refer to itself. Unfortunately, Lambda calculus does not give us this option, so we define a function of the form: $g = \lambda func. \lambda param.\space\text{return something or} func(param')$

Where $param'$ is some reduced version of $param$. In order for this thing to be a recursion, we need to find a parameter $func$ for which: $func = g(func)$

We notice that $func$ is a fixed point of $g$ by definition, hence the name fixed point combinators. We also notice that it must be a function of $g$ (otherwise it’s a constant and we can show trivially that it doesn’t work, duh), so we can write: $Y(g) = func = g(func) = g(Y(g))$

Cool. Now we are looking for $term_0$ such that: $Y = \lambda f.\underbrace{\text{ } f(term_0) \text{ }}_{term_0}$

Hmmm… Ok. Let’s try to name $term_0$ and pass it as a parameter to $f$, will that work? $Y = \lambda f.\underbrace{(\lambda x. f(x))}_{term_0} term_0 =\lambda f.\underbrace{(\lambda x. f(x))}_{term_0}\underbrace{(\lambda x. f(x))}_{term_0}$

This sucks. After our change we no longer need to pass $term_0$ to $f$, but $term_1$ : $Y = \lambda f.\underbrace{(\lambda x. f(term_1))(\lambda x. f(term_1))}_{term_1}$

Fortunately, can easily express $term_1$ using our bound variable $x$ as $(x x)$! So if we write: $Y = \lambda f.(\lambda x. f(x x))(\lambda x. f(x x))$

We win!

We can see that the Y combinator as we defined it is a fixed point of the function $g$, just like we wanted.
Great success :)

I ignored reduction rules for the sake of simplicity, but it’s interesting to note that the book talks about another type of fixed point combinator, which should be used under different reduction rules: $fix = \lambda f. (\lambda x. f (\lambda y. x x y))(\lambda x. f (\lambda y. x x y))$

I wonder how many fixed points $g$ has under each set of reduction rules.