First, we talk about characteristic functions. As noted, they are merely functions which return either 1 or 0, True or False. They represent sets by examining their argument and returning True if and only if the set they represent contains that thing.
So, if x is a characteristic function and (x x) is True, then x "contains" itself. Then N is just negation, so if N (x x) is True then x does not "contain" itself. That much is simple.
Now we form a special new characteristic function, R, defined as
R x = N (x x)
When is (R x) True? Exactly when x is "contained" in R. When does that happen? When N (x x), e.g., when x does not "contain" itself.So this is a well-defined characteristic function and therefore represents what we might think is a well-defined set.
But what happens when we check (R R), i.e. to see if R contains itself?
Well, R, by definition, contains all things which are not members of themselves. If R is in R then (R R) is True, but then N (R R) is False and (R R) = False. Contradiction!
We can inline R into (R R) in order to take a look at the functional form.
(\x -> N (x x)) (\x -> N (x x))
and the author notes that this is the same form as the Y-combinator with its first argument specialized to N. Y f = (\x -> f (x x)) (\x -> f (x x))
Y N = R R
And there we go! F = (\x -> Z (x x))
Y f = (\x -> f (x x)) (\x -> f (x x))
Y Z = F F
so, any set F that is defined in terms of how sets (functions) relate to themselves, fits into the Y combinator. Which is expected, since Y is the essence of recursion.Hopefully, you understood the core idea.
But even though the isomorphism between the Y combinator and Russell's paradox is elegant, the paradox is actually very deep (much deeper than the idea of a fixed point) -- that's why it took someone until the early 20th century to formalize it. For an awesome (and mind-blowing) explanation of the paradox, see Halmos' Naive Set Theory (botom of page 6): http://sistemas.fciencias.unam.mx/~lokylog/images/stories/Al...
So you'd have to reject a lambda like this: λx. x (x x), because when you rewrite that in set theory notation it looks like this: { x | x ∈ (x ∈ x) }. Bad since (x ∈ x) is a proposition used as a set. We'd also reject λx. x because it returns a set and not a proposition. Labeling expressions with one of two sorts (proposition or set) can be done statically, though, and this allows us to quickly check whether a lambda corresponds to a construction of naive set theory. Once you have this correspondence, the rest of the author's note goes through fairly well.
Except that technically Y = λN. R R, not R R.
In such a language, types classify terms and types correspond to theorems. The type/theorem for a fixed point is
fix : forall a . (a -> a) -> a
which as a theorem reads: "for any proposition A, if we can prove 'if A, then A' then we can prove A". Hopefully it's now clear why having 'fix' in your prover language won't work! > fun fix (f : 'a -> 'a) : 'a = fix f;
val fix = fn: ('a -> 'a) -> 'a
The trick is that all functions must be total, which is the crux of the matter (and why the Y-combinator exhibits the paradox, because it allows nontermination). This is why we have stuff like the polymorphic lambda calculus, in which termination is guaranteed and the Y-combinator is untypeable.[0] http://en.wikipedia.org/wiki/Curry%27s_paradox#Existence_pro...