How about an easy explanation, one that a middle- or high-schooler who just learned to solve 3-equations and 3-variables and has seen probability can understand?
Let's say you have three websites: Reddit, Yahoo, and Google. Let's say Reddit links to Yahoo and Google, and Yahoo links to Google, and Google links to Reddit.
If you start on a random page and click randomly, what happens? You'll land on Reddit with probability x, Yahoo with probability y, and Google with probability z. Notice that if you continue clicking randomly, x, y, and z will stop changing after a while.
Okay, now stop clicking. How likely are you to have landed on page Google? (i.e., What's z?)
- You're on Reddit with probability x, and 1/2 of the links there will take you to Google.
- You're on Yahoo with probability y, and 1/1 of the links there will take you to Google.
- You're on Google with probability z, and 0/1 of the links there will take you to Google.
Therefore, z = x(1/2) + y(1/1) + z(0/1)
Similarly, x = x(0/2) + y(0/1) + z (1/1)
Similarly, y = x(1/2) + y(0/1) + z (0/1)
Since x + y + z = 1, we get: x = 2/5, y = 1/5, z = 2/5
What does that mean? It means Reddit and Google are twice as "important" as Yahoo, since you're twice as likely to land on them.
There, we just learned how PageRank works with zero linear algebra business.
So I feel that if the goal is to give a conceptual understanding, linear algebra is overkill. If the goal is to show how it's done in practice then this isn't terribly useful since it doesn't even show the tip of the iceberg!
A useful paper is also Bryan and Leise's The $25B Eigenvector, The Linear Algebra Behind Google.
G is a matrix of transition probabilities - each entry is the probability of moving directly from a particular page to another particular page. It is composed of 3 terms:
- the hyperlink matrix: the probabilities obtained by assuming a user randomly selects a link to follow from their current page, with an equal probability for each
- the dangling nodes matrix: to ensure that it is possible to leave every page, this adds an equal probability for moving from a page with no outgoing links to every other page
- the matrix U of all ones: this provides G with 2 desirable properties, by ensuring it is both irreducible/strongly-connected and aperiodic. It does this by making it possible to move from any page to any other page, with some small probability (exactly how small is determined by the damping factor d).
The pagerank vector represents the equilibrium probability distribution - in other words, the probability of being on a particular page after starting on a random page then spending a long time randomly moving between pages according to the probabilities in G.
Now, if at a given time your probability of being on each page is given by some vector X, the probability of being on each page after one random move is the transition matrix multiplied by X.
The equilibrium probability vector thus has the property that it is unchanged by multiplication by the transition matrix - it is therefore an eigenvector of the transition matrix, with an eigenvalue of 1.
Edit: After typing this, I see that explanatory hypertext appears when you mouse over the blue words.
When deleting the last one it crashes (tested in FF and Chrome)