What I've heard about one smaller search engine is that web crawling is usually augmented with manually added rules for various sites, to keep junk pages from polluting the database. Not a trivial task at all.
Doing queries is IMHO algorithmically much better understood, because it's a constrained problem. But extracting information from the real world, with all the PHP and HTML "hackers" out there, is not so easy.
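
To make the "manually added rules" idea concrete, here is a rough sketch of what such a per-site filter might look like. I don't know how that search engine actually does it; the host names, the patterns, and the names `SITE_RULES` and `should_crawl` are all made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical hand-maintained rules: host -> patterns for URLs to skip.
# Real rule sets would be much larger and would need constant upkeep.
SITE_RULES = {
    "forum.example.com": [
        re.compile(r"^/search"),      # search result pages
        re.compile(r"^/calendar/"),   # endless calendar pages (crawler trap)
        re.compile(r"[?&]sid="),      # session IDs create duplicate URLs
    ],
    "shop.example.com": [
        re.compile(r"^/cart"),
    ],
}


def should_crawl(url: str) -> bool:
    """Return False if a hand-written rule for the URL's host matches."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Hosts without an entry have no rules and are always allowed.
    return not any(p.search(target) for p in SITE_RULES.get(host, []))


if __name__ == "__main__":
    for u in (
        "http://forum.example.com/viewtopic.php?t=1",
        "http://forum.example.com/viewtopic.php?t=1&sid=abc",
        "http://forum.example.com/calendar/2031/05",
    ):
        print(u, should_crawl(u))
```

The code itself is trivial. The hard part is that somebody has to notice each misbehaving site and write the rule for it by hand.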