I think the only funding model other than ads that I’ve seen is a subscription, and I’ve only seen Kagi do that. It seems to work well for them though. Last time I checked they had something like 50,000 subscribers.
I kind of like your data idea, but I’m not sure how much of a market there would be for anonymous data like that. How anonymous would it be, and what kind of data would be collected? You would probably want to be careful not to let it turn out like the AOL query log release, where supposedly anonymized search queries were traced back to individual users.
I have thought that it might be fun to try monetizing a website by serving the user a chunk of JavaScript that does a small amount of Bitcoin mining.
The nice thing about charging for access is that you can start with a low-bandwidth server, and then increase the price a little if there are too many users for the server to handle. I don’t know of an equivalent way to “increase the data collection” if you need to lower the user count. Of course, the downside of charging money is that it is a barrier for people who might want to try it.
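To make the pricing idea a bit more concrete, here is a rough Python sketch of what I mean. Everything in it is made up for illustration: the function name, the 5% step, and the user and capacity numbers are all assumptions, not anything a real service uses.

```python
def adjust_price(price, active_users, capacity, step=0.05):
    """Return a slightly higher price when demand exceeds server capacity.

    All names and numbers here are hypothetical. `step` is the fractional
    increase applied each time the server is over capacity.
    """
    if active_users > capacity:
        # Too many users for the server: nudge the price up a little.
        return round(price * (1 + step), 2)
    # The server is coping fine, so leave the price alone.
    return price


if __name__ == "__main__":
    # Example: a server that comfortably handles 1000 users.
    print(adjust_price(5.00, active_users=1200, capacity=1000))
    print(adjust_price(5.00, active_users=800, capacity=1000))
```

You would run something like this once per billing period, so the price only creeps up while the server is actually overloaded.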
As a side note, I would recommend using old data from the Common Crawl as a large part of the index. No one has more than a few billion pages from them indexed, and the data that I’ve experimented with from 2014 seemed high quality, with very little spam. A lot of the old pages no longer exist on the live web, so it would be nice to have them indexed. If you got all or most of their pages (they have around 80 billion unique ones, I believe), it would be like a full-text search engine for the Wayback Machine (which may have almost existed at one point, see this article: https://archive.org/details/search-timeline).
I’m always excited to see people start up search engine projects. Is there a way for me to follow your progress?