I have thought about trying local HTTP proxies over Bluetooth serial, the new Web Bluetooth standard, intercepting XHR in native Android code to send requests over serial, Wi-Fi Direct, regular Wi-Fi mode without a route to the internet, and serving the app over LTE so my Android device connects to it through the internet.
All of these are brittle hacks with unintended consequences (e.g. the phone can join regular Wi-Fi without a default gateway, but then it only connects over LTE; Web Bluetooth isn't a request/response model; and proxying multiple requests over Bluetooth serial means building my own framing).
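To make the framing problem concrete, here is a minimal sketch of what I mean, assuming a made-up wire format (a 4-byte request ID and a 4-byte payload length ahead of each payload). The names `encode_frame` and `FrameDecoder` are hypothetical, and nothing here is specific to any Bluetooth library; it only shows how several requests could share one serial byte stream.

```python
import struct
from typing import List, Tuple

# Hypothetical wire format (an assumption, not any standard):
# 4-byte big-endian request ID, 4-byte big-endian payload length, payload.
HEADER = struct.Struct(">II")


def encode_frame(request_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its request ID and length."""
    return HEADER.pack(request_id, len(payload)) + payload


class FrameDecoder:
    """Reassembles frames from arbitrarily chunked serial reads."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[Tuple[int, bytes]]:
        """Buffer received bytes and return every frame that is now complete."""
        self._buf.extend(chunk)
        frames = []
        while len(self._buf) >= HEADER.size:
            request_id, length = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # payload not fully received yet
            frames.append((request_id, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]  # drop the consumed frame from the buffer
        return frames


if __name__ == "__main__":
    # Two requests interleaved on one stream, delivered in 5-byte chunks
    # to mimic partial reads from a serial port.
    stream = encode_frame(1, b"GET /a") + encode_frame(2, b"GET /b")
    decoder = FrameDecoder()
    for i in range(0, len(stream), 5):
        for request_id, payload in decoder.feed(stream[i:i + 5]):
            print(request_id, payload)
```

The request ID is what lets responses come back out of order and still be matched to the right XHR. A real version would also need timeouts and some way to resync after a dropped byte, which is part of why this feels brittle.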
Has anyone done this and gotten it to work?
I would like to publish it for others to use, but I'm not sure how useful it would be. In the HN spirit of validating customers early, I'd like to gauge the interest of those who would actually download and use such a resource before moving forward. Let me know.
Stack Overflow? Particular forums?
Several services would likely do this for several thousand dollars, but I'm hoping for something cheaper.
https://files.pushshift.io/hackernews/
For starters, we need a communal dataset of 50-100 million URLs/data/metadata that's "good" (for any value of good) to help people experiment with web search tech.
We need more Kaggle competitions to explore better summarization, reader mode text extraction, boilerplate removal, etc. What other competitions would help web search?
How can we foster a sustainable community and projects to create a bazaar of web search engines?
What relevant research or projects are trying to make these sorts of algorithms and data accessible as a future commodity people can build on top of?
Common Crawl is bigger, but I've also read that it still isn't good enough to use as input for good web search.
Did the TREC Web Track [1] produce anything useful?
[1] http://www-personal.umich.edu/~kevynct/trec-web-2014/