In my spare time I work on an experimental search engine named Ichido. Search is fascinating: there are so many features you can add to a search engine, yet I find that existing search engines are limited in the features they offer. So I decided to build my own search engine to test out different features, search algorithms, and front ends in order to improve my (and hopefully others') searching experience.
Ichido includes a tagging system that provides more info on search results. For example, if a site links to Google services or uses Cloudflare, a tag is shown with the search result that lets the user know about that site's use of those services. Ichido also includes links to RSS feeds in search results, making it much easier to find RSS feeds.
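As a rough illustration of how this kind of tagging and feed discovery could work (this is not Ichido's actual code), the sketch below scans one page using only the Python standard library. The names `SERVICE_HOSTS`, `PageScanner`, and `scan_page` are hypothetical, the host list is a small made-up sample, and treating a `Server: cloudflare` response header as evidence of Cloudflare is a heuristic assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical sample of tag name -> host suffixes; not Ichido's real list.
SERVICE_HOSTS = {
    "google": ("google.com", "googletagmanager.com",
               "google-analytics.com", "googleapis.com"),
}
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class PageScanner(HTMLParser):
    """Collects service tags and advertised feed URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.tags = set()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        # <link rel="alternate" type="application/rss+xml" href="..."> advertises a feed.
        if tag == "link" and "alternate" in rel and mime in FEED_TYPES and attrs.get("href"):
            self.feeds.append(urljoin(self.base_url, attrs["href"]))
        # Any href/src pointing at a known service host earns the page that tag.
        for name in ("href", "src"):
            url = attrs.get(name)
            if not url:
                continue
            host = urlparse(urljoin(self.base_url, url)).hostname or ""
            for label, suffixes in SERVICE_HOSTS.items():
                if any(host == s or host.endswith("." + s) for s in suffixes):
                    self.tags.add(label)


def scan_page(url, html, headers):
    """Return (tags, feeds) for a fetched page and its response headers."""
    scanner = PageScanner(url)
    scanner.feed(html)
    tags = set(scanner.tags)
    lowered = {k.lower(): v for k, v in headers.items()}
    # Heuristic assumption: a "Server: cloudflare" header means the site sits behind Cloudflare.
    if lowered.get("server", "").lower() == "cloudflare":
        tags.add("cloudflare")
    return tags, scanner.feeds
```

A crawler would call `scan_page` once per fetched page and store the result alongside the indexed document.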
This search engine is free to use, but if you like the service and want to support continued development, please consider making a donation (Ichido currently supports donations through Liberapay).
How much do you have to pay them for this?
Is it easy for you to rely on more search index providers, what are your options?
Correct me if I’m wrong, but I’m pretty certain that all other search engines in the same category use one of these as their backend. E.g. I’m pretty certain that goes for DuckDuckGo as well.
Money and resources and a dominant-enough position so that your crawlers are not blocked by websites.
Unfortunately.
Is there something obvious I'm missing that makes it infeasible, or maybe is it just something only I want?
As for this site there's too many tags for them to be useful imo. Give it 2 weeks of using the search engine and I bet you could hide silly fake tags in there and I'd never notice. Lots of tags = no tags.
I was picturing maybe a little pillbox type thing you might find appended to Google search results.
For instance when a result is a PDF: https://img.imgy.org/-7lq.jpg
https://blog.kagi.com/kagi-features
I only know about it because it pops up often on hn. Haven't tried it because at this point I don't want to pay $10 per month for search.
CloudFlare is a MitMaaS. Traffic is seen by them because they are in control of the HTTPS certificates, and you have to take them at their word that they do not log content (and even if they're not lying/under a gag order, just metadata is enough for a lot of evil things).
If yes, what's the endgame? Everyone goes back to managing their own servers?
If no, why is Cloudflare the only hosting provider that gets singled out?
These are all reasons I use Cloudflare lmao. Yes I need them to decrypt the traffic because they do various rules and caching for me. That DDoS protection would be pretty naff if they couldn't see the traffic! In one case I really wish they did log, I had to write my own Worker to log the info I needed.
If we were talking outbound proxy then fair enough but it's not like Cloudflare have strongarmed me into using them.. it was me that updated the NS records!
A lot of the list from that site just seem to describe what Cloudflare does, they don't seem to say why each thing is actually a bad thing.
Really does feel like someone's got a hate rod on for Cloudflare and tried to crowbar in as many VPN criticisms without understanding the difference between a VPN/proxy and a CDN.
Just added Ichido.
Click on "more engines" to activate it.
meta
https://www.gnod.com/search/
https://github.com/searx/searx
categories
independent
https://www.crawlson.com/
https://search.marginalia.nu/
https://wiby.me/
https://searchmysite.net/
international
https://bonzamate.com.au/ australia
https://www.baidu.com/ china
https://yandex.com/ russia
code
https://searchcode.com/
https://codesearch.ai/
http://symbolhound.com/
https://publicwww.com/
https://search.feep.dev/
http://codesearch.debian.net/
https://codesearch.isocpp.org/
https://www.programcreek.com/python/
https://livegrep.com/search/linux
https://grep.app/
ai
https://consensus.app/ scientific consensus
https://github.com/jokenox/Goopt procedurally generated
https://same.energy/ image similarity
products
https://www.looria.com/
https://knifist.com/ knives
https://attic.city/ home and fashion from indie stores
topical
https://biztoc.com/search business news
premium
https://kagi.com/
other
https://metager.org/ privacy centric engine that combines results of several engines
https://thangs.com/ 3d models
https://filmot.com/ youtube subtitles
lists
https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/
https://web.archive.org/web/20200710091019/http://www.jaruzel.com/textfiles/Old%20Web%20Info/Internet%20Search%20Engines%20v2.61.txt
Hope it's still useful.
Thanks!
Thanks, much appreciated
Apart from that, awesome project!
* Someone uploaded a WEBP image to the site.
* Someone pasted a link with a utm_* param.
* The page was crawled when cloudflare was used.
Will look into it and see if I can find the pages that generated the tags. Search results are generally tagged by domain name (necessary since not all pages can be crawled, and even if the page the user connects to doesn't have, for example google trackers, a user would likely want to know if the site is using trackers elsewhere).
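To make the domain-level behaviour concrete, here is a minimal sketch of one way tags could be rolled up per domain. It is an assumption about the general approach, not the engine's real data model, and the function names `domain_tags` and `tags_for_result` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_tags(page_tags):
    """Union the tags of every crawled page into one set per hostname.

    page_tags is an iterable of (url, set_of_tags) pairs.
    """
    by_domain = defaultdict(set)
    for url, tags in page_tags:
        by_domain[urlparse(url).hostname].update(tags)
    return by_domain


def tags_for_result(url, by_domain):
    """Tags shown for a result, even if that exact page was never crawled."""
    return sorted(by_domain.get(urlparse(url).hostname, set()))
```

With this scheme a single tagged page is enough to tag every result on the same domain. That gives the coverage described above, and it would also explain the guesses in the list: one user-uploaded WEBP image or one pasted link with a utm_* param could mark the whole site.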
Also love the spacehey project, really captures the feel of Myspace!
Glad you like SpaceHey :)
Keep up the great work!
And here's a lightweight frontend/proxy I wrote in C for using Google search on low-end phones that can't render bloated HTML (SearX was too complicated to install):
http://searc.4a.si:7327/search?q=news
It's also nice that the structured, never-changing HTML it produces makes it ideal for programmatically querying Google. You still run into captchas, though, which it cannot solve if queries get too suspicious.
I find the webp flag interesting, as I don't think webp itself is inherently harmful, except for being an image spec that solely exists because Google NIHs everything and wants to write their own everything. (Long live JPEG-XL!)
I'm curious why you chose to tag it explicitly though.
JXL is pretty much dead thanks to Google... and avif is still mostly suited to thumbnails.
A piece of feedback: When I select "Remove top ...." and click Submit, then click Next, the popularity filter is gone.
Edit: looks like the file type filter is dropped as well. Do add the arguments to the pagination links.
Thank you, great feedback! You're right, I forgot to include some of the params in the pagination, will have to include those in the next update.
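For what it's worth, one way to avoid dropping filters is to build the pagination links from the current query string instead of listing parameters by hand. A minimal sketch, assuming the page number lives in a `page` parameter; the names `popularity` and `filetype` in the usage line are made up and may not match the site's real parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_link(current_url, page):
    """Return current_url with only the page number changed.

    Every other query parameter (filters, search terms) is carried over.
    """
    parts = urlsplit(current_url)
    params = [(k, v)
              for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "page"]  # "page" is an assumed parameter name
    params.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, `page_link("/search?q=rss&popularity=500000&filetype=pdf&page=1", 2)` keeps `q`, `popularity`, and `filetype` and only swaps the page number.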
also: happy to give this a try, more knobs for power users
Thanks for the heads up, I used to have a <link rel="search"> to the opensearch in a prior iteration of the site, must have removed it by mistake. Will add in the link in the next release.
Also, one really useful tag would be "Affiliate links", if there is a way to identify that a page contains affiliate links such as Amazon affiliate links. Those pages are almost always crap.
Also a tag for "Modal popups": those are too often just marketing-related websites, and I'd definitely want to skip them if I knew before visiting.
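A very rough sketch of how the "Affiliate links" tag could be detected for the Amazon case: Amazon Associates links carry a `tag` query parameter, so a crawler could flag outbound links that have one. The function name `is_amazon_affiliate` is hypothetical, and the check below only covers amazon.com; other country domains, shortened links, and other affiliate networks would need their own rules.

```python
from urllib.parse import parse_qsl, urlsplit


def is_amazon_affiliate(url):
    """True if url points at amazon.com and carries a non-empty 'tag' parameter."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_amazon = host == "amazon.com" or host.endswith(".amazon.com")
    has_tag = any(key == "tag" and value for key, value in parse_qsl(parts.query))
    return on_amazon and has_tag
```

A page could then get the tag if any of its outbound links passes this check.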
Then tried same search with popularity set to 500000 and could not even get a single full page of 10 results. It's laughable to assume from this "search" that only, say, 500004 out of the millions of websites in existence include this term. Not that I want to browse a full list, but at least I want to know how many hits I got. Then I can add more terms and try to reduce that number.
If a site has scripts then it's not "This site may be using Javascript", it's for sure that the site uses it...?
And popularity filter doesn't work, the results are empty and if you try going to any of the other pages it removes the filter