
0 points · throwaway12345t · 9mo ago · 0 comments

We need more indexes


9 comments · 4 top-level

tripplyons · 9mo ago · 3 in thread

More competition in the space would be great for me as a consumer, but the problem is that the high fixed costs make starting an index difficult.

andai · 9mo ago

I've been wondering: can't this be done p2p? Didn't we solve most of the technical problems in the late 90s / early 2000s, and then just abandon that entire way of thinking for some reason?

If many thousands of people care about having a free / private / distributed search engine, wouldn't it make sense for them to donate 1% of their CPU/storage/network to an indexer / db that they then all benefit from?

hombre_fatal · 9mo ago

Well, flesh it out more and it doesn't sound solved at all.

How do you make it trustless? How do you fetch/crawl the index when it's scattered across arbitrary devices? How do you index the decentralized index? What is actually stored on nodes? When you want to do something useful with the crawled info, what does that look like?

andai · 9mo ago

I think you could do it hierarchically, and with redundancy.

You'd figure out a replication strategy based on observed reliability (Lindy effect + uptime %).

It would be less "5 million flaky randoms" and more "5,000 very reliable volunteers".

Though for the crawling layer you can and should absolutely utilize 5 million flaky randoms. That's actually the holy grail of crawling. One request per random consumer device.

I think the actual issue wouldn't be the technical side but the selection. How do you decide what's worth keeping?

You could just do it on a volunteer basis. One volunteer really likes Lizard Facts and volunteers to host that. Or you could dynamically generate the "desired semantic subspace" based on the search traffic...
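The replication idea above (reliability estimated from uptime plus a Lindy-style bonus for age) could be sketched roughly like this. Nothing here comes from an existing system: the `Node` fields, the `half_trust_days` constant, and the assumption that nodes fail independently are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    uptime: float    # observed fraction of time online, between 0 and 1
    age_days: float  # how long the node has been participating


def reliability(node: Node, half_trust_days: float = 90.0) -> float:
    """Estimated availability: observed uptime, discounted for young nodes.

    The age factor is a hypothetical Lindy-style weight that approaches 1
    as a node survives longer; half_trust_days is an arbitrary constant.
    """
    lindy = node.age_days / (node.age_days + half_trust_days)
    return node.uptime * lindy


def pick_replicas(nodes, target=0.999):
    """Greedily add the most reliable nodes until the estimated chance
    that at least one replica is online reaches the target.

    Assumes node failures are independent, which real volunteers
    (same ISP, same time zone) may well violate.
    """
    chosen = []
    p_all_down = 1.0
    for node in sorted(nodes, key=reliability, reverse=True):
        if 1.0 - p_all_down >= target:
            break
        chosen.append(node)
        p_all_down *= 1.0 - reliability(node)
    return chosen, 1.0 - p_all_down
```

With scoring like this, a shard held by a few long-lived, high-uptime volunteers needs far fewer copies than one spread across new or flaky nodes, which is the "5,000 very reliable volunteers" intuition. If the target cannot be reached, the function simply returns every node along with the availability it managed to estimate.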


ineedasername · 9mo ago · 1 in thread

Do we know what OpenAI uses? Have they built their own, or do they piggyback on moneybags $MS and Bing?

tripplyons · 9mo ago

They use Bing: https://www.forbes.com/sites/katherinehamilton/2023/05/23/ch...

pzo · 9mo ago · 1 in thread

Perplexity added an API today. I got the following email:

> Dear API user, We’re excited to launch the Perplexity Search API — giving developers direct access to the same real-time, high-quality web index that powers Perplexity’s answers.

tripplyons · 8mo ago

This doesn't mean they run their own index. They are likely just reselling access to whatever index they are using for their product.

JumpCrisscross · 9mo ago

> We need more indexes

Not particularly. Indexes are sort of like railroads. They're costly to build and maintain. They have significant external costs. (For railroads, in land use. For indexes, in crawler pressure on hosting costs.)

If you build an index, you should be entitled to a return on your investment. But you should also be required to share that investment with others (at a cost to them, of course).
