Undergraduate shows that searches within hash tables can be much faster

(quantamagazine.org)

1304 points · Jhsto · 1y ago · 584 comments


203 comments · 53 top-level

brink · 1y ago · 93 in thread

Krapivin made this breakthrough by being unaware of Yao's conjecture.

The developer of Balatro made an award winning deck builder game by not being aware of existing deck builders.

I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before. This makes me kind of sad, because the current world is so interconnected, that we rarely see such novelty with their tendency to "fall in the rut of thought" of those that came before. The internet is great, but it also homogenizes the world of thought, and that kind of sucks.

aidenn0 · 1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before. This makes me kind of sad, because the current world is so interconnected, that we rarely see such novelty with their tendency to "fall in the rut of thought" of those that came before. The internet is great, but it also homogenizes the world of thought, and that kind of sucks.

I think this is true only if there is a novel solution that is in a drastically different direction than similar efforts that came before. Most of the time when you ignore previous successful efforts, you end up resowing non-fertile ground.


robotelvis · 1y ago

In my experience the best approach is to first try to solve the problem without having read the prior work, then read the prior work, then improve your approach based on the prior work.

If you read the prior work too early, you get locked into existing mindsets. If you never read it, you miss important things you didn’t think of.

Even if your approach is less good than the prior work (the normal case) you gain important insights into why the state of the art approach is better by comparing it with what you came up with.


helloplanets · 1y ago

> The developer of Balatro made an award winning deck builder game by not being aware of existing deck builders.

He was aware of deck builders and was directly inspired by Luck be a Landlord, but he was not aware of just how massive the genre is.

Direct quote from the developer:

> The one largest influence on Balatro was Luck Be a Landlord. I watched Northernlion play for a few videos and loved the concept of a non-fanatsy themed score attach roguelike a ton, so I modified the card game I was working on at the time into a roguelike.

> I cut myself off from the genre at that point intentionally, I wanted to make my own mistakes and explore the design space naively just because that process is so fun. I hear the comparison to Slay the Spire a lot but the truth is that I hadn't played that game or seen footage of it when I designed Balatro, not until much later.

https://www.reddit.com/r/Games/comments/1bdtmlg/comment/kup7...


chambers · 1y ago

  “They’re cheering for you,” she said with a smile. 
  “But I could never have done it,” [Milo] objected, “without everyone else’s help.”
  “That may be true,” said Reason gravely, “but you had the courage to try; 
     and what you can do is often simply a matter of what you will do.”
  “That’s why,” said King Azaz, “there was one very important thing about your quest 
     that we couldn’t discuss until you returned.”
  “I remember,” said Milo eagerly. “Tell me now.”
  “It was impossible,” said the king, looking at the Mathemagician.
  “Completely impossible,” said the Mathemagician, looking at the king.
  “Do you mean … ,” said the bug, who suddenly felt a bit faint.
  “Yes, indeed,” they repeated together, “but if we’d told you then, you might not have gone … 
    and, as you’ve discovered, so many things are possible just as long as you don’t know they’re impossible.”

- The Phantom Tollbooth (1961)

rincebrain · 1y ago

A professor I had in college, whose first published result came from a piece of homework he turned in where he incidentally solved an open question about a bound on a problem, had a curious habit.

I ended up failing and taking his course again (because I had A Lot going on in college), and thus, noticed something.

Each semester, on one of the assignments in the latter half of the class, he included one problem, out of perhaps 30 in the problem set, that as written was actually an open problem; then, a day or two before the set was due, he'd send out an "oops, my bad" revised version.

I suspect that this was not an accident, given that it always happened only once.


somenameforme · 1y ago

I think going one layer lower - the fundamental issue is that the internet drives people to unrealistic perceptions of the competence of others. Think about all of the undeniably brilliant people that have been involved in software over the past 40 years, and how many of them used hash tables in performance critical environments. Let alone mathematicians and others using them in applied domains. And you think there's something fundamental that all of these people just somehow missed?

The argument of 'if that's such a good idea, why wouldn't somebody have just done it already?' seems to have grown exponentially with the advent of the internet. And I think it's because the competence of others became so much more visible. For those who lived through e.g. Carmack's Golden Age, you knew you were never going to be half the coder he was, at least based on the image he successfully crafted. That 'slight' at the end is not to say he wasn't a brilliant developer or even perhaps the best in the world at his peak, but rather that brilliance + image crafting creates this gargantuan beast of infallibility and exceptionalism that just doesn't really exist in reality. I think it's from this exact phenomenon that you also get the practical fetishism of expertise.

SideQuark · 1y ago

Picking two examples out of all people approaching problems, while ignoring wasted effort and failures to make progress because of not understanding current knowledge, is an absolutely terrible reason to approach from ignorance.

The biggest gains in theory and in practice are far more often obtained by masters of craft, giving much more weight to attacking problems from a position of knowledge.

In fact, even in this case, this progress required that the author was aware of very recent results in computer science, was thinking deeply about them, and most likely was scouring the literature for pieces to help. The “Tiny Pointers” paper is mentioned directly.


dataviz1000 · 1y ago

A similar idea came up in Veritasium's latest video today. DeepMind's AI for predicting protein folding improved greatly when training withheld the most evident information about a protein's primary structure (its linear polypeptide chain) within the Structure Module step. [0]

After I asked ChatGPT not to simply agree with me that your comment and these two different approaches to problem-solving are alike, it concluded that there still might be similarities between the two. [1]

[0] https://youtu.be/P_fHJIYENdI?feature=shared&t=1030

[1] https://chatgpt.com/share/67aa8340-e540-8004-8438-3200e0d4e8...

layer8 · 1y ago

It’s important to think outside the box, and that’s easier when you’re not aware of the box, but we also stand on the shoulders of giants, and are doomed to repeat history if we don’t learn from it. As usual, things aren’t clear-cut.


cdelsolar · 1y ago

I’ve been working on and off for years on a Scrabble endgame solver; it uses all these techniques from chess like transposition tables, Negamax with alpha-beta pruning, NegaScout, aspiration search and so on. There’s a French person who built his own endgame solver, and his solver is significantly faster than mine, even with all of the optimizations that I’ve put into mine. He is kind of secretive about it because it’s closed source and he makes some money on it, but we’ve talked a bit about it, compared some positions, and determined that his move generation algorithm is actually not as optimized as mine. But he can still solve the endgame faster while seeing fewer positions, which implies to me that he’s doing a significantly better job of pruning the tree.

But when we try to talk details, I asked him for example do you use minimax with alphabeta pruning and he told me like “I’m not sure if I am using minimax or what that is :(“ .. I ask him to describe what he does, he essentially describes minimax with pruning. I’ve sorta figured out that he must be doing some very intelligent version of an aspiration search. It’s really eye-opening because he doesn’t have any of this training. He’s never seen any related algorithms, he’s just figuring all this out on his own.
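For readers who haven't met these terms: the core both solvers seem to share is negamax with alpha-beta pruning, and an aspiration search is a wrapper that first searches a narrow window around a guessed score and only falls back to a full-width search if the true score lands outside it. A minimal Python sketch on a toy game tree, assuming leaves hold scores from the root player's point of view (`TOY_TREE`, the function names, and the window size are all hypothetical, not either solver's code):

```python
import math
from typing import List, Optional, Union

# Hypothetical toy game tree: a leaf is an int score from the root
# (maximizing) player's view; an internal node is a list of child subtrees.
Tree = Union[int, List["Tree"]]
TOY_TREE: Tree = [[3, 5], [6, 9], [1, 2], [0, -1]]


def negamax(node: Tree, alpha: float, beta: float, color: int,
            visited: Optional[List[int]] = None) -> float:
    """Fail-soft negamax with alpha-beta pruning.

    Returns the node's value from the side to move's point of view
    (color = +1 for the root player, -1 for the opponent).
    """
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)  # record which leaves were actually evaluated
        return color * node
    best = -math.inf
    for child in node:
        # Negate the child's score: what is good for the opponent is bad for us.
        score = -negamax(child, -beta, -alpha, -color, visited)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent will never allow this line
    return best


def aspiration_search(tree: Tree, guess: float, delta: float) -> float:
    """Search a narrow window around a guessed score; re-search on failure."""
    alpha, beta = guess - delta, guess + delta
    score = negamax(tree, alpha, beta, 1)
    if score <= alpha or score >= beta:
        # Fail low/high: the true value lies outside the window.
        score = negamax(tree, -math.inf, math.inf, 1)
    return score
```

On the toy tree, the pruned search evaluates 6 of the 8 leaves and still returns the plain minimax value, 6. A guess close to the true score is answered from the narrow window alone; a guess far from it fails high and pays for a second, full-width search, which is the trade-off aspiration windows make.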

smj-edison · 1y ago

I think of Andre Geim as a great example of balancing the two. I couldn't find the exact quote, but he said something to the effect of "when I enter a new field, I make sure I learn the basics so I don't spend all my time making dumb mistakes. But I don't get so into it that I get stuck in the mindshare."

I'll also say I think that diversity in approaches is more important than One Right Way. Some people need to set out on their own, while others spend decades refining one technique. Both have led to extraordinary results!

abetusk · 1y ago

I disagree.

Many problems are abstract and so we have to build "cartoon" models of what's going on, trying to distill the essence of the problem down to a simple narrative for what the shape of the problem space is and where the limitations are. That often works but backfires when the cartoon is wrong or some assumptions are violated about when the cartoon description works.

Results like this are pretty rare, nowadays, and I suspect this happened because the problem was niche enough or some new idea has had time to ferment that could be applied to this region. This seems like a pretty foundational result, so maybe I'm wrong about that for this case.

A lot of progress is made when there's deeper knowledge about the problem space along with some maturity for when these cartoon descriptions are invalid.

eterevsky · 1y ago

If we have reached a local maximum at something, the only way to progress is to make a big leap that brings us out of it. The trouble is that most such big leaps are unsuccessful. For every case like the one you are describing, there are probably hundreds or thousands of people who tried to do it and ended up with something worse than the status quo.

genghisjahn · 1y ago

This reminds me of the Neal Stephenson article "Innovation Starvation" from 2011:

>A number of engineers are sitting together in a room, bouncing ideas off each other. Out of the discussion emerges a new concept that seems promising. Then some laptop-wielding person in the corner, having performed a quick Google search, announces that this “new” idea is, in fact, an old one—or at least vaguely similar—and has already been tried. Either it failed, or it succeeded. If it failed, then no manager who wants to keep his or her job will approve spending money trying to revive it. If it succeeded, then it’s patented and entry to the market is presumed to be unattainable, since the first people who thought of it will have “first-mover advantage” and will have created “barriers to entry.” The number of seemingly promising ideas that have been crushed in this way must number in the millions. What if that person in the corner hadn’t been able to do a Google search?

>In a world where decision-makers are so close to being omniscient, it’s easy to see risk as a quaint artefact of a primitive and dangerous past (…) Today’s belief in ineluctable certainty is the true innovation-killer of our age

pjc50 · 1y ago

I believe Ramanujan did the same with various maths problems. The Cambridge undergrad course sprinkles a few unsolved problems in the practice questions just in case someone does this again.


voidhorse · 1y ago

There's a reason the phrase "beginner's luck" exists. I'm not sure the naïveté and success are causally related so much as they might be coincident.

Could knowing about prior research skew one's perspective and tarnish novel thought? Sure. But we don't know. Maybe we'd have an even better Balatro if the creator knew about some other deck builders. Maybe we wouldn't, we don't know. We cannot prove the counterfactual.

On the opposite extreme, there are examples of thinkers whose success stemmed from knowing much about one domain or much about many domains and integrating (Luhmann, Goethe, Feynman, Von Neumann etc.). In the general case, I think we are probably much better off promoting knowledge and study, and not ignorance and chance.

That said, I do think we should retain our willingness to play and to try things that are "out of bounds" with respect to the existing accumulated knowledge. We should live informed lives, but play and explore like unschooled children.

necovek · 1y ago

> the authors have also learned of several other hash tables that make use of the same high-level idea in different settings [7, 9].

At least part of the result was already known, and the fact that the authors didn't know about it mostly speaks to the large corpus of knowledge we already possess.

But the core inspiration came from another recent research paper, "Tiny Pointers", which runs totally against your premise.

If Krapivin were a software engineer looking to implement this solution as an optimization for a particular problem, he would have done so without ever thinking of writing a research paper to prove it formally, and would have relied mostly on benchmarking to show that his implementation works better.

Now, it has always been somewhat true that lots of existing knowledge limits our creativity in familiar domains, but you need both to really advance science.

latexr · 1y ago

That is called Shoshin, or Beginner’s Mind.

https://en.wikipedia.org/wiki/Shoshin

ajross · 1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

This is an "Einstein failed Math" fallacy. It's true that novel and notable work tends strongly not to be bound by existing consensus, which when you say it that way is hardly surprising. So yes, if consensus is wrong in some particular way the people most likely to see that are the ones least invested in the consensus.

But most problems aren't like that! Almost always the "best" way to solve a problem, certainly the best way you're going to solve the problem, is "however someone else already solved it". But sometimes it's not, and that's when interesting stuff happens.

jay_kyburz · 1y ago

According to RPS the quote is that he had "barley played any roguelikelike deckbuilders" not that he was not aware of them.

There are a lot of great deck builders that are not roguelikes. Has he played Dominion, Magic: The Gathering, Hearthstone?

awesome_dude · 1y ago

There's a problem in all human understanding - knowing when, and knowing when not to apply pre-existing knowledge to a problem.

Have we been grinding away in the right direction and are only moments away from cracking the problem, or should we drop everything and try something completely new because we've obviously not found the solution in the direction we were heading.

To put it into a CS-type context: should we be using DFS or BFS to search for the solution? We don't have knowledge of future cost (so UCS/Dijkstra's is out), nor do we know where the solution lies in general (so A* is out, even if you ignore the UCS component).
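The DFS/BFS distinction in that analogy comes down to the frontier discipline: a FIFO queue gives BFS, a LIFO stack gives DFS, and UCS/Dijkstra's would swap in a priority queue keyed on path cost. A minimal sketch on a hypothetical toy graph (`TOY_GRAPH` and `search` are illustrative names, not from the thread):

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy graph: node -> list of successors.
TOY_GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["D"], "C": ["E"],
    "D": ["G"], "E": ["F"], "F": ["G"],
}


def search(graph: Dict[str, List[str]], start: str, goal: str,
           breadth_first: bool) -> Optional[List[str]]:
    """Generic graph search in which only the frontier discipline differs.

    breadth_first=True pops the oldest path (FIFO queue, i.e. BFS);
    breadth_first=False pops the newest path (LIFO stack, i.e. DFS).
    """
    frontier = deque([[start]])
    seen = set()
    while frontier:
        path = frontier.popleft() if breadth_first else frontier.pop()
        node = path[-1]
        if node in seen:
            continue  # already expanded via another path
        seen.add(node)
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                frontier.append(path + [nxt])
    return None  # goal unreachable from start
```

On this graph BFS returns the shortest path A-B-D-G, while DFS commits to the C branch first and returns the longer A-C-E-F-G. Neither is better in general; which one wins depends on where the solution happens to lie, which is exactly the unknown the comment describes.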

taurknaut · 1y ago

> Krapivin made this breakthrough by being unaware of Yao's conjecture.

I don't think there's any evidence of this. Yao's conjecture is not exactly standard undergraduate material (although it might be; this is a commentary on detail rather than difficulty. But I certainly didn't encounter this conjecture in school). If not knowing this conjecture were the key, then millions and millions of students who also didn't know it failed to see what Krapivin did. I imagine you'd have to ask him what the key to his insight was.

Hashing is a pretty unintuitive sort of computation. I'm not surprised that there are still surprises.
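For readers outside the area: the baseline at stake is an open-addressed table with uniform probing, where every key tries slots in its own random order and settles in the first empty one. As we understand it, Yao's conjecture was that no greedy scheme of this kind could beat uniform probing, whose expected insertion cost is (m + 1)/(m − k + 1) probes when k of m slots are full, roughly 1/(1 − load). A minimal simulation of that baseline (not Krapivin's construction; the function name and table sizes are our own assumptions):

```python
import random
from typing import List, Optional


def uniform_probing_costs(m: int, n: int, seed: int = 0) -> List[int]:
    """Insert n keys into an m-slot open-addressed table using uniform probing.

    Each key probes the slots in its own independent, uniformly random order
    and takes the first empty slot it finds (a greedy scheme). Returns the
    number of probes used by each insertion, in insertion order.
    """
    if not 0 <= n <= m:
        raise ValueError("n must be between 0 and m")
    rng = random.Random(seed)
    table: List[Optional[int]] = [None] * m
    costs: List[int] = []
    for key in range(n):
        order = list(range(m))
        # Draw this key's probe sequence lazily with a partial Fisher-Yates shuffle.
        for i in range(m):
            j = rng.randrange(i, m)
            order[i], order[j] = order[j], order[i]
            if table[order[i]] is None:
                table[order[i]] = key
                costs.append(i + 1)  # probes used = position in the sequence
                break
    return costs
```

With m = 1000 and n = 950, the first insertions cost about one probe each, while those near 95% load average on the order of ten times as many, which is the steep climb near a full table that the new result attacks.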


anvuong · 1y ago

This is confirmation/survivorship bias. You only hear about these positive cases. The vast majority just end up rediscovering old techniques and having their year-long paper/work rejected.

delichon · 1y ago

Unaccompanied Sonata is a 1979 short story by Orson Scott Card that takes this to an extreme, and has haunted me since I read it in the eighties.

hans-dampf · 1y ago

Your exact thoughts have already been put to paper by L. P. Hammett, godfather of physical organic chemistry (the exact description of chemical reactions):

one might “... overlook the great difference between exact theory and approximate theory. Again, let me emphasize my great respect for approximate theory. [...] if one starts looking for an effect predicted by this kind of theory to be impossible, the odds are against a favorable outcome. Fortunately, however, the community of scientists, like that of horseplayers, contains some people who prefer to bet against the odds as well as a great many who always bet on the favorite. In science we should, I think, do all we can to encourage the man who is willing to gamble against the odds of this sort.

This does not mean that we should encourage the fool or the ignoramus who wants to play against suicidal odds, the man who wants to spend his time and usually someone else’s money looking for an effect incompatible with, let us say one of the conclusions reached by Willard Gibbs. Gibbs started from thoroughly proven generalizations, the first and second laws of thermodynamics, and reasoned from them by exact mathematical procedures, and his conclusions are the best example I know of exact theory, theory against which it is futile to struggle.”

chasing · 1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Both Danny Trejo and Tim Allen spent time in prison before becoming famous. While that's interesting, I'm not sure I'm ready to believe that's the best way to become a professional actor.

Edit to be a little less snarky, apologies:

"Outsiders" are great for approaching problems from fresh angles, but I can almost guarantee that the majority of nuts-and-bolts progress in a field comes from people who "fall in the rut of thought" in the sense that they are aware enough of the field to know which paths might be most fruitful. If I had to place a bet on myself, I wouldn't take a wild, uninformed swing: I'd get myself up to speed on things first.

Outsiders sometimes do great work. They also sometimes:

https://www.reddit.com/r/mathmemes/comments/wq9hcl/terrence_...

RALaBarge · 1y ago

There is a book about this theory, written in the 1960s, called 'The Structure of Scientific Revolutions' by Kuhn, that talks about how some sciences progress one funeral at a time and how progress is not linear. He also remarks that people from outside the standard thinking and education surrounding the current system are typically the ones who actually advance science.

One example is geocentrism vs the Copernican astronomical model: the Copernican model could never have sprung from the status quo, because in geocentrism everything revolved around the Earth instead of around the Sun. You can't square that circle.

https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Re...


giantg2 · 1y ago

I've always had a mind that worked that way: I can imagine how something works or could work before looking up how it actually does work. But in my experience there's no real benefit to thinking that way. Thinking differently has only been a career impediment or gotten on my teachers' nerves for being "smart".

For example, as a young kid I saw a geometric ball made up of hinges that allow it to expand and contract, and in some stages it looks a little like a gear. So I started wondering whether you could resize a gear instead of switching gears in a car. Then a decade or so later I started seeing CVTs in cars, which use the same concept: you change the size/ratio by expanding or contracting the roller instead of switching gears.

chikere232 · 1y ago

Last year's Advent of Code had a task that was NP-complete and lacked good, well-known approximation algorithms. I almost gave up on it when I realised that, since it felt impossible.

In practice the data was well behaved enough and small enough that it was very doable.

klik99 · 1y ago

I agree in the specific case where the state of the art is stuck in a local maximum, but saying "the best way to approach a problem is by not being aware of or disregarding previous attempts" ignores the much more frequent, banal work of iterative improvement. Leaping out of a local maximum is rare and sexy and gets articles written about you and is important, but the work of slowly iterating up to a nearby peak is also important.

I think progress needs both individuals who break out of preconceived notions and the communal work of improving within the notions we currently have.

youniverse · 1y ago

I watched a casual YouTube video by a philosophy professor talking about the same thing: that great scholars are different from great thinkers. Many great thinkers came up with great philosophies because they misread past works.

If anyone wants to watch: https://youtu.be/4vou_dXuB8M?si=Wdr7q96MFULPAEc4

Definitely something we should all keep in mind that sometimes you just have to pave your own way and hope it is great on its own merits.

ibejoeb · 1y ago

This is a really tough problem. I don't think ignorance is the answer, but it's also difficult to set aside things that seem legitimate and go down a rabbit hole of reinventing something on a hunch. I guess the saving grace is that it's impossible to know enough about such a wide swathe for this to often be a problem. With large models that can conceivably encode the collective knowledge, though, we have to be vigilant about creating an orthodoxy that ultimately constrains us.

0x38B · 1y ago

"fall[ing] in the rut of thought" reminds me of this paragraph from "The Footpath Way":

> So long as man does not bother about what he is or whence he came or whither he is going, the whole thing seems as simple as the verb "to be"; and you may say that the moment he does begin thinking about what he is (which is more than thinking that he is) and whence he came and whither he is going, he gets on to a lot of roads that lead nowhere, and that spread like the fingers of a hand or the sticks of a fan; so that if he pursues two or more of them he soon gets beyond his straddle, and if he pursues only one he gets farther and farther from the rest of all knowledge as he proceeds. You may say that and it will be true. But there is one kind of knowledge a man does get when he thinks about what he is, whence he came and whither he is going, which is this: that it is the only important question he can ask himself. (The Footpath Way, Introduction (1))

Even though the author is talking about a different kind of knowledge, the image of sticks of a fan - where going down one gradually excludes the others - stuck with me.

1: https://www.gutenberg.org/ebooks/59813

namibj · 1y ago

Eventually I'll get to actually building a POC/tech demonstrator that just has fewer modules, perhaps at lower current density, to show that even several kV DC can be efficiently transformed, and not just on paper, down to a few kV or sub-kV DC. At high enough voltage, grounding is no longer optional anyway, so you might as well do essentially an autotransformer plus extra protection for humans against electric shock (an RCD doesn't work directly, but the functionality can still be offered; it just has to sense quite differently).

Why DC? An overhead line only limited by peak voltage (arc) and thermals can carry twice the power when running DC instead of AC, assuming both measured relative to ground.

Also, you can run your transistors completely steady-state at all frequency components between their own switching fundamental and your load transients. No more over-provisioning just to make up for legacy 50/60 Hz AC.

Also, to a degree, you can just plug raw batteries into that DC grid, at most adding a little bit of DC regulation to force the voltage a bit higher/lower than the batteries'. Like, a power supply rated at basically a couple percent of the battery's max input/output power: it only needs to move the small extra voltage, though of course at the full current.

Lastly, DC converters are just way smaller and lighter, so you could avoid the heavy bulky transformers in trains and alleviate power limiting from them. Relevant for fast double-decker trains because you'd prefer to have human space where you currently park the transformer.

I have to say, though, that developing novel technology by pulling in recent innovations from the fundamental/material-science fields underlying the target is not at all an easy thing to do.

rincebrain · 1y ago

Kind of?

You get novel branches of thought, but in the limit case, you're also reinventing the universe to bake an apple pie.

So there's something of a tradeoff between attempting to ensure people can do more than mimic existing doctrine and efficiency of getting up to speed without having to re-prove existing math and science.

The Balatro dev also, for example, has talked about how he was heavily influenced by several specific other games.

dathinab · 1y ago

I would say not letting your thoughts be constrained by the bias of existing approaches.

This isn't easy at all. It requires training yourself to have an open and flexible mind in general.

Not knowing about something is more like a cheat to get there easier.

But it's super common that innovation involves a lot of well-known foundational work and differs in just one specific aspect, and it's quite hard to know about the rest of the foundational work but not that specific aspect, especially if you don't even know which aspect can fundamentally be "revolutionized"/"innovated".

But what always helps when you learn about a new topic is to first try it blindly yourself and then look at what the existing approaches do. Not just for doing groundbreaking work, but even for, e.g., just learning math.

One of the math teachers I had in my school years before university used this approach for teaching math; it yielded far better independent understanding and engagement, and it helped me a lot later on. Sadly I only had that teacher for 2 years.

shkkmo · 1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Extrapolating a "best way" from a couple of examples of success is bad reasoning. There are definitely ways in which it can be necessary to ignore the standing wisdom to make progress. There are also definitely ways in which being ignorant of the knowledge gained by past attempts can greatly impede progress.

I would point out that it is also possible to question and challenge the assumptions that prior approaches have made without being ignorant of what those approaches tried.

Figuring out which is which is indeed hard. Generally, it seems to work well to have a majority of people expanding/refining prior work and a minority of people going in and starting from scratch to figure out which of the current assumptions or beliefs can be productively challenged/dropped. The precise balance point is vague, but it seems pretty clear that going too far in either direction harms the rate of progress.

huijzer · 1y ago

Walter Isaacson said something similar about Einstein and Steve Jobs. Sometimes you need to reject commonly held assumptions to make progress. Einstein rejected the idea of ether. According to Isaacson this was probably because Einstein was working outside of university. Inside university, professors would likely have pushed Einstein to stick to the idea of ether.

eru · 1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

I don't think that's warranted.

You will find that the vast majority of lottery winners have bought lottery tickets. However that doesn't mean that buying lottery tickets is a good idea financially.

SkyBelow · 1y ago

Best for an individual or for society?

Consider a simplified example. There is some area of scientific research. Working within the framework gives you a 1 in 4 chance of making some minor improvement. Working outside the framework gives you a 1 in a million chance to create a great leap in knowledge.

For any single individual, the best choice is the former. The latter is a gamble that most people will lose, wasting their lives chasing crazy theories.

For society, you want a split. You need some doing the second option to have the eventual amazing discovery, but you also need to progress the current understanding further.

If we introduce a chance for the minor progress to lead to the same major advancement, it becomes a bit simpler for society to calculate the best allocation of researchers, but for any single person, the best option still remains to dedicate themselves to the small advancement.
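The individual-vs-society split in this example can be made concrete with a little arithmetic. A minimal sketch, assuming every attempt is independent and using a hypothetical pool of 1,000,000 researchers (the function name and the pool sizes are ours; the 1-in-4 and 1-in-a-million odds come from the comment):

```python
from typing import Tuple


def allocation_outcomes(researchers: int, outsiders: int,
                        p_minor: float = 0.25,
                        p_leap: float = 1e-6) -> Tuple[float, float]:
    """Return (expected minor improvements, P(at least one great leap)).

    Assumes independent attempts: each insider has a p_minor chance of a
    minor improvement, each outsider a p_leap chance of a great leap.
    """
    if not 0 <= outsiders <= researchers:
        raise ValueError("outsiders must be between 0 and researchers")
    insiders = researchers - outsiders
    expected_minor = insiders * p_minor
    # P(no outsider succeeds) = (1 - p_leap) ** outsiders
    p_any_leap = 1 - (1 - p_leap) ** outsiders
    return expected_minor, p_any_leap
```

For one person working alone, the inside track yields 0.25 expected improvements against a one-in-a-million shot outside. For a hypothetical million researchers with 10% working outside the framework, the field still gets 225,000 expected minor improvements plus roughly a 9.5% chance of at least one great leap, which is the "split" the comment argues for.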

thenoblesunfish · 1y ago

Ok, but you are disregarding the 1000s of things the undergrad was aware of and the fact that he worked with other researchers who were aware of the existing results enough to understand the significance of the result.

The real trick is simply to try to understand things directly and not rely on proof by authority all the time.

germandiago · 1y ago

It is just easier to think outside the box when you do not have your mind "polluted" with previous ideas, and from time to time someone appears who was thinking in a different way, probably the way most obvious to them, without knowing about the orthodox thinking in the subject.

This is valuable.

kazinator · 1y ago

It takes time to read all the prior research. You could grow old by the time you get through it all. Likelihood of contributing to the field declines with age.

You might believe someone's proof of a conjecture and then be discouraged from delving any more into that rabbit hole.

More often than not you will be reinventing something. But that's not necessarily less productive than reading other people's work. In the former case, you're at least making something, if not something new.

So there are some arguments for being fresh in a well-trodden field with an ocean of research that you cannot boil all at once.

On the other hand, there is the publish-or-perish pressure in academia, which requires original research. You could just keep busy and throw enough shit against the wall that enough of it sticks.

xyzzy_plugh · 1y ago

Domain knowledge is valuable as you can wield it as opportunities arise to great effect. This lets you leap frog problems by applying known solutions. There's risk of being blind to novel approaches that require innovation.

Being capable of tackling problems from first principles is invaluable because we frequently encounter problems that are novel in some dimension, even if that dimension is the combination of dimensions. This lets you leap frog large problems by decomposition, possibly going against the grain and innovating by, hopefully, simplifying. However there is risk in falling into traps that countless others have already learned the hard way.

This may come as a surprise to some but, believe it or not, you can have both. In fact, you should.

NohatCoder · 1y ago

There is certainly a need for ignoring common wisdom if you want to make something new. I don't think being unaware of it is necessary as long as you are willing to go forward while being told that you are on a fool's errand.

kbenson1y ago

It's less that it's the best way to approach a problem and more that it optimizes for a different goal. Building on existing knowledge is how you find the local maximum along the slope you're on. Starting from scratch is how you find different slopes, which may lead to higher local maxima.

Of course, if you happen to be on a slope that leads to the global maximum, starting from scratch is far less effective. We don't usually know where we are, so there's a trade-off.

There was a good article posted to HN years ago that covered this and used rocketry as one of the examples, but I don't recall what it was. The author was well known, IIRC.

andai1y ago

In university lectures, we'd be presented with a problem on one slide, and then like ten seconds later the solution on the next. I'd usually cover my ears and look away because I was still busy coming up with my own solution!

globular-toast1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

I think sometimes this is true. Any time I've had new starters on my engineering team, I've tried to teach them about the problem before they get exposed to any of our solutions. Sometimes they have brand-new insights that we've been completely blind to. It doesn't always happen, but there is only one opportunity for it: once they've seen the solutions, they can't unsee them.

speleding1y ago

Somewhat surprisingly (to me), this has also been found for user interfaces [0]. The best initial user interface design for a feature phone came from designers who were not shown previous work by other designers. Iterations based on previous designs were only better if the designers were shown the "winning" initial design.

[0] https://www.nngroup.com/articles/parallel-design/

3abiton1y ago

This sounds like the approach DeepSeek's CEO used for hiring. He favored young, inexperienced teams so they could bring a fresh perspective and try things in new ways. It paid off nicely.

immibis1y ago

Well, sometimes. Other times, perhaps even most times, you bang your head against a wall for weeks and get nowhere.

George Dantzig also solved two open problems because he thought they were homework.

bell-cot1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before...

That depends...

- Krapivin was an undergrad, tinkering with stuff for fun. If he'd put a couple months into this, and just ended up re-inventing a few things? That'd be decently educational.

- Vs. if your team needs to ship product, on something resembling the schedule? Yeah. You definitely stick to tried-and-true algorithms.

TZubiri1y ago

When training physically, you can overtrain one muscle and come to depend on it. By deliberately not using that muscle, you can improve your other muscles.

It is well known that limitations improve creativity.

That said, I still think the best path is to learn the classical approach. If you want, you can question some axioms, but it's mostly irrational in that there's almost no reward for you personally except clout; most of the reward goes to science as a whole.

rollcat1y ago

In terms of practical engineering, this is also why I love to do side projects that reject existing bodies of libraries, and try to work up from first principles, and/or focus on composability rather than integration.

It's a trade-off, at first it takes longer to iterate on features, but sometimes a more minimal and/or composable tool finds its way to production. Real Systems are made of duct tape anyways.

kristopolous1y ago

I used to think this a few decades ago. I think it's just as accessible with some mix of anti-authoritarianism and defiant personality.

Essentially you learn a thing, you accept it for now and you think "well but maybe!"

Like I personally think there should be multiple mathematical zeroes but I accept it as wackiness unless I can clearly demonstrate coherency and utility as to why.

indymike1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Everyone likes to focus on why you cannot do something and why trying will be futile.

You don't have to disregard prior efforts. You just have to focus on one simple question:

"how can I do ______ ?"

Owlettotoo1y ago

Sometimes insight can come by evaluating the problem at its rawest form. In short, a wild but fresh perspective.

vkou1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Survivorship bias: you aren't aware of all the failures where people who were unaware of prior art made all the mistakes that were predictable to people who were aware of it.

yodsanklai1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Maybe it's just because there are more people working on these problems who don't know previous approaches than the opposite.

ijustlovemath1y ago

This is the biggest risk of AI imo; almost by definition your thoughts regress to the mean when using it

dabeeeenster1y ago

Interesting idea! Clifford Cocks also made a breakthrough in, or rather invented, public-key encryption without realising it https://en.wikipedia.org/wiki/Clifford_Cocks

temporallobe1y ago

We’re too afraid of violating some unwritten rule about reinventing the wheel. Or something.

fcq1y ago

Absolutely true! I concur 100% with your take.

Funny that this breakthrough came out at the same time Antirez made this post https://news.ycombinator.com/item?id=42983275

swayvil1y ago

I think it's 2 different approaches, some enjoy the one (playing with the thing itself) and some enjoy the other (playing with the various secondhand abstractions that refer to the thing).

They are different tastes. They deliver different results.

fennecbutt1y ago

That's a load of selection bias though. I'm sure there have been many, many more people who don't know anything about deck builder games who tried to make one and didn't succeed.

dumbfounder1y ago

It’s easy to think outside the box when you don’t know where the box is.

obelos1y ago

I think you're forgetting to put “all the times ignorance didn't produce a breakthrough” in the denominator.

dinkumthinkum1y ago

I get what you are saying, but what if the number of breakthroughs by people who did know about what came before was orders of magnitude higher than this number? Would that change your mind?

bweller1y ago

See einstellung effect:

https://thedecisionlab.com/biases/einstellung-effect

agumonkey1y ago

Similarly, the Fortran or ALGOL team implemented a lot of optimization tricks on their first try, things that are now considered advanced, without "knowing it".

tehjoker1y ago

You hear about this stuff because it's notable. Almost 100% of the time, if you disregard what other people have done, you are going to waste a lot of time.

UltraSane1y ago

For every case like this you have thousands of people who waste a huge amount of time and mental effort recreating something that has already been invented.

skgough1y ago

Maybe the best way to have the best of both worlds is to ensure well-established areas of research are open to “outsider art” submissions on the topic?

redcobra7621y ago

https://thedecisionlab.com/biases/availability-heuristic

You've remembered two examples of this (arguably) happening, so you attempt to draw a conclusion based on the ease with which you came up with those examples. But in reality, this method of inference is prone to error, as it doesn't consider the denominator, or how many attempts were made to achieve the results you're able to remember.

implmntatio1y ago

Yup. And we programmed all that into LeGenAIs and LeGPTs and so on ... a splendidly perfect annihilation of all things evolutionary.

tgauda1y ago

Every notable discovery has disproved something that everyone else thought was true. Naivety can be a superpower when inventing.

rnewme1y ago

I think it's more about working on a problem you spotted instead of endlessly reading, hoarding info, literature etc.

schneems1y ago

Sounds like a bit of survivorship bias. Not every success from people following well-known principles translates into a blog post or research paper. You also don’t hear about all of the people who failed because they tried something novel and it didn’t work.

I would suggest these positive takeaways: trust but verify. If you’ve got a novel solution idea and don’t understand why others aren’t doing it that way, do both and compare. You’re guaranteed to learn something one way or another. Also: if you reinvent the wheel or do something suboptimal, that’s okay too. Sometimes the solutions don’t make sense until you see what doesn’t work. Likewise: be open to learning from others and exploring solutions outside of your predefined notion of how things should work.

brookst1y ago

You’re hitting on innovation versus invention. True invention is getting more and more rare. Innovation is alive and well.

resters1y ago

All scientific progress consists of leveraging some past work and overturning other past work. This is no different.

hassleblad231y ago

You have to be naive to be an innovator.

lysecret1y ago

I feel like there is already a movement along this direction: "thinking from first principles".

moi23881y ago

No, but starting from first principles does work. And being unaware of previous work helps you do this.

hombre_fatal1y ago

The creator of Halo’s soundtrack didn’t listen to music for fear of it influencing him.

throwaway5191y ago

In the spirit of your observation, I encourage you to make your observation again.

Xcelerate1y ago

Now we just need a smart person who is somehow unaware of the halting problem.

wnolens1y ago

I had to delete Balatro last week to break an addiction. It's so so good

godelski1y ago

I actually have a hot take that is related to this (been showing up in a few of my recent comments). It is about why there's little innovation in academia, but I think it generalizes.

Major breakthroughs are those that make paradigm shifts. So, by definition, that means that something needs to be done that others are not doing. If not, things would have been solved and the status quo method would work.

Most major breakthroughs are not the result of continued progress in one direction; rather, they are made by dark horses. Often by nobodies. You literally have to say "fuck you all, I'm doing this anyways." Really, this is not so different from the founder mentality we encourage vocally yet discourage monetarily[0]. (I'm going to speak from the side of ML, because that's my research domain, but understand that this is not as bad in other fields, though I believe the phenomenon still exists, just not to the same degree.) Yet it is really hard to publish anything novel. While reviewers care a lot about novelty, they actually care about something more: metrics. Not metrics in the sense that you provided strong evidence for a hypothesis, but metrics in the sense that you improved the state of the field.

We have 2 big reasons this environment will slow innovation and make breakthroughs rare.

1. It is very hard to do better than the current contenders on your first go. You're competing not against one player, but against the accumulated work of thousands over years or decades. You can find a flaw in that paradigm and address the specific flaw, but it is a lot of work to follow this through and mature it. Technological advancement happens through a sum of S-curves, and the new thing always starts out worse. For example, think of solar panels. PVs were staggeringly expensive in the beginning and gave little benefit. But now you can beat grid pricing. New non-PV solar is starting to make its way in; it started out way worse than PV but addresses PV's theoretical limits on power efficiency.

2. One needs to publish often. Truly novel work takes a lot of time. There's lots of pitfalls and nuances that need to be addressed. It involves A LOT of failure and from the outside (and even the inside) it is near impossible to quantify progress. It looks no different than wasting time, other than seeing that the person is doing "something." So what do people do? They pursue the things that are very likely to lead to results. By nature, these are low hanging fruit. (Well... there's also fraud... but that's a different discussion) Even if you are highly confident a research direction will be fruitful, it will often take too much time or be too costly to actually pursue (and not innovative/meaningful enough to "prototype"). So we all go in mostly the same direction.

(3. Tie in grants and funding. Your proposals need to be "promising" so you can't suggest something kinda out there. You're competing against a lot of others who are much more likely to make progress, even if the impact would be far lower)

So ironically, our fear of risk-taking is making us worse at advancing. We try so hard to pick the right directions to go in, yet the truth is that no one has any idea, and history backs this up. I'm not saying to just make it all chaotic. I think of it more like this: when exploring, you have a main party that travels in a set direction. Their strength together makes good progress, but the downside is there's less exploration. I am not saying that anyone should be able to command the ship on a whim, but rather that we need to let people leave the ship if they want, to pursue their hunches or ideas. Someone thinks they saw an island off in the distance? Let them go. Even if you disagree, I do not think their efforts are fruitless, and even if they're wrong they help map out the territory faster. But if we put all our eggs in one basket, we'll miss a lot of great opportunities. Right now, we let people off the main ship when there's an island that looks promising, and there are those who steal a lifeboat in the middle of the night. But we're all explorers, and it seems like a bad idea to dissuade people who have that drive and passion in them. I know a lot of people in academia (including myself) who feel shackled by the system, when really all they want to do is research. Not every one of these people is going to change things; in fact, most likely won't. But truth is, that's probably true if they stay on the ship too. Not to mention that it is incredibly common for these people to just leave academia altogether anyways.

Research really is just a structured version of "fuck around and find out". So I think we should stop asking "why" we should pursue certain directions. "Because" is as good an excuse as any. In my ideal world, we'd publish anything that is technically correct and not plagiarized, because we usually don't know what is impactful. There are known knowns, known unknowns, and unknown unknowns. We really are trying to pretend that the unknown unknowns either don't exist, are not important, or are very small. But we can't know, they're unknown unknowns, so why pretend?

[0] An example might be all the LLM-based companies trying to make AGI. You want to compete? You're not going to win by making a new LLM. But one can significantly increase one's odds by taking a riskier move and funding things that are not well established. Other types of architectures. And hey, we know the LLM isn't the only way, because we humans aren't LLMs. We humans also use a lot less energy and require far less data, so even if you are fully convinced that LLMs will get us all the way, we know there are other ways to solve this problem.

emrah1y ago

Yes and we should have at least a few competing AI architectures too

77pt771y ago

This is nothing but extreme selection/survivor bias.

Shorel1y ago

No, no, no. That's the wrong thing to take away from it.

Something I got from Richard Feynman's descriptions of his method of study was to, first and foremost, read the problem statements and work diligently at solving the problems by himself for a reasonable amount of time.

Then, and only then, go and read the other solutions. The solutions can be the same, they can be different, and by doing all this preliminary work the researcher can truly understand the nuances of these solutions, something they can't grasp if the solutions were shown just after reading the problem.

So, the best way to approach a problem is:

- Try to solve it by yourself. Several times if necessary, give it an honest effort.

- Then, solved or not, go and read other people's solutions.

ComplexSystems1y ago

This is also apparently true for playing Go.

robblbobbl1y ago

This. I'm sorry for that guy but that is great news!

default-kramer1y ago· 10 in thread

> And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)^2 — far faster than x.

> The team’s results may not lead to any immediate applications

I don't understand why it wouldn't lead to immediate applications. Is this a situation where analysis of real-world use cases allows you to tune your hash implementation better than what a purely mathematical approach would get you?

frakt0x901y ago

I haven't read the paper, but sometimes asymptotic improvements do not translate to real world improvements due to a large multiplicative factor in the complexity that gets factored out in the O() analysis. So the dataset required to see speed-up is impractically large.

MichaelDickens1y ago

I'm not up to date on the state of the art but I've implemented hash tables a few times and we would expand the hash table whenever it was 75% full, which means x is never greater than 4. Improving the runtime from O(x) to O((log x)^2) doesn't matter when x is so small.

I imagine there are some niche memory-constrained applications where you'd let x get larger but I never ran into them personally.
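Plugging a few load factors into that relationship gives a feel for the numbers. This is a deliberately crude sketch: it assumes x = 1/δ, where δ = 1 - load factor is the fraction of empty slots, uses base-2 logs, and ignores every constant factor that big-O hides. The helper name `x_and_log_sq` is made up for illustration.

```python
import math


def x_and_log_sq(load_factor):
    """Return (x, (log2 x)^2) for a table filled to `load_factor`.

    Assumes x = 1/delta, where delta = 1 - load_factor is the fraction of
    empty slots, and uses base-2 logs; constant factors are ignored entirely.
    """
    delta = 1.0 - load_factor
    x = 1.0 / delta
    return x, math.log2(x) ** 2


for lf in (0.5, 0.75, 0.875, 0.99, 0.999):
    x, log_sq = x_and_log_sq(lf)
    print(f"load {lf:>6}: x = {x:8.1f}   (log2 x)^2 = {log_sq:6.1f}")
```

Under these assumptions a 75%-full table gives x = 4 and (log2 x)^2 = 4, and at 87.5% full (log2 x)^2 = 9 is actually larger than x = 8; the gap only opens up once the table gets very close to full.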

pclmulqdq1y ago

I'm pretty sure nobody uses uniform probing hash tables in the wild. Every time I have wanted to have very high load factors (>90%), cuckoo hashing has been good enough, and below 70-80%, linear probing is blazing fast and absolutely good enough.
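For reference, here is a minimal linear-probing table in Python. It is a sketch, not any particular library's implementation: the class and helper names are made up, and it does no deletes or resizing. `insert` returns how many slots it inspected, which makes it easy to watch probe lengths stretch out as the table fills.

```python
import random


class LinearProbingTable:
    """Fixed-capacity open-addressing table using linear probing (no deletes, no resizing)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # each slot is None or a (key, value) pair
        self.size = 0

    def _home(self, key):
        return hash(key) % len(self.slots)

    def insert(self, key, value):
        """Insert or update `key`; return the number of slots inspected."""
        n = len(self.slots)
        i = self._home(key)
        for probes in range(1, n + 1):
            slot = self.slots[i]
            if slot is None:
                self.slots[i] = (key, value)
                self.size += 1
                return probes
            if slot[0] == key:
                self.slots[i] = (key, value)
                return probes
            i = (i + 1) % n  # step to the neighbouring slot (cache-friendly)
        raise RuntimeError("table is full")

    def get(self, key, default=None):
        n = len(self.slots)
        i = self._home(key)
        for _ in range(n):
            slot = self.slots[i]
            if slot is None:
                return default  # an empty slot ends the probe run: key is absent
            if slot[0] == key:
                return slot[1]
            i = (i + 1) % n
        return default


def avg_probes_near(load_factor, capacity=10_000, window=100, seed=0):
    """Average probes over the last `window` inserts that bring the table to `load_factor`."""
    rng = random.Random(seed)
    table = LinearProbingTable(capacity)
    target = int(load_factor * capacity)
    costs = []
    while table.size < target:
        costs.append(table.insert(rng.getrandbits(62), None))
    tail = costs[-window:]
    return sum(tail) / len(tail)


for lf in (0.5, 0.75, 0.9, 0.99):
    print(lf, avg_probes_near(lf))
```

With random keys you should see small averages at moderate loads and a sharp climb as the table approaches full.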

layer81y ago

In practice the worst-case operations are avoided by reserving a little more space for the hash table. And the new results come at the cost of slower “good case” insertions.

jeffbee1y ago

Complexity analysis and actual systems programming have been diverging for a while. I don't see anything in the paper that will inform practice.

oulipo1y ago

Most real-world hash table implementations are not "theoretical" but depend on "real" parameters like L2 cache, assembly instruction sizes, etc.

echoangle1y ago

Isn’t the problem that the scaling behavior only dominates with infinite n?

If you have a constant factor, that doesn’t go into the scaling rule, so something that scales as (log x)^2 could still be 100 times more expensive than something that scales linearly with x for all x smaller than 2^100.
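For a concrete sense of scale, here is a quick numeric check under a hypothetical cost model: one operation costs c·(log2 x)^2 with c = 100, the other costs exactly x. Both cost functions and the helper name `crossover` are assumptions for illustration only.

```python
import math


def crossover(c, start=2):
    """First integer x >= start at which the linear cost x becomes strictly
    cheaper than the hypothetical cost c * (log2 x)^2."""
    x = start
    while x <= c * math.log2(x) ** 2:
        x += 1
    return x


x = crossover(100)
print(x, math.log2(x))
```

With a 100× constant and base-2 logs, the linear cost wins somewhere between 2^14 and 2^15 under this model; a larger constant or a different log base moves the crossover, so treat the number as illustrative.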

ofirg1y ago

It improves the worst-case cost given a nearly full hash map, but it raises the cost in other cases.

coulditbeused1y ago

Could it be used to optimize battery charging speed? Sounds like there's some parallel but was interested in an informed view.

elihu1y ago

Perhaps the technique requires a lot of additional metadata, so that you could fit a 50% full "normal" hash table in less memory than it takes to store a 99% full hash table using this new approach. Thus the normal hash table can always outperform the new hash table in practice despite worse big O performance because it doesn't hit the pathological worst case except in situations where the new hash table would have run out of memory.

abetusk1y ago· 5 in thread

Ok, big shout out to monort [0] for the link to the video [1].

This is just a quick overview from a single viewing of the video, but it's called "funnel hashing". The idea is to split the table into exponentially smaller subarrays, so the first chunk is n/m, the second is n/(m^2), etc., until you get down to a single element. Call them A0, A1, etc., so |A0| = n/m, |A1| = n/(m^2), etc., k levels in total.

Try inserting into A0 c times. If it fails, try inserting into A1 c times. If it fails, go down the "funnel" until you find a free slot.

Call δ the fraction of slots that are empty (I'm unclear whether this is a parameter that gets set at hash table creation or one that's dynamically updated). Setting c = log(1/δ) and k = log(1/δ) gives a worst-case complexity of O(log^2(1/δ)).
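A toy Python sketch of the funnel idea as this comment describes it: levels that shrink geometrically, up to c probes per level, then fall through to the next level. It is not the construction from the paper; the class name, `ratio`, `c`, and the overflow dict standing in for a final catch-all level are all hypothetical choices, and deletions are not supported.

```python
class ToyFunnelTable:
    """Toy 'funnel' table: geometrically shrinking levels, c probes per level.

    Illustration only; level sizes, `c`, and the overflow dict are hypothetical
    choices, not the construction from the paper. No deletions are supported.
    """

    def __init__(self, n, ratio=2, c=4):
        self.c = c
        self.levels = []
        size = n // ratio
        while size >= 1:  # |A0| = n/ratio, |A1| = n/ratio^2, ... down to one slot
            self.levels.append([None] * size)
            size //= ratio
        self.overflow = {}  # stand-in for a final catch-all level

    def _slot(self, key, level, attempt):
        # Pseudo-independent probe position for each (key, level, attempt).
        return hash((key, level, attempt)) % len(self.levels[level])

    def _probe_sequence(self, key):
        for level, arr in enumerate(self.levels):
            for attempt in range(self.c):
                yield arr, self._slot(key, level, attempt)

    def insert(self, key, value):
        """Place `key` in the first free slot along its funnel probe sequence."""
        for arr, i in self._probe_sequence(key):
            if arr[i] is None or arr[i][0] == key:
                arr[i] = (key, value)
                return
        self.overflow[key] = value  # every probed slot was taken by other keys

    def get(self, key, default=None):
        for arr, i in self._probe_sequence(key):
            if arr[i] is None:
                return default  # insert would have stopped here, so key is absent
            if arr[i][0] == key:
                return arr[i][1]
        return self.overflow.get(key, default)


table = ToyFunnelTable(1024)  # levels of 512, 256, ..., 1 slots (1023 total)
for k in range(1000):
    table.insert(k, k * k)
print(sum(len(a) for a in table.levels), len(table.overflow))
```

Because slots are never freed, a lookup can stop at the first empty slot on the key's probe sequence: the insert would have used that slot.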

This circumvents Yao's result by not being greedy. Yao's result holds true for greedy insertion and search policies and the above is non-greedy, as it's cascading down the funnels.

There are probably many little hairy details to work out but that's the idea, as far as I've been able to understand it. People should let me know if I'm way off base.

This very much reminds me of the "Distinct Elements in Streams" idea by Chakraborty, Vinodchandran and Meel[2].

[0] https://news.ycombinator.com/item?id=43007860

[1] https://www.youtube.com/watch?v=ArQNyOU1hyE

[2] https://arxiv.org/pdf/2301.10191

conaclos1y ago

Actually they propose two algorithms: Funnel Hashing and Elastic Hashing. Funnel Hashing is "greedy" and defeats Yao's conjecture, which concerns greedy hash mechanisms. Elastic Hashing is "non-greedy" and provides a better amortized time than greedy algorithms.

golly_ned1y ago

That it circumvents Yao’s conjecture by being non-greedy contradicts the article. Is the article wrong, or your understanding of the paper? I don’t know; I just want to see if you’re noticing something the article’s authors don’t know.

bajsejohannes1y ago

One thing I don't understand from watching the video, is what happens in the (very rare) case that you get collisions all the way down the funnel. I assume this is related to the "One special final level to catch a few keys" (around 14:41 in the video), but given that it has to be fixed size, this can also get full. What do you do in that case?

akatsarakis1y ago

Quite a neat idea that could be useful for memory-constrained environments.

[Shameless plug]:

If you are into hashtables, you might want to check out Dandelion Hashtable [0]. We use it in our next-generation databases, and it was published in HPDC'24. It is currently the fastest in-memory hashtable in practice.

It improves closed addressing via bounded-cacheline chaining to surpass 1B in-memory requests/second on a commodity server.

[0] https://dandelion-datastore.com/#dlht

edflsafoiewq1y ago

Funnel hashing is greedy.

joe_the_user1y ago· 4 in thread

The theoretical properties of hash tables always seemed so impressive to me that they bordered on magic (and this just extends them). What seemed crazy was how they could be so much better than trees, which to me were intuitively the most efficient way to store data.

What I realized is that the theory of hash tables involves a fixed-size collection of objects. For this fixed collection, you create a hash function, use it like a vector index, and store the collection in a (pre-allocated) vector. This gives a (fuzzy-lens'd) recipe for O(1) time insertion, deletion and lookup. (The various tree structures, in contrast, don't assume a particular size.)

The two problems are that you have to decide the size beforehand, and if your vector gets close to full, your insert etc. operations might bog down. Scanning the article, it seems this is a solution to the bogging-down part: it allows quick insertion into a nearly full table. It seems interesting and clever but not actually a great practical advance. In practice, rather than worrying about a clever way to fill the table, I'd assume you just increase your assumed size.

Edit: I'm posting partly to test my understanding, so feel free to correct me if I'm not getting something.

hcs1y ago

Proofs of constant-time operations include the time taken to resize the table. A resize takes much more time (linear in the size of the table) on the insertion that triggers it, but that time is amortized over all the insertions already done. It still works out to constant average time if you grow the table enough each time (once it starts to get too full), so resizes happen with decreasing frequency.
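A small simulation of that amortization argument. The resize policy here is an assumption (a common textbook one, not any specific library's): double the capacity whenever the next insert would push the load above 75%, and count one move per element re-inserted during each resize.

```python
def total_rehash_moves(num_inserts, initial_capacity=8, max_load=0.75):
    """Count element moves caused by resizing while inserting `num_inserts` keys.

    Assumed policy: double the capacity whenever the next insert would push the
    load above `max_load`, re-inserting every existing element into the new array.
    """
    capacity, size, moves = initial_capacity, 0, 0
    for _ in range(num_inserts):
        if size + 1 > max_load * capacity:
            moves += size  # linear-time resize: every element is moved once
            capacity *= 2
        size += 1
    return moves


for n in (10, 1_000, 1_000_000):
    print(n, total_rehash_moves(n) / n)  # resize moves per insert
```

Because each resize happens at twice the size of the previous one, the moves form a geometric series, so the total stays below 2n and the per-insert cost stays bounded by a constant.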

nijave1y ago

>What seemed crazy was how they could be so much better than trees, which to me were intuitively the most efficient way to store data.

Trees (sorted) are good at finding subsets and ranges ("scanning" or "searching"), but hashmaps are better at "seeking", like a key-value lookup.
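A small Python illustration of that distinction. For brevity, a sorted list searched with the standard `bisect` module stands in for a balanced tree (same ordered semantics, though a real tree also gives cheap inserts); a `dict` is Python's built-in hash table. The data and the helper `keys_in_range` are made up.

```python
import bisect

prices = {"apple": 3, "banana": 1, "cherry": 7, "date": 5, "fig": 2}
sorted_keys = sorted(prices)  # ordered view, standing in for a search tree

# "Seeking": exact key lookup is what the hash table is built for.
print(prices["cherry"])  # 7


def keys_in_range(keys, lo, hi):
    """Return keys k with lo <= k <= hi from a sorted list (a range scan)."""
    left = bisect.bisect_left(keys, lo)
    right = bisect.bisect_right(keys, hi)
    return keys[left:right]


# "Scanning": a range query needs ordering, which a hash table doesn't keep.
print(keys_in_range(sorted_keys, "b", "d"))  # ['banana', 'cherry']
```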

zelphirkalt1y ago

I think this is only true in the imperative world, where mutation is used. For the functional world it is probably still trees.

vessenes1y ago

It looks to me like the idea is, as you generally describe, that you segment your table into a 2d structure (well conceptually) and proceed to fill one ‘row’ at a time until it’s about 75% full, at which point you move on to the next one.

I don’t have time to fully grok the paper, but they claim this makes insertion consistently fast (I believe this until we’re at 75% of total capacity, but maybe they have some other mode for filling when they’re at 75% in every row?). They also claim retrieval is fast, and I didn’t read enough to understand how even retrieval works, or why it is faster.

I’ll point out that there are a lot of times when it would be really nice to have a nearly full hash table still, you know, work. You can’t always change the size of one during execution of a program. And, in some environments memory counts a lot. That said, I would like to see and play with an implementation — I’m not sure this is ‘worth it’ in the general case.

It is also probably cache inefficient, as are most things about hash tables, with the exception of linear probing for reading out of a fairly full one, in which case, you get to just keep pulling stuff directly out of memory to check it. So, it’s not clear to me that this is performance wise worth it. Anyway, I’d like to fully understand it, it seems like an interesting new idea.

MR4D1y ago· 4 in thread

Reading through this article is like reading a description of the Monty-Hall problem. [0]

It's as though the conclusion defies common sense, yet is provable. [1]

[0] - https://priceonomics.com/the-time-everyone-corrected-the-wor...

[1] - 2nd to the last paragraph: "The fact that you can achieve a constant average query time, regardless of the hash table’s fullness, was wholly unexpected — even to the authors themselves."

darknavi1y ago

I always really enjoyed the Numb3rs lecture on Monty-Hall

https://www.youtube.com/watch?v=P9WFKmLK0dc

sdenton41y ago

guys it's 2025, let's have a throw-down fight about the monty hall problem.

ryao1y ago

> “Our brains are just not wired to do probability problems very well, so I’m not surprised there were mistakes,” Stanford stats professor Persi Diaconis told a reporter, years ago. “[But] the strict argument would be that the question cannot be answered without knowing the motivation of the host.”

This is wrong. Let’s label the goats A and B to simplify things (so we do not need to consider the positions of the doors). There are 3 cases:

1. You pick the right door. The other two doors have goats. The host may only choose a goat. Whether it is A or B does not matter.

2. You pick the door with goat A. The host may only choose goat B.

3. You pick the door with goat B. The host may only choose goat A.

The host’s intentions are irrelevant as far as the probability is concerned (unless the host is allowed to tell the contestant which door is correct, but I am not aware of that ever being the case). 2/3 of the time, you pick the wrong door. In each of those cases, the remaining door is correct.

The most strict argument is yet another statistics professor got basic statistics wrong.
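The case analysis above can be checked by exact enumeration. This sketch encodes one specific set of rules as assumptions: the car is placed uniformly at random, the contestant's first pick is uniform and independent, and the host always opens an unpicked door hiding a goat, choosing uniformly when he has two such doors.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def switch_win_probability():
    """Exact P(win) for a contestant who always switches, under the assumed rules."""
    total = Fraction(0)
    for car, pick in product(DOORS, DOORS):  # 9 equally likely (car, pick) pairs
        goat_doors = [d for d in DOORS if d != pick and d != car]  # doors host may open
        for opened in goat_doors:
            weight = Fraction(1, 9) / len(goat_doors)
            switched_to = next(d for d in DOORS if d not in (pick, opened))
            if switched_to == car:
                total += weight
    return total


print(switch_win_probability())  # 2/3
```

The host rule lives entirely in `goat_doors`; a different host rule would need a different enumeration.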

default-kramer1y ago

Not this again... https://duckduckgo.com/?q=monty+hall+site%3Anews.ycombinator...

I need to write a blog post or something convincing everyone we need to stop talking about the Monty Hall problem and replace it with a new problem with all the ambiguities removed. (Unless ambiguity is the point, then Monty Hall is fine.)

monort1y ago· 3 in thread

Talk by the inventor: https://www.youtube.com/watch?v=ArQNyOU1hyE

kristopolous1y ago

This strikes me as something that many people probably figured out a non-rigorous version of and didn't think it was special.

It's kind of one of those resource management hacks you do when you're constrained and screwed by limitations. Splitting things up by priority is a common go-to for resource allocation. This is a spin on that.

I wonder how many other "in the trenches hacks" people have done that overturn widely accepted things the inventors didn't realize were a big deal: "well I usually have a bunch of deliveries to make and I've figured out a clever way to map out the quickest path... "

Don't get me wrong - recognizing it and then formalizing it, doing the work, publishing the paper - that's a lot of effort. I'm not taking that away.

abetusk1y ago

Thanks so much for this link. I remain convinced that papers are so much more understandable with an accompanying talk by the creators. I wish papers would just come with a video talk included.

joaohaas1y ago

Thanks for the video, def a lot better than the article.

I do find it a bit weird that this is somehow better than just over-allocating (and thus reducing the chances of key collisions, which also makes worst case 'less worse') given his approach also allocates more memory through the aux arrays.

isaacfrond1y ago· 3 in thread

The paper is here: https://arxiv.org/pdf/2111.12800

Curiously, Andrew Krapivin, the genius undergrad in the article, is not one of the authors.

sd91y ago

This is the actual paper: https://arxiv.org/abs/2501.02305

rurban1y ago

Did anyone see the code? I didn't find anything on GitLab or GitHub.

Edit: Elastic Hashing found https://github.com/MWARDUNI/ElasticHashing

Want to find out if it's only academic or also realistic, and esp. within which bounds.

jtbetz221y ago

Krapivin's work was a result of his study of the Tiny Pointers paper; his paper has already been linked in another response.

jjallen1y ago· 3 in thread

Is it just me, or did the article not go into how the improvement works, just the speed of it?

jjallen1y ago

I was very interested in the improvement. That is the crux of the subject if you ask me.

pinoy4201y ago

It didn’t really go into either. It doesn’t have any sort of benchmarking or any best/worst case fits?

reportgunner1y ago

Yeah, usually when an article starts by explaining what is known or tells you a story, they do it to fluff up the text and bait you into reading, because the actual content you want is not there.

amazingamazing1y ago· 3 in thread

This is a good test because it’s recent. Let’s see if deep research can come up with this result without just copying this.

Edit: gpt4, Gemini 2 and Claude had no luck. Human driven computer science is still safe.

itishappy1y ago

Do you think the results would be different if you tried this with, say, a cohort of PhD students?

rvz1y ago

Original thought or additional intuitive improvements such as this discovery are what we should be looking for.

This can come from anywhere in the world. The best part is, it was NOT discovered by an AI program.

refulgentis1y ago

I'm so confused trying to imagine exactly what you prompted. What did you prompt?!


orlp1y ago· 2 in thread

Skimming the paper [1], the key difference they used is that their hash table insertion algorithm will probe further than the first empty slot, instead of greedily filling the first empty slot it finds. They combine this with a clever probing sequence which provably finds empty slots efficiently, even if the table is very full.

This means insertions when the hash table is less full are slower, but you avoid the worst-case scenario where you're probing for the last (few) remaining open slot(s) without any idea as to where they are.

[1]: https://arxiv.org/pdf/2501.02305
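To see why the "last few open slots" case hurts, here is a rough simulation of the greedy baseline only (not the paper's algorithm). It assumes an idealised model where every probe picks an independent, uniformly random slot; `greedy_insert_probes` and `avg_cost_near` are hypothetical helpers. In that model an insertion at load factor α costs about 1/(1 − α) probes on average, so costs stay small at 50% full and explode near 100%.

```python
import random

def greedy_insert_probes(n_slots: int, n_keys: int, seed: int = 0) -> list[int]:
    """Fill a table greedily and return how many probes each insertion took.

    Idealised model: every probe picks an independent, uniformly random slot,
    and the key goes into the FIRST empty slot found (the greedy rule).
    """
    if n_keys > n_slots:
        raise ValueError("cannot insert more keys than there are slots")
    rng = random.Random(seed)
    occupied = [False] * n_slots
    costs = []
    for _ in range(n_keys):
        probes = 1
        slot = rng.randrange(n_slots)
        while occupied[slot]:           # keep probing until an empty slot turns up
            probes += 1
            slot = rng.randrange(n_slots)
        occupied[slot] = True
        costs.append(probes)
    return costs

def avg_cost_near(costs: list[int], n_slots: int, load: float, window: int = 50) -> float:
    """Average probes over the `window` insertions just before reaching `load`."""
    end = int(load * n_slots)
    chunk = costs[end - window:end]
    return sum(chunk) / len(chunk)

n = 10_000
costs = greedy_insert_probes(n, n)      # fill completely; the last insert sees 1 free slot
for load in (0.5, 0.9, 0.99, 0.999):
    print(f"load {load}: ~{avg_cost_near(costs, n, load):.1f} probes per insert")
print(f"last insert (1 free slot left): {costs[-1]} probes")
```

Real tables use structured probe sequences (e.g. linear probing) rather than independent random probes, so the exact numbers differ, but the blow-up near a full table is the same effect that keeping some slack in the table avoids.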

---

An interesting theoretical result but I would expect the current 'trick' of simply allocating a larger table than necessary to be the superior solution in practice. For example, Rust's hashbrown intentionally leaves 1/8th (12.5%) of the table empty, which does cost a bit more memory but makes insertions/lookups very fast with high probability.

DevelopingElk1y ago

I might be misreading their algorithm, but from my look at the paper the key improvement is a non-uniform strategy where they divide the array into buckets and focus on different buckets as they fill the table. This increases the average number of locations to be probed even when the table is emptier. They still place the item in the first empty slot they see with this strategy.

The "skipping slots" has to do with jumping ahead in the hash sequence.

SiempreViernes1y ago

But you could do some hybrid, where you do greedy fill for a while and then switch to this fancier fill once your table is approaching full (using some heuristic)?


dooglius1y ago· 2 in thread

It looks like the result only matters in the case where the hash table is close to full. But couldn't one just deal with this case by making the table size 10% bigger? (Or, if it is resizeable, resizing earlier)

nhumrich1y ago

Yes, which is what most real world hash tables do. They resize themselves once hash collision is too probable.

xxs1y ago

In reality, 75% is the standard load factor for linear probing, which also exhibits the best locality (if the table gets too full, it just allocates double (or x times) the memory and copies the existing entries over). Most non-linear probing tables (e.g. cuckoo) suffer from the fact that RAM is not 'random' at all.
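The grow-and-copy policy described above can be sketched as a minimal linear-probing set. This is an illustrative Python sketch, not any particular library's implementation: `LinearProbingSet` is a hypothetical name, deletion is omitted, and the 0.75 default simply follows the figure quoted above (passing `max_load=0.875` mimics the 1/8-empty slack mentioned upthread for Rust's hashbrown, without hashbrown's SIMD group probing).

```python
class LinearProbingSet:
    """Minimal open-addressing set: linear probing, grow-and-copy at max_load."""

    def __init__(self, capacity: int = 8, max_load: float = 0.75):
        if capacity < 1 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        if not 0 < max_load < 1:
            raise ValueError("max_load must be below 1 so an empty slot always exists")
        self.slots = [None] * capacity      # None marks an empty slot
        self.size = 0
        self.max_load = max_load

    def _find(self, key) -> int:
        """Index holding key, or the first empty index on its probe path."""
        mask = len(self.slots) - 1          # capacity is a power of two
        i = hash(key) & mask
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) & mask              # linear probe: try the next slot
        return i

    def add(self, key) -> None:
        # Grow before this insert could push the load past max_load
        # (may grow slightly early if key is already present).
        if self.size + 1 > self.max_load * len(self.slots):
            self._resize(2 * len(self.slots))
        i = self._find(key)
        if self.slots[i] is None:
            self.slots[i] = key
            self.size += 1

    def __contains__(self, key) -> bool:
        return self.slots[self._find(key)] == key

    def _resize(self, new_capacity: int) -> None:
        old = self.slots
        self.slots = [None] * new_capacity
        self.size = 0
        for key in old:                     # copy the existing entries over
            if key is not None:
                self.add(key)
```

Because the table never exceeds `max_load`, every probe sequence is guaranteed to reach an empty slot, which is exactly the near-full worst case that over-allocating sidesteps.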

varjag1y ago· 2 in thread

tl;dr sublinear worst case query and insertion in hash tables.

bean-weevil1y ago

Specifically, (log x)^2
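A quick arithmetic comparison of the two growth rates, assuming x = 1/δ where δ is the fraction of slots still empty (so x = 100 means the table is 99% full). Constant factors and the log base are ignored, so this only shows the trend; `compare_bounds` is a hypothetical helper.

```python
import math

def compare_bounds(x: float) -> tuple[float, float]:
    """Return (x, (log2 x)**2), ignoring constant factors and the log base."""
    return x, math.log2(x) ** 2

# Assumed: x = 1/delta, where delta is the fraction of slots still empty.
for x in (8, 16, 100, 1_000, 1_000_000):
    old, new = compare_bounds(x)
    print(f"x = {old:>9,}   (log2 x)^2 = {new:>7.1f}")
```

For small x the squared logarithm is not actually smaller (at x = 8 it is 9), so the gap only matters once the table is very full.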

pinoy4201y ago

ELI5?

qntty1y ago· 2 in thread

A cool result, but it seems like it should be called a computer science conjecture

frabert1y ago

I think the article's author may have seen the cs.DS classification on the arxiv and assumed it meant Data Science instead of Data Structures


ayhanfuat1y ago

Especially considering “data science” is nowhere near 40 years old.


ascorbic1y ago· 2 in thread

And they wouldn't make him first named author on the paper

yacin1y ago

it's common in math/cs theory papers to have the order of authors be alphabetical rather than in descending order of contribution.


acaloiar1y ago

The classiest person I ever knew placed me before him on a paper just to be nice. Not only was he responsible for getting the grants that funded our research, he also did the heft of both the theoretical work and the writing. I'm no longer in R&D or academia, but at that point I decided to do the same for someone if I ever have the opportunity to write a noteworthy paper with someone junior to me.

quantum20221y ago· 1 in thread

This is neat! I always wondered if there would be a way to 'containerize' tables like this. I.e., a regular table is like a bulk carrier ship, with everything stuffed into it. If you could better organize it like a container ship, you could carry much more stuff more efficiently (and offload it faster too!)

doublerabbit1y ago

Easy stuff.

Convert table row to a string, json to whatever

Apply base16 to that variable

You've now got a base16 string of that data.

Create a hash table, setup a key value for that base16 string.

You now have a container holding the data.

All you need to do is decode the hex string and you've got base32 data.


throwme_1231y ago· 1 in thread

Is anyone aware of a GitHub repo with an implementation of this?

sternma1y ago

Put my attempt here: https://github.com/sternma/optopenhash

sternma1y ago· 1 in thread

For anyone looking for a PoC implementation, here's python:

https://github.com/sternma/optopenhash

rurban1y ago

This is overly complicated, both variants. See https://github.com/MWARDUNI/ElasticHashing for a much simpler approach

froh1y ago· 1 in thread

(2021) for the paper itself

https://arxiv.org/abs/2111.12800

gield1y ago

That is the paper Krapivin read in ~2023 and inspired him. The actual paper with the breakthrough is from January 2025: https://arxiv.org/abs/2501.02305.


duskwuff1y ago· 1 in thread

Paper: https://arxiv.org/pdf/2111.12800

something981y ago

This is the correct (2025) paper.

https://arxiv.org/abs/2501.02305

DeathArrow1y ago· 1 in thread

And we are taught to not try reinventing the wheel!

drpossum1y ago

What wheel was reinvented here?

aqueueaqueue1y ago· 1 in thread

How full is your typical production hashtable?

ludston1y ago

Usually somewhere around seven.


victor1061y ago· 1 in thread

> The team’s results may not lead to any immediate applications

Why not?

mortarion1y ago

Because it's theoretical math and if you translate it to computer code it might actually be slower.

nickhodge1y ago· 1 in thread

I bet this guy would still fail a first round FAANG developer interview requiring a Hash Table solution to move on in the process.

"Yeah, sorry. You didn't use the right Hash Table"

rswail1y ago

Which is exactly the difference between "programming" and "computer science".


ziofill1y ago· 1 in thread

"it is well known that a vital ingredient of success is not knowing that what you are attempting can’t be done." — Terry Pratchett (equal rites)

yas_hmaheshwari1y ago

The fool didn't know it was impossible, so he did it.

( I don't know who said it, but if forced, I will say Albert Einstein or Mark Twain :-) )


percentcer1y ago· 1 in thread

"arrowlike entities"

pteraspidomorph1y ago

I read that and my mind filled in "...from outer space?"

jheriko1y ago· 1 in thread

i feel this article is missing some detail or incorrect in reporting the actual development here. either that or i am missing something myself...

hash tables are constant time on average for all insertion, lookup and deletion operations, and in some special cases, which i've seen used in practice very, very often, they have very small constant run-time just like a fixed-size array (exactly equivalent, in fact).

this came up in an interview question i had in 2009 where i got judged poorly for deriding the structure as "not something i've often needed", and i've seen it in much older code.

i'm guessing maybe there are constraints at play here, like having to support unbounded growth, and some generic use case that i've not encountered in the wild...?

yxhuvud1y ago

What you are missing is how the hash table behaves when it is almost full. If there is one empty spot left in the whole table, how do you find it when you insert a new entry?
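Under an idealised model where each probe checks an independent, uniformly random slot (an assumption; real tables use structured probe sequences), the answer is a geometric distribution:

```latex
% n slots, k of them empty; each probe independently hits an empty slot with p = k/n.
\Pr[\text{probes} = t] = (1-p)^{t-1}\,p, \qquad
\mathbb{E}[\text{probes}] = \sum_{t \ge 1} t\,(1-p)^{t-1}\,p = \frac{1}{p} = \frac{n}{k}.
% One empty slot (k = 1): about n probes, i.e. linear in the table size.
% A fraction \delta of slots empty (k = \delta n): about 1/\delta probes.
```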


trebligdivad1y ago

Anyone got a simple implementation of 'Tiny pointers'? My mind prefers code/pseudo-code first rather than the proof.

matsemann1y ago

The intro picture about pointers in a drawer immediately reminded me of a talk I saw at FUN with Algorithms 2018 called Mind the Gap that gave me an aha moment about leaving space in data structures. Cool then to try to locate it, and see that it was by the same professor in the article, Martín Farach-Colton.

Not sure if it's viewable somewhere. But the conference itself was so fun. https://sites.google.com/view/fun2018/home

I'm not an academic and got my company to sponsor a trip to this Italian island to relax on the beach and watch fun talks, heh.

ThinkBeat1y ago

Do we have some nice implementations yet? I do better reading code than math.

_1tan1y ago

Neat, started on some implementation: https://kraftwerk.social/innovation-in-hash-tables/

cb3211y ago

For a different, perhaps more practical take on small pointers in hash tables, you might find this interesting: https://probablydance.com/2018/05/28/a-new-fast-hash-table-i... with contemporaneous discussion at https://news.ycombinator.com/item?id=17176713

foota1y ago

I guess the most we could hope for here is that this leads to some other discovery down the road, either in hashtables or maybe one of the similar structures like bloom filters?

nexawave-ai1y ago

I would like to see this being applied practically. Is there a video demonstrating this or is it still too soon? Is the algorithm secret sauce or will it be open sourced?

elcritch1y ago

Anyone else think this could be used with distributed hash tables to dramatically speed up searching or building them? Maybe, more exotically, for LLMs and lookup tables. A clever algorithm like this should be applicable to a lot of more specialized data structures or applications.

It's likely a DHT would greatly benefit from this sort of algorithmic reduction in time and be less susceptible to constant factor overheads (if there are any).

Canigou1y ago

I unfortunately did not study well enough to understand the paper.

Can someone explain to me how this isn't some kind of Dewey Decimal Classification (https://en.wikipedia.org/wiki/Dewey_Decimal_Classification) ?

shaganer1y ago

Read this within my half-hour break and man, wow, what a story. I'm not a software guy, I'm a sys and net guy. Despite not caring or knowing about hash tables, I found that article a great read! Thanks for sharing!

seinecle1y ago

Anyone competent enough here to venture a guess on the speed gain to expect under various scenarios?

reportgunner1y ago

Sad that the article doesn't say what his approach actually is.

bnly1y ago

Step one: Be a genius

Step two: Try to solve hard problems

Step three: Avoid reading too much of other people's work in the area

Step four: (Maybe) Invent a brilliant new solution

But really, really don't skip step one.

lupire1y ago

The older a conjecture is, the more likely it is false.

That's why the conjecture resists proof: there is a counterexample that people aren't seeing.

pizza1y ago

Just realized that the Mixture of Million Experts paper from last year is similar in some respects to this tiny pointers idea

hoseja1y ago

Is this just theoretically better O(n) or is there an actually faster implementation somewhere?

EternalFury1y ago

What’s the time and space complexity of the new approach?

hemant10411y ago

Interesting read!


travisgriggs1y ago

This is cool enough. But I find the "celebrification" style of the piece a bit off putting. Did I really need to see multiple posed shots of this young man reposing in various university settings? It's like we need our own version of La La Land to glorify the survivors of computer success to motivate more to participate.


pmags1y ago

Nice result!

<rhetorical> Hmm....I wonder how such research gets funded?... </rhetorical>


jimnotgym1y ago

Now we have faster data structures we can fill that extra time by writing less efficient code, and loading more pointless libraries. This is the march of computer science.

ChrisMarshallNY1y ago

As the villain in Scooby Doo always said:

"And I would have gotten away with it, if it hadn't been for those meddling kids!"

zombiwoof1y ago

Take that AI :)

sam0x171y ago

This is huge, when can we get a rust implementation?


bruce3434341y ago

Ok so what's the algorithm? Ass article

kittikitti1y ago

I read through this and I'm not sure if people have heard of dictionary trees for hash tables. Of course, quantamagazine.org has been known to sensationalize these types of things.




helloplanets1y ago

> The developer of Balatro made an award winning deck builder game by not being aware of existing deck builders.

He was aware of deck builders and was directly inspired by Luck be a Landlord, but he was not aware of just how massive the genre is.

Direct quote from the developer:

https://www.reddit.com/r/Games/comments/1bdtmlg/comment/kup7...


chambers1y ago

  “They’re cheering for you,” she said with a smile. 
  “But I could never have done it,” [Milo] objected, “without everyone else’s help.”
  “That may be true,” said Reason gravely, “but you had the courage to try; 
     and what you can do is often simply a matter of what you will do.”
  “That’s why,” said King Azaz, “there was one very important thing about your quest 
     that we couldn’t discuss until you returned.”
  “I remember,” said Milo eagerly. “Tell me now.”
  “It was impossible,” said the king, looking at the Mathemagician.
  “Completely impossible,” said the Mathemagician, looking at the king.
  “Do you mean … ,” said the bug, who suddenly felt a bit faint.
  “Yes, indeed,” they repeated together, “but if we’d told you then, you might not have gone … 
    and, as you’ve discovered, so many things are possible just as long as you don’t know they’re impossible.”

- The Phantom Tollbooth (1961)

rincebrain1y ago

A professor I had in college, whose first published result came from a piece of homework he turned in where he incidentally solved an open question about a bound on a problem, had a curious habit.

I ended up failing and taking his course again (because I had A Lot going on in college), and thus, noticed something.

I suspect that this was not an accident, given that it always happened only once.


SideQuark1y ago

The biggest gains in theory and in practice are far more often obtained by masters of craft, giving much more weight to attacking problems from a position of knowledge.


dataviz10001y ago

After asking ChatGPT not to agree with me that your comment and these two different approaches to solving problems are alike, it concluded there still might be similarities between the two.

[0] https://youtu.be/P_fHJIYENdI?feature=shared&t=1030

[1] https://chatgpt.com/share/67aa8340-e540-8004-8438-3200e0d4e8...


abetusk1y ago

I disagree.

A lot of progress is made when there's deeper knowledge about the problem space along with some maturity for when these cartoon descriptions are invalid.


genghisjahn1y ago

This reminds me of the Neal Stephenson article "Innovation Starvation" from 2011:

pjc501y ago

I believe Ramanujan did the same with various maths problems. The Cambridge undergrad course sprinkles a few unsolved problems in the practice questions just in case someone does this again.


voidhorse1y ago

There's a reason the phrase "beginner's luck" exists. I'm not sure the naïveté and success are causally related so much as they might be coincident.

necovek1y ago

> the authors have also learned of several other hash tables that make use of the same high-level idea in different settings [7, 9].

At least part of the result was already known, and the fact that the authors didn't know about it mostly goes to show how large the corpus of knowledge we already possess is.

But the core inspiration came from looking at another recent research paper "Tiny Pointers": that is totally against your premise.

Now, it has always been somewhat true that lots of existing knowledge limits our creativity in familiar domains, but you need both to really advance science.

latexr1y ago

That is called Shoshin, or Beginner’s Mind.

https://en.wikipedia.org/wiki/Shoshin

ajross1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

jay_kyburz1y ago

According to RPS, the quote is that he had "barely played any roguelikelike deckbuilders", not that he was not aware of them.

There are a lot of great deck builders that are not roguelike. Has he played Dominion, Magic the Gathering, Hearthstone?

awesome_dude1y ago

There's a problem in all human understanding - knowing when, and knowing when not to apply pre-existing knowledge to a problem.

taurknaut1y ago

> Krapivin made this breakthrough by being unaware of Yao's conjecture.

Hashing is a pretty unintuitive sort of computation. I'm not surprised that there are still surprises.


anvuong1y ago

This is confirmation/survivorship bias. You only hear about these positive cases. The vast majority just ends up rediscovering old techniques and their year-long paper/work got rejected.

delichon1y ago

Unaccompanied Sonata is a 1979 short story by Orson Scott Card that takes this to an extreme, and has haunted me since I read it in the eighties.

hans-dampf1y ago

Your exact thoughts have already been put to paper by L. P. Hammett, godfather of physical organic chemistry (exact description of chemical reactions):

chasing1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Both Danny Trejo and Tim Allen spent time in prison before becoming famous. While that's interesting, I'm not sure I'm ready to believe that's the best way to become a professional actor.

Edit to be a little less snarky, apologies:

Outsiders sometimes do great work. They also sometimes:

https://www.reddit.com/r/mathmemes/comments/wq9hcl/terrence_...

RALaBarge1y ago

https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Re...


chikere2321y ago

Last year's Advent of Code had a task that was NP-complete and lacked good, well-known approximation algorithms. I almost gave up on it when I realised that, as it felt impossible.

In practice the data was well behaved enough and small enough that it was very doable.

klik991y ago

I think progress needs both individual achievements that break out of preconceived notions and the communal work of improving within the notions we currently have.

youniverse1y ago

If anyone wants to watch: https://youtu.be/4vou_dXuB8M?si=Wdr7q96MFULPAEc4

Definitely something we should all keep in mind that sometimes you just have to pave your own way and hope it is great on its own merits.


0x38B1y ago

"fall[ing] in the rut of thought" reminds me of this paragraph from "The Footpath Way":

Even though the author is talking about a different kind of knowledge, the image of sticks of a fan - where going down one gradually excludes the others - stuck with me.

1: https://www.gutenberg.org/ebooks/59813

namibj1y ago

Why DC? An overhead line only limited by peak voltage (arc) and thermals can carry twice the power when running DC instead of AC, assuming both measured relative to ground.

I have to say, though, that developing novel technology by pulling in recent innovations from the fundamental/material science fields underlying the target is not at all an easy thing to do.

rincebrain1y ago

Kind of?

You get novel branches of thought, but in the limit case, you're also reinventing the universe to bake an apple pie.

The Balatro dev also, for example, has talked about how he was heavily influenced by several specific other games.

dathinab1y ago

I would say not letting your thoughts be constrained by the bias of existing approaches.

This isn't easy, at all. It requires training yourself to have an open and flexible mind in general.

Not knowing about something is more like a cheat to get there easier.

shkkmo1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

I would point out that it is also possible to question and challenge the assumptions that prior approaches have made without being ignorant of what those approaches tried.

eru1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

I don't think that's warranted.

You will find that the vast majority of lottery winners have bought lottery tickets. However that doesn't mean that buying lottery tickets is a good idea financially.

SkyBelow1y ago

Best for an individual or for society?

For any single individual, the best choice is the former. The latter is a gamble that most people will lose, wasting their lives chasing crazy theories.

For society, you want a split. You need some doing the second option to have the eventual amazing discovery, but you also need to progress the current understanding further.

thenoblesunfish1y ago

The real trick is simply to try to understand things directly and not rely on proof by authority all the time.

germandiago1y ago

This is valuable.

kazinator1y ago

It takes time to read all the prior research. You could grow old by the time you get through it all. Likelihood of contributing to the field declines with age.

You might believe someone's proof of a conjecture and then be discouraged from delving any more into that rabbit hole.

More often than not you will be reinventing something. But that's not necessarily less productive than reading other people's work. In the former case, you're at least making something, if not something new.

So there are some arguments for being fresh in a well-trodden field with an ocean of research that you cannot boil all at once.

On the other hand, there is the publish-or-perish pressure in academia, which requires original research. You could just keep busy and throw enough shit against the wall that enough of it sticks.

xyzzy_plugh1y ago

This may come as a surprise to some but, believe it or not, you can have both. In fact, you should.


kbenson1y ago

Of course, if you happen to be on a slope that leads to the global maximum, starting from scratch is far less effective. We don't usually know where we are, so there's a trade-off.

There was a good article posted to HN years ago that covered this and used rocketry as one of the examples, but I don't recall what it was. The author was well known, IIRC.


globular-toast1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

speleding1y ago

[0] https://www.nngroup.com/articles/parallel-design/

3abiton1y ago

This sounds like the approach the DeepSeek CEO used for hiring. He favored young, inexperienced teams so they could bring a fresh perspective and try things in new ways. It paid off nicely.

immibis1y ago

Well, sometimes. Other times, perhaps even most times, you bang your head against a wall for weeks and get nowhere.

George Dantzig also solved two open problems because he thought they were homework.

bell-cot1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before...

That depends...

- Krapivin was an undergrad, tinkering with stuff for fun. If he'd put a couple months into this, and just ended up re-inventing a few things? That'd be decently educational.

- Vs. if your team needs to ship product, on something resembling the schedule? Yeah. You definitely stick to tried-and-true algorithms.

TZubiri1y ago

When training physically, you can overtrain one muscle and come to depend on it. By deliberately not using that muscle, you can improve your other muscles.

It is well known that limitations improve creativity.

rollcat1y ago

It's a trade-off, at first it takes longer to iterate on features, but sometimes a more minimal and/or composable tool finds its way to production. Real Systems are made of duct tape anyways.

kristopolous1y ago

I used to think this a few decades ago. I think it's just as accessible with some mix of anti-authoritarianism and defiant personality.

Essentially you learn a thing, you accept it for now and you think "well but maybe!"

Like I personally think there should be multiple mathematical zeroes but I accept it as wackiness unless I can clearly demonstrate coherency and utility as to why.

indymike1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Everyone likes to focus on why you cannot do and why trying will be futile.

You don't have to disregard prior efforts. You just have to focus on one simple question:

"how can I do ______ ?"

Owlettotoo1y ago

Sometimes insight can come by evaluating the problem at its rawest form. In short, a wild but fresh perspective.

vkou1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Survivorship bias, you aren't aware of all the failures where people who were unaware of prior art made all the mistakes predictable to people who were.

yodsanklai1y ago

> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.

Maybe it's just because there are more people working on these problems who don't know previous approaches than the opposite.

ijustlovemath1y ago

This is the biggest risk of AI imo; almost by definition your thoughts regress to the mean when using it


dabeeeenster1y ago

Interesting idea! Clifford Cocks also made a breakthrough in/invented Public Key Encryption without realising it https://en.wikipedia.org/wiki/Clifford_Cocks

temporallobe1y ago

We’re too afraid of violating some unwritten rule about reinventing the wheel. Or something.

fcq1y ago

Absolutely true! I concur 100% with your take.

Funny this breakthrough happens at same time Antirez made this post https://news.ycombinator.com/item?id=42983275

swayvil1y ago

I think it's 2 different approaches, some enjoy the one (playing with the thing itself) and some enjoy the other (playing with the various secondhand abstractions that refer to the thing).

They are different tastes. They deliver different results.

fennecbutt1y ago

That's a load of selection bias though. I'm sure there have been many, many more people who don't know anything about deck builder games who tried to make one and didn't succeed.

dumbfounder1y ago

It’s easy to think outside the box when you don’t know where the box is.

obelos1y ago

I think you're forgetting to put “all the times ignorance didn't produce a breakthrough” in the denominator.


dinkumthinkum1y ago

I get what you are saying but what if the amount of breakthroughs by people that did know about what came before was orders of magnitude higher than this number, would that change your mind?

bweller1y ago

See einstellung effect:

https://thedecisionlab.com/biases/einstellung-effect

agumonkey1y ago

Similarly, the FORTRAN or ALGOL team implemented a lot of optimization tricks on the first try, things that are now considered advanced, without "knowing it".

tehjoker1y ago

You hear about this stuff because it's notable. Almost 100% of the time, if you disregard what other people have done, you are going to waste a lot of time.

UltraSane1y ago

For every case like this you have thousands of people who waste a huge amount of time and mental effort recreating something that has already been invented.

skgough1y ago

Maybe the best way to have the best of both worlds is to ensure well-established areas of research are open to “outsider art” submissions on the topic?

redcobra7621y ago

https://thedecisionlab.com/biases/availability-heuristic

implmntatio1y ago

Yup. And we programmed all that into LeGenAIs and LeGPTs and so on ... a splendidly perfect annihilation of all things evolutionary.

tgauda1y ago

Every notable discovery has disproved something that everyone else thought was true. Naivety can be a superpower when inventing.

rnewme1y ago

I think it's more about working on a problem you spotted instead of endlessly reading, hoarding info, literature etc.


brookst1y ago

You’re hitting on innovation versus invention. True invention is getting more and more rare. Innovation is alive and well.

resters1y ago

All scientific progress consists of leveraging some past work and overturning other past work. This is no different.

hassleblad231y ago

You have to be naive to be an innovator.

lysecret1y ago

I feel like there is already a movement, "thinking from first principles" along this direction.

moi23881y ago

No, but starting from first principles does work. And being unaware of previous work helps you do this.

hombre_fatal1y ago

The creator of Halo’s soundtrack didn’t listen to music in fear of it influencing him.

throwaway5191y ago

In the spirit of your observation, I encourage you to make your observation again.

Xcelerate1y ago

Now we just need a smart person who is somehow unaware of the halting problem.

wnolens1y ago

I had to delete Balatro last week to break an addiction. It's so so good

godelski1y ago

I actually have a hot take that is related to this (been showing up in a few of my recent comments). It is about why there's little innovation in academia, but I think it generalizes.

We have 2 big reasons this environment will slow innovation and make breakthroughs rare.

emrah1y ago

Yes and we should have at least a few competing AI architectures too

77pt771y ago

This is nothing but extreme selection/survivor bias.

Shorel1y ago

No, no, no. That's the wrong thing to take away from it.

So, the best way to approach a problem is:

- Try to solve it by yourself. Several times if necessary, give it an honest effort.

- Then, solved or not, go and read other people's solutions.

ComplexSystems1y ago

This is also apparently true for playing Go.

robblbobbl1y ago

This. I'm sorry for that guy but that is great news!

default-kramer1y ago· 10 in thread

> And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)² — far faster than x.

> The team’s results may not lead to any immediate applications


MichaelDickens1y ago

I imagine there are some niche memory-constrained applications where you'd let x get larger but I never ran into them personally.


layer81y ago

In practice the worst-case operations are avoided by reserving a little more space for the hash table. And the new results come at the cost of slower “good case” insertions.

jeffbee1y ago

Complexity analysis and actual systems programming have been diverging for a while. I don't see anything in the paper that will inform practice.


oulipo1y ago

Most real-world hash table implementations are not "theoretical" but depend on "real" parameters like L2 cache, assembly instruction sizes, etc.

echoangle1y ago

Isn’t the problem that the scaling behavior only dominates with infinite n?

ofirg1y ago

It improves the worst-case cost given a nearly full hash map, but it raises the cost in other cases.

coulditbeused1y ago

Could it be used to optimize battery charging speed? Sounds like there's some parallel but was interested in an informed view.

elihu1y ago

abetusk1y ago· 5 in thread

Ok, big shout out to monort [0] for the link to the video [1].

Try inserting into A0 c times. If it fails, try inserting into A1 c times. If it fails, go down the "funnel" until you find a free slot.

This circumvents Yao's result by not being greedy. Yao's result holds true for greedy insertion and search policies and the above is non-greedy, as it's cascading down the funnels.

There are probably many little hairy details to work out but that's the idea, as far as I've been able to understand it. People should let me know if I'm way off base.
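A minimal Python sketch of the cascade as described in this comment, assuming (hypothetically) that each sub-array A0, A1, ... gets `c` independent hashed probes before the key falls through to the next one, and that the last sub-array is scanned linearly as a fallback. The paper's actual funnel-hashing construction differs in its details; `_probe` and `FunnelSketch` are made-up names for illustration, and deletions are not supported.

```python
import hashlib


def _probe(key, level, attempt, size):
    # Hypothetical probe function: hash (key, level, attempt) to a slot index.
    data = f"{key!r}|{level}|{attempt}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % size


class FunnelSketch:
    """Sub-arrays A0, A1, ...; each gets `c` probes before the key falls through."""

    def __init__(self, level_sizes, c=4):
        self.levels = [[None] * n for n in level_sizes]
        self.c = c

    def _slots(self, key):
        # Yield (array, index) pairs in the order both insert and lookup visit them.
        for lvl, arr in enumerate(self.levels):
            for attempt in range(self.c):
                yield arr, _probe(key, lvl, attempt, len(arr))
        last = self.levels[-1]
        for i in range(len(last)):  # fallback: linear scan of the last sub-array
            yield last, i

    def insert(self, key, value):
        for arr, i in self._slots(key):
            if arr[i] is None or arr[i][0] == key:
                arr[i] = (key, value)
                return
        raise RuntimeError("no free slot on this key's probe path")

    def get(self, key, default=None):
        for arr, i in self._slots(key):
            if arr[i] is None:
                return default  # insert would have stopped here, so the key is absent
            if arr[i][0] == key:
                return arr[i][1]
        return default
```

For example, `FunnelSketch([512, 256, 128, 64], c=4)` gives geometrically shrinking sub-arrays: most keys settle in A0, and only the unlucky ones cascade further down the funnel.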

This very much reminds me of the "Distinct Elements in Streams" idea by Chakraborty, Vinodchandran and Meel[2].

[0] https://news.ycombinator.com/item?id=43007860

[1] https://www.youtube.com/watch?v=ArQNyOU1hyE

[2] https://arxiv.org/pdf/2301.10191

conaclos1y ago

golly_ned1y ago

bajsejohannes1y ago

akatsarakis1y ago

Quite a neat idea that could be useful for memory-constrained environments.

[Shameless plug]:

It improves closed addressing via bounded-cacheline chaining to surpass 1B in-memory requests/second on a commodity server.

[0] https://dandelion-datastore.com/#dlht

edflsafoiewq1y ago

Funnel hashing is greedy.

joe_the_user1y ago· 4 in thread

Edit: I'm posting partly to test my understanding, so feel free to correct me if I'm not getting something.

hcs1y ago

nijave1y ago

>What seemed crazy was how they could be so much better than trees, which to me were intuitively the most efficient way to store data.

Trees (sorted) are good at finding subsets and ranges "scanning" or "searching" but hashmaps are better at "seeking" like a key-value lookup
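To make the seek/scan distinction concrete, a small Python sketch: a `dict` (a hash table) answers exact-key lookups directly, while a sorted key list searched with `bisect` stands in for an ordered tree and answers range queries. Using a sorted list instead of a real tree is a simplification for brevity; the fruit data is made up.

```python
import bisect

prices = {"apple": 3, "banana": 1, "cherry": 7, "date": 5, "fig": 2}


def seek(table, key):
    # "Seeking": exact-key lookup, expected O(1) in a hash table.
    return table.get(key)


def key_range(sorted_keys, lo, hi):
    # "Scanning": all keys in [lo, hi), which requires the keys to be ordered.
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    return sorted_keys[start:end]


sorted_keys = sorted(prices)
```

A hash table cannot answer `key_range` without examining every key, because hashing deliberately scatters neighboring keys across the table.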

zelphirkalt1y ago

I think this is only true in the imperative world, where mutation is used. For the functional world it is probably still trees.

vessenes1y ago

MR4D1y ago· 4 in thread

Reading through this article is like reading a description of the Monty-Hall problem. [0]

It's as though the conclusion defies common sense, yet it is provable. [1]

[0] - https://priceonomics.com/the-time-everyone-corrected-the-wor...

[1] - 2nd to the last paragraph: "The fact that you can achieve a constant average query time, regardless of the hash table’s fullness, was wholly unexpected — even to the authors themselves."

darknavi1y ago

I always really enjoyed the Numb3rs lecture on Monty-Hall

https://www.youtube.com/watch?v=P9WFKmLK0dc

sdenton41y ago

guys it's 2025, let's have a throw-down fight about the monty hall problem.

ryao1y ago

This is wrong. Let’s label the goats A and B to simplify things (so we do not need to consider the positions of the doors). There are 3 cases:

1. You pick the right door. The other two doors have goats. The host may only choose a goat. Whether it is A or B does not matter.

2. You pick the door with goat A. The host may only choose goat B.

3. You pick the door with goat B. The host may only choose goat A.
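The three cases above can be checked mechanically. A short Python sketch, assuming the standard rules (your initial pick is equally likely to be any door, and the host always opens a goat door you did not pick), enumerates the cases and counts how often staying or switching wins.

```python
from fractions import Fraction

DOORS = ("car", "goat A", "goat B")


def win_rates():
    stay_wins = switch_wins = 0
    for pick in DOORS:  # each initial pick is equally likely
        # Host opens a goat you did not pick (either one if you picked the car).
        host = next(d for d in DOORS if d != pick and d != "car")
        # Switching means taking the one door that is neither yours nor the host's.
        switch_to = next(d for d in DOORS if d not in (pick, host))
        stay_wins += pick == "car"
        switch_wins += switch_to == "car"
    return Fraction(stay_wins, len(DOORS)), Fraction(switch_wins, len(DOORS))
```

Staying wins only in case 1; switching wins in cases 2 and 3.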

The most strict argument is yet another statistics professor got basic statistics wrong.

default-kramer1y ago

Not this again... https://duckduckgo.com/?q=monty+hall+site%3Anews.ycombinator...

monort1y ago· 3 in thread

Talk by the inventor: https://www.youtube.com/watch?v=ArQNyOU1hyE

kristopolous1y ago

This strikes me as something that many people probably figured out a non-rigorous version of and didn't think it was special.

Don't get me wrong - recognizing it and then formalizing it, doing the work, publishing the paper - that's a lot of effort. I'm not taking that away.

abetusk1y ago

Thanks so much for this link. I remain convinced that papers are so much more understandable with an accompanying talk by the creators. I wish papers would just come with a video talk included.

joaohaas1y ago

Thanks for the video, def a lot better than the article.

isaacfrond1y ago· 3 in thread

The paper is here: https://arxiv.org/pdf/2111.12800

Curiously, Andrew Krapivin, the genius undergrad in the article, is not one of the authors.

sd91y ago

This is the actual paper: https://arxiv.org/abs/2501.02305

rurban1y ago

Did anyone see the code? I didn't find anything on GitLab or GitHub.

Edit: found an Elastic Hashing implementation: https://github.com/MWARDUNI/ElasticHashing

Want to find out if it's only academic or also realistic, and esp. within which bounds.

jtbetz221y ago

Krapivin's work was a result of his study of the Tiny Pointers paper; his paper has already been linked in another response.

jjallen1y ago· 3 in thread

Is it just me, or did the article not go into how the improvement works, just the speed of it?

jjallen1y ago

I was very interested in the improvement. That is the crux of the subject if you ask me.

pinoy4201y ago

It didn’t really go into either. It doesn’t have any sort of benchmarking or any best/worst case fits?

reportgunner1y ago

Yeah, usually when an article starts by explaining what is known or telling you a story, it's to fluff up the text and bait you into reading, because the actual content you want is not there.

amazingamazing1y ago· 3 in thread

This is a good test because it’s recent. Let’s see if deep research can come up with this result without just copying this.

Edit: gpt4, Gemini 2 and Claude had no luck. Human driven computer science is still safe.

itishappy1y ago

Do you think the results would be different if you tried this with, say, a cohort of PhD students?

rvz1y ago

Original thought, or additional intuitive improvements such as this discovery, are what we should be looking for.

This can come from anywhere in the world. The best part is, it was NOT discovered by an AI program.

refulgentis1y ago

I'm so confused trying to imagine exactly what you prompted. What did you prompt?!

orlp1y ago· 2 in thread

[1]: https://arxiv.org/pdf/2501.02305

---

DevelopingElk1y ago

The "skipping slots" has to do with jumping ahead in the hash sequence.

SiempreViernes1y ago

But you could do some hybrid, where you do greedy fill for a while and then switch to this fancier fill once your table is approaching full (using some heuristic)?

dooglius1y ago· 2 in thread

nhumrich1y ago

Yes, which is what most real world hash tables do. They resize themselves once hash collision is too probable.
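A minimal sketch of that policy in Python: open addressing with linear probing that doubles its capacity before the load factor would exceed a threshold. The 0.75 threshold, the initial capacity of 8, and the class name `ResizingTable` are illustrative assumptions, not taken from any particular library.

```python
class ResizingTable:
    """Open addressing with linear probing; grows before it gets too full."""

    MAX_LOAD = 0.75  # assumed threshold

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.count = 0

    def _find(self, key):
        # Step forward from the hashed slot until we hit the key or an empty slot.
        # The load cap guarantees an empty slot exists, so this terminates.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def insert(self, key, value):
        if (self.count + 1) / len(self.slots) > self.MAX_LOAD:
            self._grow()
        i = self._find(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._find(key)]
        return slot[1] if slot is not None else default

    def _grow(self):
        # Double the capacity and re-insert everything at its new position.
        old = [s for s in self.slots if s is not None]
        self.slots = [None] * (2 * len(self.slots))
        self.count = 0
        for k, v in old:
            self.insert(k, v)
```

Because the table never runs close to full, probe sequences stay short; the price is the memory held in reserve.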

xxs1y ago

varjag1y ago· 2 in thread

tl;dr sublinear worst case query and insertion in hash tables.

bean-weevil1y ago

Specifically, (log x)^2
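For a sense of scale, assuming the article's convention that x measures how full the table is (a table that is 99% full has x = 100, one that is 99.9% full has x = 1,000) and taking logarithms base 2, the two growth rates compare as below. Big-O hides constant factors, so this shows only how the bounds grow, not actual probe counts.

```python
import math


def compare(x):
    """Return (x, (log2 x)^2), ignoring the constants hidden by big-O."""
    return x, math.log2(x) ** 2


for x in (10, 100, 1_000, 1_000_000):
    linear, polylog = compare(x)
    print(f"x = {linear:>9,}  (log2 x)^2 = {polylog:7.1f}")
```

At x = 10 the squared log is about 11, slightly above x, but by x = 1,000,000 it is under 400.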

pinoy4201y ago

ELI5?

qntty1y ago· 2 in thread

A cool result, but it seems like it should be called a computer science conjecture

frabert1y ago

I think the article's author may have seen the cs.DS classification on the arxiv and assumed it meant Data Science instead of Data Structures

ayhanfuat1y ago

Especially considering “data science” is nowhere near 40 years old.

ascorbic1y ago· 2 in thread

And they wouldn't make him first named author on the paper

yacin1y ago

it's common in math/cs theory papers to have the order of authors be alphabetical rather than in descending order of contribution.

acaloiar1y ago

quantum20221y ago· 1 in thread

doublerabbit1y ago

Easy stuff.

Convert the table row to a string: JSON or whatever.

Apply base16 to that variable.

You've now got a base16 string of that data.

Create a hash table, setup a key value for that base16 string.

You now have a container holding the data.

All you need to do is decode the hex string and you've got base32 data.
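Taken literally, the steps above look like this in Python. The row contents and the key name `row_key` are made up for illustration. Note that hex-decoding gives back exactly the original JSON bytes.

```python
import json

row = {"id": 42, "name": "Krapivin"}      # hypothetical table row

encoded = json.dumps(row).encode().hex()  # row -> JSON string -> base16 (hex)
table = {"row_key": encoded}              # store it in a hash table (a dict)

# Decoding the hex string recovers the original JSON, which parses back to the row.
decoded = json.loads(bytes.fromhex(table["row_key"]).decode())
```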

throwme_1231y ago· 1 in thread

Is someone aware of a GitHub repo with an implementation of this?

sternma1y ago

Put my attempt here: https://github.com/sternma/optopenhash

sternma1y ago· 1 in thread

For anyone looking for a PoC implementation, here's python:

https://github.com/sternma/optopenhash

rurban1y ago

This is overly complicated, both variants. See https://github.com/MWARDUNI/ElasticHashing for a much simpler approach

froh1y ago· 1 in thread

(2021) for the paper itself

https://arxiv.org/abs/2111.12800

gield1y ago

That is the paper Krapivin read in ~2023 and inspired him. The actual paper with the breakthrough is from January 2025: https://arxiv.org/abs/2501.02305.

duskwuff1y ago· 1 in thread

Paper: https://arxiv.org/pdf/2111.12800

something981y ago

This is the correct (2025) paper.

https://arxiv.org/abs/2501.02305

DeathArrow1y ago· 1 in thread

And we are taught to not try reinventing the wheel!

drpossum1y ago

What wheel was reinvented here?

aqueueaqueue1y ago· 1 in thread

How full is your typical production hashtable?

ludston1y ago

Usually somewhere around seven.

victor1061y ago· 1 in thread

> The team’s results may not lead to any immediate applications

Why not?

mortarion1y ago

Because it's theoretical math and if you translate it to computer code it might actually be slower.

nickhodge1y ago· 1 in thread

I bet this guy would still fail a first round FAANG developer interview requiring a Hash Table solution to move on in the process.

"Yeah, sorry. You didn't use the right Hash Table"

rswail1y ago

Which is exactly the difference between "programming" and "computer science".

ziofill1y ago· 1 in thread

"it is well known that a vital ingredient of success is not knowing that what you are attempting can’t be done." — Terry Pratchett (Equal Rites)

yas_hmaheshwari1y ago

The fool didn't know it was impossible, so he did it.

(I don't know who said it, but if forced, I will say Albert Einstein or Mark Twain :-) )

percentcer1y ago· 1 in thread

"arrowlike entities"

pteraspidomorph1y ago

I read that and my mind filled in "...from outer space?"

jheriko1y ago· 1 in thread

i feel this article is missing some detail or incorrect in reporting the actual development here. either that or i am missing something myself...

this came up in an interview question i had in 2009 where i got judged poorly for deriding the structure as "not something i've often needed", and i've seen it in much older code.

i'm guessing maybe there are constraints at play here, like having to support unbounded growth, and some generic use case that i've not encountered in the wild...?

yxhuvud1y ago

What you are missing is how the hash table behaves when it is almost full. If there is one empty spot left in the whole table, how do you find it when you insert a new entry?
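To put a number on it: under the textbook uniform-probing model (each insertion visits slots in a uniformly random order without repeats, an assumption for this sketch), the expected number of probes to reach one of f free slots in an n-slot table is (n + 1)/(f + 1). With a single empty slot that is (n + 1)/2, about half the table; with a fraction δ of slots free it is roughly 1/δ, i.e. linear in the article's x. The Python sketch below checks the formula by brute force on small tables.

```python
from fractions import Fraction
from itertools import combinations


def expected_probes(n, free):
    """Closed form: expected probes to reach the first of `free` empty slots
    when `n` slots are visited in a uniformly random order."""
    return Fraction(n + 1, free + 1)


def brute_force(n, free):
    # Probe positions 1..n; every choice of which positions hold empty slots
    # is equally likely, and the cost is the position of the first empty one.
    placements = list(combinations(range(1, n + 1), free))
    return Fraction(sum(min(p) for p in placements), len(placements))
```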

trebligdivad1y ago

Anyone got a simple implementation of 'Tiny pointers'? My mind prefers code/pseudo-code first rather than the proof.

matsemann1y ago

Not sure if it's viewable somewhere. But the conference itself was so fun. https://sites.google.com/view/fun2018/home

I'm not an academic and got my company to sponsor a trip to this Italian island to relax on the beach and watch fun talks, heh.

ThinkBeat1y ago

Do we have some nice implementations yet? I do better reading code than math.

_1tan1y ago

Neat, started on some implementation: https://kraftwerk.social/innovation-in-hash-tables/

cb3211y ago

foota1y ago

I guess the most we could hope for here is that this leads to some other discovery down the road, either in hashtables or maybe one of the similar structures like bloom filters?

nexawave-ai1y ago

I would like to see this being applied practically. Is there a video demonstrating this or is it still too soon? Is the algorithm secret sauce or will it be open sourced?

elcritch1y ago

It's likely a DHT would greatly benefit from this sort of algorithmic reduction in time and be less susceptible to constant factor overheads (if there are any).

Canigou1y ago

I unfortunately did not study well enough to understand the paper.

Can someone explain to me how this isn't some kind of Dewey Decimal Classification (https://en.wikipedia.org/wiki/Dewey_Decimal_Classification) ?

shaganer1y ago

seinecle1y ago

Anyone competent enough here to venture a guess on the speed gain to expect under various scenarios?

reportgunner1y ago

Sad that the article doesn't say what his approach actually is.

bnly1y ago

Step one: Be a genius

Step two: Try to solve hard problems

Step three: Avoid reading too much of other people's work in the area

Step four: (Maybe) Invent a brilliant new solution

But really, really don't skip step one.

lupire1y ago

The older a conjecture is, the more likely it is false.

That's why the conjecture resists proof -- there is a counterexample that people aren't seeing.

pizza1y ago

Just realized that the Mixture of Million Experts paper from last year is similar in some respects to this tiny pointers idea

hoseja1y ago

Is this just theoretically better O(n) or is there an actually faster implementation somewhere?

EternalFury1y ago

What’s the time and space complexity of the new approach?

hemant10411y ago

Interesting read!

ryao1y ago

travisgriggs1y ago

pmags1y ago

Nice result!

<rhetorical> Hmm....I wonder how such research gets funded?... </rhetorical>

jimnotgym1y ago

Now we have faster data structures we can fill that extra time by writing less efficient code, and loading more pointless libraries. This is the march of computer science.

ChrisMarshallNY1y ago

As the villain in Scooby Doo always said:

"And I would have gotten away with it, if it hadn't been for those meddling kids!"

zombiwoof1y ago

Take that AI :)

sam0x171y ago

This is huge, when can we get a Rust implementation?

bruce3434341y ago

Ok so what's the algorithm? Ass article

kittikitti1y ago

I read through this and I'm not sure if people have heard of dictionary trees for hash tables. Of course, quantamagazine.org has been known to sensationalize these types of things.
