The developer of Balatro made an award winning deck builder game by not being aware of existing deck builders.
I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before. This makes me kind of sad, because the current world is so interconnected, that we rarely see such novelty with their tendency to "fall in the rut of thought" of those that came before. The internet is great, but it also homogenizes the world of thought, and that kind of sucks.
I think this is true only if there is a novel solution that is in a drastically different direction than similar efforts that came before. Most of the time when you ignore previous successful efforts, you end up resowing non-fertile ground.
If you read the prior work too early to you get locked into existing mindsets. If you never read it then you miss important things you didn’t thought of.
Even if your approach is less good than the prior work (the normal case) you gain important insights into why the state of the art approach is better by comparing it with what you came up with.
He was aware of deck builders and was directly inspired by Luck be a Landlord, but he was not aware of just how massive the genre is.
Direct quote from the developer:
> The one largest influence on Balatro was Luck Be a Landlord. I watched Northernlion play for a few videos and loved the concept of a non-fanatsy themed score attach roguelike a ton, so I modified the card game I was working on at the time into a roguelike.
> I cut myself off from the genre at that point intentionally, I wanted to make my own mistakes and explore the design space naively just because that process is so fun. I hear the comparison to Slay the Spire a lot but the truth is that I hadn't played that game or seen footage of it when I designed Balatro, not until much later.
https://www.reddit.com/r/Games/comments/1bdtmlg/comment/kup7...
“They’re cheering for you,” she said with a smile.
“But I could never have done it,” [Milo] objected, “without everyone else’s help.”
“That may be true,” said Reason gravely, “but you had the courage to try;
and what you can do is often simply a matter of what you will do.”
“That’s why,” said King Azaz, “there was one very important thing about your quest
that we couldn’t discuss until you returned.”
“I remember,” said Milo eagerly. “Tell me now.”
“It was impossible,” said the king, looking at the Mathemagician.
“Completely impossible,” said the Mathemagician, looking at the king.
“Do you mean … ,” said the bug, who suddenly felt a bit faint.
“Yes, indeed,” they repeated together, “but if we’d told you then, you might not have gone …
and, as you’ve discovered, so many things are possible just as long as you don’t know they’re impossible.”
- The Phantom Tollbooth (1961)I ended up failing and taking his course again (because I had A Lot going on in college), and thus, noticed something.
Each semester, on one of the assignments in the latter half of the class, he assigned one problem out of, perhaps, 30 in the problem set, where as written, it was actually an open problem, and then a day or two before they were due, he'd send out an "oops, my bad" revised version.
I suspect that this was not an accident, given that it always happened only once.
The argument of 'if that's such a good idea, why wouldn't somebody have just done it already?' seems to have grown exponentially with the advent of the internet. And I think it's because the visibility of competence of other's became so much more clear. For those who lived through e.g. Carmack's Golden Age you knew you were never going to be half the coder he was, at least based on the image he successfully crafted. That 'slight' at the end is not to say he wasn't a brilliant developer or even perhaps the best in the world at his peak, but rather that brilliance + image crafting creates this Gargantuan beast of infallibility and exceptionalism that just doesn't really exist in reality. I think it's from this exact phenomena that you also get the practical fetishism of expertise.
The biggest gains in theory and in practice are far more often obtained by masters of craft, giving much more weight to attacking problems from a position of knowledge.
In fact, even in this case, this progress required that the author was aware of very recent results in computer science, was thinking deeply about them, and most likely was scouring the literature for pieces to help. The “Tiny Pointers” paper is mentioned directly.
After asking ChatGPT not to agree with me that your comment and these two different approaches to solving problems are the alike, it concluded there still might be similarities between the two.
[0] https://youtu.be/P_fHJIYENdI?feature=shared&t=1030
[1] https://chatgpt.com/share/67aa8340-e540-8004-8438-3200e0d4e8...
But when we try to talk details, I asked him for example do you use minimax with alphabeta pruning and he told me like “I’m not sure if I am using minimax or what that is :(“ .. I ask him to describe what he does, he essentially describes minimax with pruning. I’ve sorta figured out that he must be doing some very intelligent version of an aspiration search. It’s really eye-opening because he doesn’t have any of this training. He’s never seen any related algorithms, he’s just figuring all this out on his own.
I'll also say I think that diversity in approaches is more important than One Right Way. Some people need to set out on their own, while others spend decades refining one technique. Both have led to extraordinary results!
Many problems are abstract and so we have to build "cartoon" models of what's going on, trying to distill the essence of the problem down to a simple narrative for what the shape of the problem space is and where the limitations are. That often works but backfires when the cartoon is wrong or some assumptions are violated about when the cartoon description works.
Results like this are pretty rare, nowadays, and I suspect this happened because the problem was niche enough or some new idea has had time to ferment that could be applied to this region. This seems like a pretty foundational result, so maybe I'm wrong about that for this case.
A lot of progress is made when there's deeper knowledge about the problem space along with some maturity for when these cartoon descriptions are invalid.
>A number of engineers are sitting together in a room, bouncing ideas off each other. Out of the discussion emerges a new concept that seems promising. Then some laptop-wielding person in the corner, having performed a quick Google search, announces that this “new” idea is, in fact, an old one—or at least vaguely similar—and has already been tried. Either it failed, or it succeeded. If it failed, then no manager who wants to keep his or her job will approve spending money trying to revive it. If it succeeded, then it’s patented and entry to the market is presumed to be unattainable, since the first people who thought of it will have “first-mover advantage” and will have created “barriers to entry.” The number of seemingly promising ideas that have been crushed in this way must number in the millions. What if that person in the corner hadn’t been able to do a Google search?
>In a world where decision-makers are so close to being omniscient, it’s easy to see risk as a quaint artefact of a primitive and dangerous past (…) Today’s belief in ineluctable certainty is the true innovation-killer of our age
Could knowing about prior research skew one's perspective and tarnish novel thought? Sure. But we don't know. Maybe we'd have an even better Balatro if the creator knew about some other deck builders. Maybe we wouldn't, we don't know. We cannot prove the counterfactual.
On the opposite extreme, there are examples of thinkers whose success stemmed from knowing much about one domain or much about many domains and integrating (Luhmann, Goethe, Feynman, Von Neumann etc.). In the general case, I think we are probably much better off promoting knowledge and study, and not ignorance and chance.
That said, I do think we should retain our willingness to play and to try things that are "out of bounds" with respect to the existing accumulated knowledge. We should live informed lives, but play and explore like unschooled children.
At least part of the result was already known, and the fact authors didn't know about it mostly goes to the large corpus of knowledge we already posses.
But the core inspiration came from looking at another recent research paper "Tiny Pointers": that is totally against your premise.
If Krapivin was a software engineer looking to implement this solution as optimization for a particular problem, he would have done so without ever thinking of making a research paper to prove it formally, but mostly relied on benchmarking to prove his implementation works better.
Now, it has always been somewhat true that lots of existing knowledge limits our creativity in familiar domains, but you need both to really advance science.
This is an "Einstein failed Math" fallacy. It's true that novel and notable work tends strongly not to be bound by existing consensus, which when you say it that way is hardly surprising. So yes, if consensus is wrong in some particular way the people most likely to see that are the ones least invested in the consensus.
But most problems aren't like that! Almost always the "best" way to solve a problem, certainly the best way you're going to solve the problem, is "however someone else already solved it". But sometimes it's not, and that's when interesting stuff happens.
There are a lot of great deck builders that are not roguelike. Has he played Dominion, Magic the Gathering, Hearthstone?
Have we been grinding away in the right direction and are only moments away from cracking the problem, or should we drop everything and try something completely new because we've obviously not found the solution in the direction we were heading.
To put it into a CS type context - Should we be using a DFS or BFS search for the solution, because we don't have knowledge of future cost (so UCS/Djikstra's is out) nor do we know where the solution lies in general (so A* is out, even if you ignore the UCS component)
I don't think there's any evidence of this. Yao's conjecture is not exactly standard undergraduate material (although it might be—this is a commentary on detail rather than difficulty. But i certainly didn't encounter this conjecture in school). If not knowing this conjecture was the key, millions and millions of students failed to see what Krapivin did. I imagine you'd have to ask him what the key to his insight is.
Hashing is a pretty unintuitive sort of computation. I'm not surprised that there are still surprises.
one might “... overlook the great difference between exact theory and approximate theory. Again, let me emphasize my great respect for approximate theory. [...] if one starts looking for an effect predicted by this kind of theory to be impossible, the odds are against a favorable outcome. Fortunately, however, the community of scientists, like that of horseplayers, contains some people who prefer to bet against the odds as well as a great many who always bet on the favorite. In science we should, I think, do all we can to encourage the man who is willing to gamble against the odds of this sort.
This does not mean that we should encourage the fool or the ignoramus who wants to play against suicidal odds, the man who wants to spend his time and usually someone else’s money looking for an effect incompatible with, let us say one of the conclusions reached by Willard Gibbs. Gibbs started from thoroughly proven generalizations, the first and second laws of thermodynamics, and reasoned from them by exact mathematical procedures, and his conclusions are the best example I know of exact theory, theory against which it is futile to struggle.”
Both Danny Trejo and Tim Allen spent time in prison before becoming famous. While that's interesting, I'm not sure I'm ready to believe that's the best way to become a professional actor.
Edit to be a little less snarky, apologies:
"Outsiders" are great for approaching problems from fresh angles, but I can almost guarantee that the majority of nuts-and-bolts progress in a field comes from people who "fall in the rut of thought" in the sense that they area aware enough of the field to know which paths might be most fruitful. If I had to place a bet on myself, I wouldn't take a wild uninformed swing: I'd get myself up to speed on things first.
Outsiders sometimes do great work. They also sometimes:
https://www.reddit.com/r/mathmemes/comments/wq9hcl/terrence_...
One example is Geocentrism vs Copernican astronomical models -- Copernican could never have sprung from the status quo because everything revolved around the Earth in Geocentrism instead of around the Sun. You can't square that circle.
https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Re...
For example, as a young kid I saw a geometric ball made up of hinges that allow it to expand and contract, and in some stages it looks a little like a gear. So then I started wondering if you change gears instead of switching gears in a car. Then a decade or so later I started seeing CVT transmissions in cars, which is the same concept where you can change the size/ratio by expanding or contracting the roller instead of switching gears.
In practice the data was well behaved enough and small enough that it was very doable.
I think progress needs both individual achievements who break out of preconceived notions and the communal work of improving within the notions we currently have.
If anyone wants to watch: https://youtu.be/4vou_dXuB8M?si=Wdr7q96MFULPAEc4
Definitely something we should all keep in mind that sometimes you just have to pave your own way and hope it is great on its own merits.
> So long as man does not bother about what he is or whence he came or whither he is going, the whole thing seems as simple as the verb "to be"; and you may say that the moment he does begin thinking about what he is (which is more than thinking that he is) and whence he came and whither he is going, he gets on to a lot of roads that lead nowhere, and that spread like the fingers of a hand or the sticks of a fan; so that if he pursues two or more of them he soon gets beyond his straddle, and if he pursues only one he gets farther and farther from the rest of all knowledge as he proceeds. You may say that and it will be true. But there is one kind of knowledge a man does get when he thinks about what he is, whence he came and whither he is going, which is this: that it is the only important question he can ask himself. (The Footpath Way, Introduction (1))
Even though the author is talking about a different kind of knowledge, the image of sticks of a fan - where going down one gradually excludes the others - stuck with me.
Why DC? An overhead line only limited by peak voltage (arc) and thermals can carry twice the power when running DC instead of AC, assuming both measured relative to ground.
Also, you can run you transistors completely steady-state at all frequency components between their own switching fundamental and your load transients. No more over provisioning just to make up for legacy 50/60 Hz AC.
Also, to a degree, you can just plug raw batteries in with that be DC grid, at most having a little bit of DC regulation to force the voltage a bit higher/lower than the batteries. Like, a power supply basically rated to a couple percent of the battery input/output max power: only need to move the small extra voltage, though ofc at the full current.
Lastly, DC converters are just way smaller and lighter, so you could avoid the heavy bulky transformers in trains and alleviate power limiting from them. Relevant for fast double-decker trains because you'd prefer to have human space where you currently park the transformer.
I have to say though, novel development of technology by pulling recent innovations in the fundamental/material science fields underlying the target, is very not an easy thing to do.
You get novel branches of thought, but in the limit case, you're also reinventing the universe to bake an apple pie.
So there's something of a tradeoff between attempting to ensure people can do more than mimic existing doctrine and efficiency of getting up to speed without having to re-prove existing math and science.
The Balatro dev also, for example, has talked about how he was heavily influenced by several specific other games.
This isn't easy, at all. It requires training yourself into having a open and flexible mind in general.
Not knowing about something is more like a cheat to get there easier.
But it's supper common that innovation involves a lot of well known foundation work and just is very different in one specific aspects, and it's quite hard to know about the other foundation work but not that specific aspect especially if you don't even know which aspect can be fundamentally be "revolutionized"/"innovated".
But what always help if you learn about a new topic is to try blindly first yourself and then look at what the existing approaches do. Not just for doing ground braking work but even for e.g. just learning math.
One of the math teachers I had over the school years before university used this approach for teaching math it yielded way better independent understanding and engagement it helped me a lot later one. Sadly I only had that teacher for 2 years.
Extrapolating a "best way" from a couple of examples of success is bad reasoning. There are definitely ways in which it can be necessary to ignore the standing wisdom to make progress. There are also definitely ways in which being ignorant of the knowledge gained by past attempts can greatly impede progress.
I would point out, that it is also possible to question and challenge the assumptions that prior approaches have made, without being ignorant of what those approaches tried.
Figuring which is which, is indeed hard. Generally, it seems like it works well to have a majority of people expanding/refining prior work and a minority people going in and starting from scratch to figure out which of the current assumptions or beliefs can be productively challenge/dropped. The precises balance point is vague, but it seems pretty clear that going to far either direction harms the rate of progress.
I don't think that's warranted.
You will find that the vast majority of lottery winners have bought lottery tickets. However that doesn't mean that buying lottery tickets is a good idea financially.
Consider a simplified example. There is some area of scientific research. Working within the framework gives you a 1 in 4 chance of making some minor improvement. Working outside the framework gives you a 1 in a million chance to create a great leap in knowledge.
For any single individual, the best choice is the former. The latter is a gamble that most people will lose, wasting their lives chasing crazy theories.
For society, you want a split. You need some doing the second option to have the eventual amazing discovery, but you also need to progress the current understanding further.
If we introduce a chance for the minor progress to lead to the same major advancement, it becomes a bit more simple for society to calculate the best allocation of researchers, but for any single person, the best option still remains to dedicate themselves to the small advancement.
The real trick is simply to try to understand things directly and not rely on proof by authority all the time.
This is valuable.
You might believe someone's proof of a conjecture and then be discouraged from delving any more into that rabbit hole.
More often than not you will be reinventing something. But that's not necessary less productive than reading other people's work. In the former case, you're at least making something, if not new.
So there are some arguments for being fresh in a well-trodden field with an ocean of research that you cannot boil all at once.
On the other hand, there is the publish-or-perish pressure in academia, which requires original research. You could just keep busy and throw enough shit agains the wall such that enough of it sticks.
Being capable of tackling problems from first principles is invaluable because we frequently encounter problems that are novel in some dimension, even if that dimension is the combination of dimensions. This lets you leap frog large problems by decomposition, possibly going against the grain and innovating by, hopefully, simplifying. However there is risk in falling into traps that countless others have already learned the hard way.
This may come as a surprise to some but, believe it or not, you can have both. In fact, you should.
Of course, if you happen to be on a slope that leads to the global maxima, starting from scratch is far less effective. We don't really know where we are usually, so there's a trade-off.
There was a good article posted to HN years ago that covered this and used rocketry as one of the examples, but I don't recall what it was. The author was well known, IIRC.
I think sometimes this is true. On the time I've had new starters on my engineering team I've always tried to teach them about the problem before they get exposed to any of our solutions. Sometimes they will have brand new insights that we've been completely blind to. It doesn't always happen, but there is only one opportunity for this, once they've seen the solutions they can't be unseen.
George Dantzig also solved two open problems because he thought they were homework.
That depends...
- Krapivin was an undergrad, tinkering with stuff for fun. If he'd put a couple months into this, and just ended up re-inventing a few things? That'd be decently educational.
- Vs. if your team needs to ship product, on something resembling the schedule? Yeah. You definitely stick to tried-and-true algorithms.
It is well known that limitations improve creativity.
That said I still think the best path is to learn a classical path, if you want you can question some axioms, but it's mostly irrational in that there's almost no reward for you personally, except clout, most of the reward goes to the whole science.
It's a trade-off, at first it takes longer to iterate on features, but sometimes a more minimal and/or composable tool finds its way to production. Real Systems are made of duct tape anyways.
Essentially you learn a thing, you accept it for now and you think "well but maybe!"
Like I personally think there should be multiple mathematical zeroes but I accept it as wackiness unless I can clearly demonstrate coherency and utility as to why.
Everyone likes to focus on why you cannot do and why trying will be futile.
You don't have to disregard prior efforts. You just have to focus on one simple question:
"how can I do ______ ?"
Survivorship bias, you aren't aware of all the failures where people who were unaware of prior art made all the mistakes predictable to people who were.
Maybe it's just because there are more people working on these problems who don't know previous approaches than the opposite.
Funny this breakthrough happens at same time Antirez made this post https://news.ycombinator.com/item?id=42983275
They are different tastes. They deliver different results.
You've remembered two examples of this (arguably) happening, so you attempt to draw a conclusion based on the ease with which you came up with those examples. But in reality, this method of inference is prone to error, as it doesn't consider the denominator, or how many attempts were made to achieve the results you're able to remember.
I would suggest positive takeaways is to: trust but verify. If you’ve got a novel solution idea and don’t understand why others aren’t doing it that way, do both and compare. You’re guaranteed to learn something one way or another. Also: if you reinvent the wheel or do something suboptimal then that’s okay too. Sometimes the solutions don’t make sense until you see what doesn’t work. Likewise: be open to learning from others and exploring solutions outside of your predefined notion of how things should work.
Major breakthroughs are those that make paradigm shifts. So, by definition, that means that something needs to be done that others are not doing. If not, things would have been solved and the status quo method would work.
Most major breakthroughs are not the result of continued progress in one direction, but rather they are made by dark horses. Often by nobodies. You literally have to say "fuck you all, I'm doing this anyways." Really this is not so much different than the founder mentality we encourage vocally yet discourage monetarily[0]. (I'm going to speak from the side of ML, because that's my research domain, but understand that this is not as bad in other fields, though I believe the phenomena still exists, just not to the same degree). Yet, it is really hard to publish anything novel. While reviewers care a lot about novelty, they actually care about something more: metrics. Not metrics in the way that you provided strong evidence for a hypothesis, but metrics in the way that you improved the state of the field.
We have 2 big reasons this environment will slow innovation and make breakthroughs rare.
1. It is very hard to do better than the current contenders on your first go. You're competing against not one player, but the accumulated work of thousands and over years or decades. You can find a flaw in that paradigm, address the specific flaw, but it is a lot of work to follow this through and mature it. Technological advancement is through the sum of s-curves, and the new thing always starts out worse. For example, think of solar panels. PVs were staggeringly expensive in the beginning and for little benefit. But now you can beat the grid pricing. New non-PV based solar is starting to make their way in and started out way worse than PV but addressed PV's theoretical limitations on power efficiency.
2. One needs to publish often. Truly novel work takes a lot of time. There's lots of pitfalls and nuances that need to be addressed. It involves A LOT of failure and from the outside (and even the inside) it is near impossible to quantify progress. It looks no different than wasting time, other than seeing that the person is doing "something." So what do people do? They pursue the things that are very likely to lead to results. By nature, these are low hanging fruit. (Well... there's also fraud... but that's a different discussion) Even if you are highly confident a research direction will be fruitful, it will often take too much time or be too costly to actually pursue (and not innovative/meaningful enough to "prototype"). So we all go in mostly the same direction.
(3. Tie in grants and funding. Your proposals need to be "promising" so you can't suggest something kinda out there. You're competing against a lot of others who are much more likely to make progress, even if the impact would be far lower)
So ironically, our fear of risk taking is making us worse at advancing. We try so hard to pick what are the right directions to go in, yet the truth is that no one has any idea and history backs this up. I'm not saying to just make it all chaotic. I think of it more like this: when exploring, you have a main party that travels in a set direction. Their strength together makes good progress, but the downside is there's less exploration. I am not saying that anyone should be able to command the ship on a whim, but rather that we need to let people be able to leave the ship if they want and to pursue their hunches or ideas. Someone thinks they saw an island off in the distance? Let them go. Even if you disagree, I do not think their efforts are fruitless and even if wrong they help map out the territory faster. But if we put all our eggs in one basket, we'll miss a lot of great opportunities. Right now, we let people off the main ship when there's an island that looks promising, and there are those that steal a lifeboat in the middle of the night. But we're all explorers and it seems like a bad idea to dissuade people who have that drive and passion in them. I know a lot of people in academia (including myself) who feel shackled by the systems, when really all they want to do is research. Not every one of these people are going to change things, in fact, likely most won't. But truth is, that's probably true if they stay on the ship too. Not to mention that it is incredibly common for these people to just leave academia all together anyways.
Research really is just a structured version of "fuck around and find out". So I think we should stop asking "why" we should pursue certain directions. "Because" is just as good of an excuse as any. In my ideal world, we'd publish anything if there is technical correctness and lack of plagiarism. Because the we usually don't know what is impactful. There are known knowns, known unknowns, and unknown unknowns. We really are trying to pretend that the unknown unknowns either don't exist, are not important, or very small. But we can't know, they're unknown unknowns, so why pretend?
[0] An example might be all the LLM based companies trying to make AGI. You want to compete? You're not going to win by making a new LLM. But one can significantly increase their odds by taking a riskier move, and fund things that are not well established. Other types of architectures. And hey, we know the LLM isn't the only way because we humans aren't LLMs. And we humans also use a lot less energy and require far less data, so even if you are fully convinced that LLMs will get us all the way, we know there are other ways to solve this problem.
Something I got from Richard Feynman descriptions of his method of study, was to first and foremost, read the prompt of the problems, and work diligently trying to solve the problems by himself, for a reasonable amount of time.
Then, and only then, go and read the other solutions. The solutions can be the same, they can be different, and by doing all this preliminary work the researcher can truly understand the nuances of these solutions, something they can't grasp if the solutions were shown just after reading the problem.
So, the best way to approach a problem is:
- Try to solve it by yourself. Several times if necessary, give it an honest effort.
- Then, solved or not, go and read other people's solutions.
> The team’s results may not lead to any immediate applications
I don't understand why it wouldn't lead to immediate applications. Is this a situation where analysis of real-world use cases allows you to tune your hash implementation better than what a purely mathematical approach would get you?
I imagine there are some niche memory-constrained applications where you'd let x get larger but I never ran into them personally.
If you have a constant factor, that doesn’t go into the scaling rule, so having something scale (log x)2 could still be 100 times more expensive than something that scales linearly with x for all x smaller than 2^100.
This is just a quick overview from a single viewing of the video, but it's called "funnel hashing". The idea is to split into exponentially smaller sub arrays, so the first chunk is n/m, the second is n/(m^2), etc. until you get down to a single element. Call them A0, A1, etc., so |A0| = n/m, |A1| = n/(m^2) etc., k levels in total.
Try inserting into A0 c times. If it fails, try inserting into A1 c times. If it fails, go down the "funnel" until you find a free slot.
Call \delta the fraction of slots that are empty (I'm unclear if this is a parameter that gets set at hash table creation or one that's dynamically updated). Setting c = log(1/d) and k = log(1/d) to get worst case complexity O(log^2(1/d)).
This circumvents Yao's result by not being greedy. Yao's result holds true for greedy insertion and search policies and the above is non-greedy, as it's cascading down the funnels.
There are probably many little hairy details to work out but that's the idea, as far as I've been able to understand it. People should let me know if I'm way off base.
This very much reminds me of the "Distinct Elements in Streams" idea by Chakraborty, Vinodchandran and Meel[2].
[0] https://news.ycombinator.com/item?id=43007860
[Shameless plug]:
If you are into hashtables, you might want to check out Dandelion Hashtable [0]. We use it in our next-generation databases, and it was published in HPDC'24. It is currently the fastest in-memory hashtable in practice.
It improves closed addressing via bounded-cacheline chaining to surpass 1B in-memory requests/second on a commodity server.
What I realized is that the theory of hash tables involves a fixed-sized collection of objects. For this fixed collection, you create a hash-function and used that like a vector-index and store the collection in a (pre-allocated) vector. This gives a (fuzzy-lens'd) recipe for O(1) time insert, deletion and look-up. (The various tree structures, in contrast, don't assume a particular size).
The two problems are you have to decide size beforehand and if your vector gets close to full, you insert etc processes might bog-down. So scanning the article, it seems this is a solution to the bogging down part - it allows quick insertion to a nearly-full table. It seems interesting and clever but actually not a great practical advance. In practice, rather than worrying a clever way to fill the table, I'd assume you just increase your assumed size.
Edit: I'm posting partly to test my understanding, so feel to correct me if I'm not getting something.
Trees (sorted) are good at finding subsets and ranges "scanning" or "searching" but hashmaps are better at "seeking" like a key-value lookup
I don’t have time to fully grok the paper, but they claim this makes insertion consistently fast (I believe this until we’re at 75% of total capacity, but maybe they have some other mode for filling when they’re at 75% in every row?). They also claim retrieval is fast, and I didn’t read enough to understand how even retrieval works, or why it is faster.
I’ll put out that there a lot of times that it would be really nice to have a nearly full hash table still, you know, work. You can’t always change the size of one during execution of a program. And, in some environments memory counts a lot. That said, I would like to see and play with an implementation — I’m not sure this is ‘worth it’ in the general case.
It is also probably cache inefficient, as are most things about hash tables, with the exception of linear probing for reading out of a fairly full one, in which case, you get to just keep pulling stuff directly out of memory to check it. So, it’s not clear to me that this is performance wise worth it. Anyway, I’d like to fully understand it, it seems like an interesting new idea.
It's as through the conclusion seems to defy common sense, yet is provable. [1]
[0] - https://priceonomics.com/the-time-everyone-corrected-the-wor...
[1] - 2nd to the last paragraph: "The fact that you can achieve a constant average query time, regardless of the hash table’s fullness, was wholly unexpected — even to the authors themselves."
This is wrong. Let’s label the goats A and B to simplify things (so we do not need to consider the positions of the doors). There are 3 cases:
1. You pick the right door. The other two doors have goats. The host may only choose a goat. Whether it is A or B does not matter.
2. You pick the door with goat A. The host may only choose goat B.
3. You pick the door with goat B. The host may only choose goat A.
The host’s intentions are irrelevant as far as the probability is concerned (unless the host is allowed to tell the contestant which door is correct, but I am not aware of that ever being the case). 2/3 of the time, you pick the wrong door. In each of those cases, the remaining door is correct.
The most strict argument is yet another statistics professor got basic statistics wrong.
I need to write a blog post or something convincing everyone we need to stop talking about the Monty Hall problem and replace it with a new problem with all the ambiguities removed. (Unless ambiguity is the point, then Monty Hall is fine.)
It's kind of one of those resource management hacks you do when you're constrained and screwed by limitations. Splitting things up by priority is a common go-to for resource allocation. This is a spin on that.
I wonder how many other "in the trenches hacks" people have done that overturn widely accepted things the inventors didn't realize were a big deal: "well I usually have a bunch of deliveries to make and I've figured out a clever way to map out the quickest path... "
Don't get me wrong - recognizing it and then formalizing it, doing the work, publishing the paper - that's a lot of effort. I'm not taking that away.
I do find it a bit weird that this is somehow better than just over-allocating (and thus reducing the chances of key collisions, which also makes worst case 'less worse') given his approach also allocates more memory through the aux arrays.
Curiously, Andrew Krapivin, the genious undergrad in the article, is not one of the authors.
Edit: Elastic Hashing found https://github.com/MWARDUNI/ElasticHashing
Want to find out if it's only academic or also realistic, and esp. within which bounds.
Edit: gpt4, Gemini 2 and Claude had no luck. Human driven computer science is still safe.
This can come from anywhere in the world. The best part is, it did NOT discovered from an AI program.
This means insertions when the hash table is less full are slower, but you avoid the worst-case scenario where you're probing for the last (few) remaining open slot(s) without any idea as to where they are.
[1]: https://arxiv.org/pdf/2501.02305
---
An interesting theoretical result but I would expect the current 'trick' of simply allocating a larger table than necessary to be the superior solution in practice. For example, Rust's hashbrown intentionally leaves 1/8th (12.5%) of the table empty, which does cost a bit more memory but makes insertions/lookups very fast with high probability.
The "skipping slots" has to do with jumping ahead in the hash sequence.
Convert table row to a string, json to whatever
Apply base16 to the that variable
You've now got a base16 string of that data.
Create a hash table, setup a key value for that base16 string.
You now have a container holding the data.
All you need to do is decode the hex string and you've got base32 data.
Why not?
"Yeah, sorry. You didn't use the right Hash Table"
( I don't know who said it, but if forced, I will say Albert Einstein or Mark Twain :-) )
hash tables are constant time on average for all insertion, lookup and deletion operations, and in some special cases, which i've seen used in practice very, very often, they have very small constant run-time just like a fixed-size array (exactly equivalent in-fact).
this came up in an interview question i had in 2009 where i got judged poorly for deriding the structure as "not something i've often needed", and i've seen it in much older code.
i'm guessing maybe there are constraints at play here, like having to support unbounded growth, and some generic use case that i've not encountered in the wild...?
Not sure if it's viewable somewhere. But the conference itself was so fun. https://sites.google.com/view/fun2018/home
I'm not an academic and got my company to sponsor a trip to this Italian island to relax on the beach and watch fun talks, heh.
It's likely a DHT would greatly benefit from this sort of algorithmic reduction in time and be less susceptible to constant factor overheads (if there are any).
Can someone explain to me how this isn't some kind of Dewey Decimal Classification (https://en.wikipedia.org/wiki/Dewey_Decimal_Classification) ?
Step two: Try to solve hard problems
Step three: Avoid reading too much of other people's work in the area
Step four: (Maybe) Invent a brilliant new solution
But really, really don't skip step one.
That's why the conjecture resists proof -- there is an counterexample that people aren't seeing.
<rhetorical> Hmm....I wonder how such research gets funded?... </rhetorical>
"And I would have gotten away with it, if it hadn't been for those meddling kids!"