Thanks!
The left pulmonary veins are gross anatomy, akin to knowing what an 'if' statement is. Knowing the current state of the art in determining a minimum complexity for an operation on an obscure tree implementation is deep domain knowledge used by only a few specialists.
Yes, some are indeed very relevant. In one of my previous businesses, we built a search engine where we used DAWGs (Directed Acyclic Word Graphs) [I didn't do this part]. And they worked really, really well; in fact, they work well to this day.
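For anyone curious, here's a tiny sketch of the core idea behind a DAWG (not the engine mentioned above, just an illustration): build a trie, then merge structurally identical subtrees so shared suffixes are stored only once. The dict-based representation and the word list are made up for the example.

```python
def build_trie(words):
    """Plain trie as nested dicts; '$' marks end of word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root

def merge(node, registry):
    """Bottom-up: replace identical subtrees with one shared node.
    Two nodes are identical if they have the same labeled edges to
    the same (already-merged) children."""
    for ch, child in node.items():
        node[ch] = merge(child, registry)
    key = tuple(sorted((ch, id(c)) for ch, c in node.items()))
    return registry.setdefault(key, node)

def contains(dawg, word):
    node = dawg
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node

# "taps" and "tops" now share the trailing "s" -> "$" path.
words = ["tap", "taps", "top", "tops"]
dawg = merge(build_trie(words), {})
```

The payoff over a plain trie is that common suffixes collapse, which is what makes DAWGs so compact for large dictionaries.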
[I'm just responding to your first clause. :-)]
There are more slides and info on that site :)
Makes me wonder which data structures have "parallel" versions besides the two mentioned.
Does anyone here know what it is like doing research in these areas? Any general advice?
Unfortunately, my day-to-day job has almost nothing to do with these algorithms; it's mainly about software architecture and maintainability.
I think the majority of these algorithms are not something you will find yourself implementing in day-to-day work; at most, I can see myself using a library that provides one of them. Even if you do research, it will have to be very focused on data structures and algorithms to really get deep into some of these.
Still, very enjoyable.
Rolling your own data structures is pretty vital if you're working on algorithms.
That's true, but if you don't know what's possible, you're unlikely to even look for it, and will only stumble over it by chance.
So you should know about them, even if you never implement them. You wouldn't need this level of detail for that, but it's just plain fun to learn stuff like this. :-)
Probabilistic data structures like Bloom filters and sketches are really useful for gathering statistics on data sets that are too large to manage, or where an extra database trip would hurt performance. So internal application diagnostics/debugging/logging is a great low-risk place to use these sorts of algorithms, even when it makes sense to keep all business logic in the database.
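As an illustration of how little code this takes, here's a minimal Bloom filter sketch: k hash functions over a bit array, giving cheap "have we probably seen this?" checks with possible false positives but no false negatives. The sizes and the `user:` keys are made up for the example, not tuned values.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0  # a big int doubles as the bit array

    def _positions(self, item):
        # Derive k positions by salting one strong hash with an index.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # True means "probably seen"; False is definitive.
        return all(self.bits & (1 << p) for p in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
bf.add("user:7")
```

In a logging path you'd use this to, say, only emit a detailed trace the first time a key shows up, without keeping every key in memory.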
I will also point out that when you view the problem of data structures as an implementation that solves various queries, you can start building tools that automate the implementation of complex data structures given the queries you want to ask the data structure. I fully expect to see these tools start to become available over the next decade or so, given the current progress of program synthesis I see in the academic literature.
[1] Or insert, delete, or update. But usually you already have the record at that point, and you just want to update the data structure's representation of that record.
[2] You can apply the same techniques to cache access and even vector register usage for purely in-memory data structures. Same problem, just with different sizes of disparity.
Their work has been on HN before: https://news.ycombinator.com/item?id=18314555
Interesting. I did know about program verification efforts (trying to prove that programs are correct), but had never heard of program synthesis (assuming it is somewhat different from code generation). Do you know of any examples of the kinds of areas it can be applied to?
libspatialindex lacks documentation but worked really nicely. The rtree interface in Python is much friendlier.
If you only build the tree once and do no insertions what is the benefit of an R-Tree vs KDTree?
I've had to use a concurrent priority queue once (locking a regular priority queue was too slow, dropped in a concurrent implementation), implemented kd trees a few times, and I've used half-edge once (a cousin of quadedges). Mainly doing graphics work.
Often the naive data structures work better because there's less indirection and it's easier to vectorize. In my case, depending on how much GPU budget you have available, you can also just scrap the CPU implementation and throw it in a compute shader.
But then life changes and you may find yourself in need of this knowledge. I used to laugh at graph-algorithm questions in interviews because I had never directly used a graph in 20 years of SWE. Then I got a job on a team maintaining a graph-based API. Joke's on me; now those 'silly' algorithms are very relevant.
My guess is, even after working with graph algorithms for a while, you still needed more than 20-30 minutes to make implementation changes, and still needed a reference to look at while doing so.
I know for a fact that graphs are quite useful. I also know that complex algorithmic questions in interviews are, in most cases, not useful. The fact that you got the position and were able to adapt and manage graph structures shows that those questions didn't gauge your fundamental abilities. Maybe you didn't hit the ground running on day one, but after a bit of learning and practice, you were fine under reasonable delivery conditions.
Otherwise I find that the built-in Clojure data structures are so good for all-round use that it's difficult to justify the time to use anything else, except in extreme cases.
I don't know Clojure, so I can't compare its data structures with those of Python, but I've read a few times that Python's built-in data structures are so useful that for many applications it suffices to use them as is, or in combination (e.g. lists of dicts, dicts of lists, ...), instead of defining your own custom ones with classes. Even more so when combined with other Python features like {list,dict,set} comprehensions and libraries like itertools.
And I've found that to be somewhat true in my own Python work.
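To make that concrete, here's the "just combine the built-ins" style in miniature: grouping and aggregating a list of dicts with `defaultdict` and a dict comprehension, no custom classes. The records are invented for the example.

```python
from collections import defaultdict

records = [
    {"city": "Oslo", "temp": 3},
    {"city": "Lima", "temp": 19},
    {"city": "Oslo", "temp": 5},
]

# dict of lists: group temperatures by city
by_city = defaultdict(list)
for r in records:
    by_city[r["city"]].append(r["temp"])

# dict comprehension over the grouping: mean temp per city
avg = {city: sum(ts) / len(ts) for city, ts in by_city.items()}
```

For a lot of everyday data wrangling, this really is the whole program.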
I've seen them applied in some real-world next-gen genomics tools; for example, this one https://github.com/vgteam/odgi implements a multi-string BWT. Technically they're in academia and not making money afaik, but it's interesting stuff.
Recently used nested R-Trees for in-memory geospatial querying in a game (millions of atomic updates a second, which would be a pain to shard as items move around the world).
For example as a tree gets too big I split it at the hot spot and run the job processing for each tree on separate threads.
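A hedged sketch of that split-and-parallelize idea, with plain lists standing in for the R-trees: when a region gets hot, partition its items at the median of the crowded axis and hand each half to its own worker. The threshold and the per-region "job" are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median

HOT_THRESHOLD = 4  # illustrative cutoff for "this tree is too big"

def process_region(items):
    # Stand-in for the real per-tree job processing.
    return sum(x + y for x, y in items)

def run(items):
    if len(items) <= HOT_THRESHOLD:
        return process_region(items)
    # Split at the hot spot (here: median x) and fan out to threads.
    split = median(x for x, _ in items)
    left = [p for p in items if p[0] <= split]
    right = [p for p in items if p[0] > split]
    with ThreadPoolExecutor(max_workers=2) as pool:
        return sum(pool.map(process_region, (left, right)))

total = run([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)])
```

The real version would keep the two subtrees live and route updates by region; the point here is just the shape of the split.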
A few years ago, I implemented a simple tree for a survey product I built. The survey's pathing is stored in the tree, and I use it to find loops (which are invalid) in the survey.
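For that kind of validity check, assuming the pathing is stored as an adjacency dict (question -> possible next questions; the structure and question names here are made up), the standard gray/black DFS finds loops directly: any back edge to a node still on the current path is a cycle.

```python
def has_cycle(graph):
    GRAY, BLACK = 1, 2  # GRAY = on current DFS path, BLACK = done
    state = {}

    def visit(node):
        state[node] = GRAY
        for nxt in graph.get(node, ()):
            s = state.get(nxt)
            if s == GRAY:
                return True  # back edge: a loop in the survey
            if s is None and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n) is None and visit(n) for n in graph)

valid = {"q1": ["q2", "q3"], "q2": ["q3"], "q3": []}
looping = {"q1": ["q2"], "q2": ["q3"], "q3": ["q1"]}
```

Running this at save time keeps a bad edit from ever trapping a respondent in a loop.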
Most off-the-shelf data structures are poorly suited for that environment so I need to roll my own or modify existing structures. I'm no data structure expert, so this kind of stuff is really helpful.
(obviously some of this is "I learned a tangential thing while learning about X", but that's kinda the point. you may never use a rope, but you might be introduced to a new idea in the process of learning about it)
These problems arise in compilers that convert programs into static single assignment (SSA) form.
Thanks :)
https://www.researchgate.net/publication/51952511_De-amortiz...
Hilarious
Where can I find the description and discussion of 'ravel trees'?
Used nested R-Trees in a personal project recently (a sharded in-memory spatial DB for a game), which is not something I thought I'd ever have to do.