- An exabyte-scale storage engine. Nothing too exotic technically, and a few companies have built one, but the design needs to address continuous data corruption, continuous hardware failure, geo-federation, and so on.
- A real-time database kernel that supports very high throughput for mixed workloads. A production kernel of this type doesn't currently exist, though several people working on closed-source databases understand the necessary computer science in principle; the academic literature is far behind the state of the art. Gracefully shifting load and transients between servers while under full load, at many millions of writes per second, is not trivial.
- Native discrete topology operators. These are necessary for geospatial analytics, sensor coverages, etc. If you can evaluate them natively in the database kernel, the second requirement becomes easier to meet, since you generally don't need secondary indexing.
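To make the first bullet concrete, here is a toy sketch of one standard ingredient for surviving continuous data corruption: per-block write-time checksums plus a background scrub that repairs silently-corrupted blocks from a healthy replica. This is not any particular engine's design; the block size, the use of `zlib.crc32`, and the two-replica repair path are all my assumptions for illustration.

```python
import zlib

BLOCK = 4096  # hypothetical fixed block size


def checksum(data: bytes) -> int:
    return zlib.crc32(data)


class Replica:
    def __init__(self, blocks):
        # store each block alongside the checksum computed at write time
        self.blocks = [(b, checksum(b)) for b in blocks]


def scrub(primary: Replica, mirror: Replica):
    """Walk every block, verify it against its write-time checksum,
    and repair silently-corrupted blocks from the healthy replica."""
    repaired = []
    for i, (data, crc) in enumerate(primary.blocks):
        if checksum(data) != crc:                 # silent corruption detected
            good, good_crc = mirror.blocks[i]
            assert checksum(good) == good_crc     # mirror copy must verify too
            primary.blocks[i] = (good, good_crc)  # repair in place
            repaired.append(i)
    return repaired
```

At exabyte scale the point is that this loop never stops: some fraction of the fleet is always mid-scrub, because corruption is a continuous process, not an event.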
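For the second bullet, one well-known ingredient for shifting load between servers without a global reshuffle is a consistent-hash ring: adding a node remaps only a small fraction of keys. This is a hedged sketch, not the kernel design the note alludes to (the hard part there is doing the handoff under full write load); the hash function and virtual-node count are arbitrary choices.

```python
import bisect
import hashlib


def h(key: str) -> int:
    # any well-mixed hash works; md5 chosen only for illustration
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) points on the ring
        for n in nodes:
            self.add(n)

    def add(self, node):
        # each node owns many small arcs ("virtual nodes") for smoother balance
        for v in range(self.vnodes):
            bisect.insort(self.ring, (h(f"{node}#{v}"), node))

    def owner(self, key):
        # the key belongs to the first ring point at or after its hash
        i = bisect.bisect(self.ring, (h(key),)) % len(self.ring)
        return self.ring[i][1]
```

Adding a fourth node to a three-node ring moves roughly a quarter of the keys and leaves the rest untouched, which is what makes gradual load shifting possible at all.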
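And for the third bullet, a minimal sketch of what a native topology operator can look like at its simplest: an axis-aligned box `intersects` predicate pushed down into the scan itself, so a coverage query needs no secondary (e.g. R-tree) index. The record layout and predicate names are illustrative assumptions, not any real kernel's API; real operators would cover the full set of topological relations, not just these two.

```python
from typing import NamedTuple


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    # boxes overlap iff their projections overlap on both axes
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def contains(a: Box, b: Box) -> bool:
    # a contains b iff b fits inside a on both axes
    return (a.xmin <= b.xmin and a.ymin <= b.ymin
            and b.xmax <= a.xmax and b.ymax <= a.ymax)


def scan(records, query: Box):
    """Predicate push-down: the topology test runs per record during the
    sequential scan instead of consulting a secondary spatial index."""
    return [r for r in records if intersects(r, query)]
```

The interesting engineering is making such predicates cheap enough that a brute scan at kernel speed beats maintaining and consulting a separate spatial index under heavy writes.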
Any solution even halfway to the general one would be viable. The value of such a system is hard to overestimate. Companies have paid half a million dollars for the output of a single analytic query over tens of trillions of IoT records; the differentiator was that executing such a query was possible at all.
It is extremely high-end, polymathic computer science, but seriously valuable if you can make a credible dent in it. And unlike some advanced topics in computer science, there are no epic unsolved theoretical problems you have to crack, though some of the relevant computer science may be unpublished.