I don't think people at Microsoft are looking at Excel as a whole. They're like lost souls squatting in a mansion, building sand castles in the rooms they live in, with no relationship to the actual building and what it needs to keep from falling down.
I'm not sure what you mean by performance cliffs. Can you give an example of where and how you would better accomplish something without Power Query? Are you talking about processing data in the range of a few hundred megabytes?
It’s not going to replace the functionality of core spreadsheet-based Excel for accountants, for instance, who typically won’t have a use for Power Query because their data is structured differently.
In a spreadsheet the data is much less structured, which is where a lot of the power comes from - for instance, Power Query doesn’t easily support things like subtotals, scratch calculations, or quick financial models. It is closer to a paper ledger with calculations scribbled in the margins than to a big-data database.
Power Query is more about ingesting lots of data and cleaning it, while finance is often about working stuff out and playing with numbers to see what happens - and playing with numbers is easier in a less structured, loosely typed environment.
Accumulation patterns perform abysmally even with data in the 100Ks of elements. Say you have a table of inventory movements and want instead a snapshot table of inventory at a point in time. You can do an O(n^2) self-join of the table with itself, matching each record to all records with a lesser or equal date and summing their movements to derive the total quantity at that time.
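A minimal sketch of that self-join in M, assuming a hypothetical Movements query with [Date] and [Qty] columns (the names are illustrative, not from any real workbook):

```m
// Hypothetical Movements table with columns [Date, Qty].
// For each row, sum every movement dated on or before it: O(n^2).
let
    Source = Movements,
    Snapshot = Table.AddColumn(
        Source,
        "QtyOnHand",
        (row) => List.Sum(
            // Inner `each [Date]` is the scanned row; `row[Date]` is the outer row.
            Table.SelectRows(Source, each [Date] <= row[Date])[Qty]
        ),
        type number
    )
in
    Snapshot
```

Since each of the n rows rescans the whole table, the work is quadratic, which is why it degrades so sharply past the low 100Ks.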
If you want to use an accumulation pattern, you can sort and cast your table to a list of records and then use List.Accumulate to iterate over each list element, deriving a new field with the running total of inventory amount. If you do this, you will find that it falls right over even with 1Ks or 10Ks of records. This is because the intermediate list that you're appending to through the accumulation is itself a lazy stream, so each append re-walks the stream from the start and the fold goes quadratic. Thus, you have to use List.Buffer at each step. Even with List.Buffer at each step, this solution falls over at high 10Ks or low 100Ks of records.
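A sketch of that accumulation in M, again assuming the same hypothetical Movements query with [Date] and [Qty] columns:

```m
let
    Sorted = Table.Sort(Movements, {{"Date", Order.Ascending}}),
    Recs   = Table.ToRecords(Sorted),
    // Fold over the records, appending a running total to each one.
    WithRunning = List.Accumulate(
        Recs,
        {},
        (state, current) =>
            let
                prior = if List.IsEmpty(state) then 0
                        else List.Last(state)[RunningQty],
                next  = Record.AddField(current, "RunningQty", prior + current[Qty])
            in
                // Without List.Buffer here, `state` stays a lazy stream that is
                // re-walked on every append, and performance collapses.
                List.Buffer(state & {next})
    )
in
    Table.FromRecords(WithRunning)
```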
Incredibly unintuitively, you can use List.Generate with an already-buffered input list to derive a new list that can then be cast back to a table, though this still struggles with 100Ks of records.
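The List.Generate variant, under the same assumed Movements shape, looks something like this:

```m
let
    Sorted = Table.Sort(Movements, {{"Date", Order.Ascending}}),
    // Buffer once up front so the generator indexes an in-memory list.
    Recs   = List.Buffer(Table.ToRecords(Sorted)),
    n      = List.Count(Recs),
    Running = List.Generate(
        // Initial state; assumes at least one record exists.
        () => [i = 0, total = Recs{0}[Qty]],
        each [i] < n,
        // Next state: advance the index and extend the running total,
        // guarding the lookup so the final state doesn't index past the end.
        each [i = [i] + 1,
              total = if [i] + 1 < n then [total] + Recs{[i] + 1}[Qty] else null],
        // Selector: emit the record with its running total attached.
        each Record.AddField(Recs{[i]}, "RunningQty", [total])
    )
in
    Table.FromRecords(Running)
```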
If your snapshots can be aggregates, then you can happily throw out the idea of such an accumulation pattern and just join to a date table at the appropriate grain with all movement records less than or equal to the date in that date table.
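Sketched in M with a hypothetical Dates query holding one row per snapshot date:

```m
// One aggregate per snapshot date rather than one per movement row.
let
    Snapshots = Table.AddColumn(
        Dates,
        "QtyOnHand",
        (d) => List.Sum(
            Table.SelectRows(Movements, each [Date] <= d[Date])[Qty]
        ),
        type number
    )
in
    Snapshots
```

The work is now proportional to snapshot dates times movements rather than movements squared, which at a coarse grain is typically far smaller.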
I'll note that I regularly speak with several of the people whose blogs you will inevitably come across when performance tuning Power Query. The approaches above are the current state of the art in PQ for iteration and accumulation patterns. This is not an appeal to authority or a brag. This is to highlight the difference from the Excel spreadsheet formula approach below, which even beginners can derive from first principles.
In an Excel spreadsheet, for the same challenge, you just define a new column with a special first-row formula (say, =B2 to seed the total), with each subsequent cell referencing the row above (=C2+B3, filled down). This will happily run right up to the spreadsheet row limit with no performance concerns. If you really want, you can spill over to multiple worksheets, which is clunky to manage, but it still performs just fine and degrades slowly. The M approaches above hit a cliff and start hanging.
Excel formulas make it trivial to reference arbitrary cells. M is a nearly general-purpose language. PQ uses M, but as a framework for writing M, it has a strong emphasis on a query/table paradigm. A table-based operation model cuts against the grain of a spreadsheet, because a spreadsheet is a collection of arbitrary cells, while a tabular approach is a collection of similarly shaped records stacked one upon the other. These two paradigms have a fair amount of overlap, but they are not isomorphic: there are things trivial to express in one that become difficult, bordering on impossible, in the other.
My language is strict and statically typed. However, once arrays (tables are conceptually arrays of records) exceed a certain length, rather than being processed in-memory as arrays, they are transparently offloaded to storage and processed in a streaming fashion.
I’m surprised that this doesn’t work well in PowerQuery. I would have thought that 100K would be peanuts for it.
Mine is a SaaS, however, so the user’s laptop isn’t a constraint, and I can transparently throw a million records into BigQuery or some other data warehouse and use its aggregates if needed. Although at the 100K scale you can use SQLite, and it handles that scale of data trivially on commodity laptops.
So your experience is interesting indeed.
I'll note, as I did to a sibling reply of yours, I made observations about a specific pattern that showcases performance issues in PQ/M. PQ/M easily scales beyond 100Ks of records, but not for arbitrary processing patterns.
That's not my experience. At work the data usually isn't very large, but I have experimented on my own time with, for instance, a public COVID data file that I think was several GB.
I also thought lazy semantics were a good thing, not a fundamental flaw.
Rather than debate, I would be interested enough to spend some time on a sample problem, if you could provide one where you believe Power Query is inadequate, together with an alternate solution to serve as a benchmark for what is adequate.
These are two very different statements. I've happily used PQ to ingest GBs of data. Its streaming semantics are fine to great for some types of processing and introduce performance cliffs for others. There's no binary judgment to be made here. Laziness is neither a fundamental flaw nor an unmitigated good.
I've already shared one specific pattern above. I can share some mocked up data if you need me to, but that might be a day or two. Also, feel free to reach out via email (in my profile).