1. The passive vs active framing applies primarily to non-Pro iPads. I don't think it can be reasonably argued that the $3k Vision Pro is attempting to cannibalize low-end iPads. iPad Pros are very different beasts that are designed for a particular type of creative output, and are very capable general-purpose computing devices now.
2. There's no particular incentive to cannibalize iPad over Mac; it's hard to fathom on what dimension that would make sense. They're happy to cannibalize any product, but their main MO is 'look what cool shit we can do with the latest tech' and Vision Pro is today's version of that. Apple is demonstrably way more interested in building out iOS/iPadOS, and we've seen that over the past 10 years, to the point Mac users had to basically revolt to get attention.
3. Of course Vision Pro is based on the iPad ecosystem, as it's a far more modern set of conventions better suited to mixed-modal input than Mac. That isn't a reflection of VP's purpose, but of engineering and design realities - it's far easier to grow iPad/iOS into a top-tier spatial computing OS and ecosystem.
While macOS is wonderful, it's really a product of a particular conception of computing, and shoehorning a 40-year-old desktop OS paradigm into entirely new input modalities and ergonomic contexts makes little sense. The puck isn't going anywhere new; the desktop OS modality is baked. Apps will need to adapt to the wondrous new capabilities and constraints of the platform.
Looking at your presentation [1] it's clear the ergonomics are going to be a sticking point, which is a combined hardware & app ecosystem problem: the novelty of windows everywhere bumps up against the human factors / ergonomic fatigue constraints of moving one's head around more than a few degrees (and whatever you do, don't optimize for looking up!), and app windows with tiny UI elements are going to be similarly fatiguing and unable to adapt to the promise and limitations in accuracy of gaze tracking.
There's a reason WinCE's desktop OS paradigms failed in PDAs whereas iOS succeeded - you need to reinvent the experience when you move modalities. I would argue the exact same thing will happen in AR/VR: sticking with a desktop OS paradigm is a losing proposition.
Just my 2c. Personally I really want to see a variety of offerings and possibilities in the market but I also want them to be based on sound reasoning and approach, which I go into a bit in this essay on category-defining products [2].