For instance, Apple sell an extremely expensive workstation, the Mac Pro, with two very powerful graphics cards. Providing better ways to use a Mac Pro's GPUs is a good way to sell more Mac Pros, and a way to unlock more computational power now that the performance of individual CPU cores has levelled off.
Hypothesis 1: Metal aims to replicate that for iOS, while Mantle can be used in the Mac Pro (which uses AMD, formerly ATI, cards).
Hypothesis 2: Metal could wrap Mantle on OS X, and some other similar interface on iOS where Mantle is not available, giving a unified Apple interface without Apple having to write its own AMD drivers.
Despite Apple's best efforts, OpenCL uptake seems to be sluggish. CUDA continues to dominate developer mindshare, by providing a far better language, API, and toolchain.
Compare the C++11 subset supported by the Metal shading language to the device language of CUDA C++. Templates are a huge feature. Ahead-of-time compilation is huge (vs. shipping source strings to the driver, as OpenCL does). It retains the basic workgroup structure of OpenCL with local and global memory, so it looks feasible to map to NVIDIA and AMD hardware. Is there really anything PowerVR-specific in here? People seem to be inferring an awful lot from the name, but nothing sticks out at first glance.
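To make "workgroup structure with local and global memory" concrete, here is a minimal serial C++ emulation of a typical OpenCL/Metal-style reduction kernel. It is a sketch under assumptions, not GPU code: `workgroup_reduce` and `kGroupSize` are hypothetical names, the outer loop stands in for the grid of workgroups, the `local` array stands in for `__local`/`threadgroup` memory, and the comments mark where a real kernel would need a barrier.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workgroup size (must be a power of two for this tree reduction).
constexpr std::size_t kGroupSize = 8;

// Serial emulation of one kernel launch that reduces `global_in` to one partial
// sum per workgroup. Each outer iteration plays the role of one workgroup.
std::vector<float> workgroup_reduce(const std::vector<float>& global_in) {
    const std::size_t n = global_in.size();
    const std::size_t num_groups = (n + kGroupSize - 1) / kGroupSize;
    std::vector<float> global_out(num_groups, 0.0f);

    for (std::size_t group = 0; group < num_groups; ++group) {
        float local[kGroupSize];  // per-group scratch, i.e. "local" memory

        // Step 1: each work-item copies one element from global to local memory,
        // padding with zero past the end of the input.
        for (std::size_t lid = 0; lid < kGroupSize; ++lid) {
            const std::size_t gid = group * kGroupSize + lid;
            local[lid] = gid < n ? global_in[gid] : 0.0f;
        }
        // (a real kernel needs a barrier here)

        // Step 2: tree reduction in local memory, halving the active work-items.
        for (std::size_t stride = kGroupSize / 2; stride > 0; stride /= 2) {
            for (std::size_t lid = 0; lid < stride; ++lid) {
                local[lid] += local[lid + stride];
            }
            // (and a barrier after every stride)
        }

        // Step 3: work-item 0 writes the group's partial sum to global memory.
        global_out[group] = local[0];
    }
    return global_out;
}
```

On real hardware the inner loops run in parallel across work-items, which is why the barriers matter. CUDA's thread blocks and shared memory have the same shape, which is part of why mapping this model onto NVIDIA and AMD hardware looks plausible.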
The features of the shading language would make porting applications from CUDA less painful. If they went all in on Xcode dev tools to make it a rival to Nsight for profiling/debugging, maybe those Mac Pro GPUs wouldn't seem so neglected.
I'm afraid I have to disagree with you there. Over the past 5 years, CUDA popularity has peaked and is actually starting to decline. I would cite my source for that, but I'm on my phone.
AOT vs. online compilation is another kettle of fish. AOT isn't necessarily better, although it is a more attractive offer for developers who don't wish to ship kernel source. Regardless, OpenCL 2.0 has SPIR, an intermediate representation based on LLVM IR, which addresses this issue.
Templates are not a big deal. GPGPU silicon is not well suited to complex computation, at least for some values of "complex". I really wish C++ language features wouldn't creep into GPU languages. It's not going to be good.
If Metal can replace OpenGL, then maybe I can get behind this. The failure of Longs Peak (the abandoned OpenGL 3.0 API redesign) really set OpenGL's relative demise in motion. The API is in serious need of work.
What's your alternative to templates for generic code, exactly? C macros? Scripted pre-processing that further screws up the already marginal tool support for debugging and profiling? Copy+paste? Templates are completely orthogonal to "complex computation" -- I just want to use a device function on different data types without run-time overhead.
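A minimal host-side C++ sketch of the point about generic code: one template instantiated per element type at compile time, next to the macro-stamping alternative. The names `dot`, `DEFINE_DOT`, `dot_float`, and `dot_int` are hypothetical; in CUDA C++ the template would additionally be marked `__device__`, but the template mechanics, and the absence of run-time dispatch, are the same.

```cpp
#include <cstddef>

// Generic version: the compiler stamps out a separate, fully typed copy for each
// T it is used with, so there is no run-time type dispatch.
template <typename T>
T dot(const T* a, const T* b, std::size_t n) {
    T acc = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}

// Macro version: plain textual substitution. Every type needs its own explicit
// expansion and its own function name.
#define DEFINE_DOT(T, NAME)                                     \
    T NAME(const T* a, const T* b, std::size_t n) {             \
        T acc = T(0);                                            \
        for (std::size_t i = 0; i < n; ++i) acc += a[i] * b[i];  \
        return acc;                                              \
    }

DEFINE_DOT(float, dot_float)
DEFINE_DOT(int, dot_int)
```

Both produce the same machine code per type; the difference is that the template is type-checked, debuggable as ordinary code, and picks up new element types (e.g. `double`) without another expansion.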
On the topic of complex computation, I'm constantly surprised at the kinds of features NVIDIA adds to CUDA and how well they actually work. I'm also surprised at the kinds of things people do on the hardware. If someone implements a high-performance lock-free data structure on the GPU, you can't look at that and say "oh, that's too complex, you shouldn't do that."
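As a small, concrete instance of "lock-free" (far simpler than what people actually build on GPUs), here is a host-side C++ sketch of an atomic append buffer: each producer reserves a slot with one atomic `fetch_add` and then writes into it, the same pattern GPU code uses with CUDA's `atomicAdd` to compact results into an output list. `AppendBuffer` and `fill_concurrently` are hypothetical names, and `std::thread` stands in for GPU threads; this is an illustration under those assumptions, not a GPU implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Fixed-capacity, lock-free append buffer. push() never blocks: it reserves a
// unique index with a single atomic increment, then writes to that slot.
template <typename T>
class AppendBuffer {
public:
    explicit AppendBuffer(std::size_t capacity) : slots_(capacity), count_(0) {}

    // Returns false if the buffer is already full.
    bool push(const T& value) {
        const std::size_t idx = count_.fetch_add(1, std::memory_order_relaxed);
        if (idx >= slots_.size()) {
            return false;  // over capacity; the reserved index is discarded
        }
        slots_[idx] = value;  // no other thread ever receives the same idx
        return true;
    }

    // Number of stored elements; only meaningful after all producers have finished.
    std::size_t size() const {
        const std::size_t c = count_.load(std::memory_order_relaxed);
        return c < slots_.size() ? c : slots_.size();
    }

    const T& operator[](std::size_t i) const { return slots_[i]; }

private:
    std::vector<T> slots_;
    std::atomic<std::size_t> count_;
};

// Usage sketch: `num_threads` producers each append `per_thread` distinct values.
void fill_concurrently(AppendBuffer<int>& buf, int num_threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&buf, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                buf.push(t * per_thread + i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join() also makes every slot write visible to the caller
    }
}
```

Relaxed ordering is enough here only because readers wait for `join()`; a structure that is read while producers are still running would need acquire/release ordering or a per-slot ready flag.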
Also, until we're all working on computers that look like the PS4 with a unified global memory, there's a huge incentive to cram the awkward bits of your program onto the GPU any way you can, even if it drags a bit, because that's where the data is.
NVIDIA has really gone to some extremes: malloc in device functions, vtable (virtual function) support, dynamic parallelism. Metal has none of this stuff.
<EDIT: Just now saw your other replies downthread. I'm leaving this comment because it reflects my personal experience and opinions, but don't feel like you need to repeat yourself to clarify your position re: templates, etc>
If you have access to OpenGL/CL, Metal isn't going to give you much (other than perhaps a prettier API). Since most PC/Mac games are written in OpenGL/CL, it will only make things slower.
The Metal API only really gives you extra perf if you're currently using SceneKit.