You're right about the TPM: I won't get the key out of it. It's a special ASIC that doesn't even have the silicon gates to hand me the key.
But is the TPM doing matrix multiplication at 1.3 petaflops?
Or are you just sending the encrypted file to the TPM and getting the decrypted file back, which I can intercept, whether on the SPI bus or by gaining higher privileges on the core itself? Just like with this app, only lower in the stack?
Whatever core executes the multiplications will be vulnerable in some way or another to a motivated attacker who has the proper resources. This is true for every hardware device, but the attack vector of someone jailbreaking a Nintendo Switch with an electron microscope and an ion-beam mill is negligible.
If you are that paranoid about AI models being stolen, they must be worth stealing, so some attacker will have enough motivation to push through.
Extracting the private key from a GPU, which then lets you steal a lot of valuable AI models, is a break-once-break-everywhere attack.
Apple's Secure Enclave is also just a TPM with different branding, or maybe an HSM, dunno.