While I'm not willing to rule *out* the idea that they're "thinking" (nor "conscious" etc.), the obvious counter-argument here is all the records we have of humans doing thinking, where the records themselves are not doing the thinking that went into creating those records.
And I'm saying this as someone whose cached response to "it's just matrix multiplication it can't think/be conscious/be intelligent" is that, so far as we can measure all of reality, everything in the universe including ourselves can be expressed as matrix multiplication.
Falsification, not verification. What would be measurably different if the null hypothesis was wrong?