I'm not trying to say that LLMs are equivalent to humans, just that the concept of reasoning is undefined.
And the fact that their performance does increase with more test-time compute is empirical evidence that they're doing something that helps on tasks we consider to require reasoning. As to what that something is, we don't know.