They absolutely are. It's trivial to verify: tell one to act as a suitably small Turing machine, and give it instructions for manipulating the conversation as "the tape".
Anything else would be absolutely astounding given how simple it is to implement a minimal 2-state 3-symbol Turing machine.
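To give a sense of how little there is to it, here is a rough sketch of a 2-state 3-symbol machine in Python. The transition table is an illustrative assumption on my part (I'm not presenting it as a verified universal machine), and the names `RULES`, `step` and `run` are made up for this example.

```python
from collections import defaultdict

# Illustrative (assumed) transition table for a 2-state, 3-symbol machine.
# (state, symbol read) -> (symbol to write, head move, next state)
RULES = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (2, -1, "A"),
    ("A", 2): (1, -1, "A"),
    ("B", 0): (2, -1, "A"),
    ("B", 1): (2, +1, "B"),
    ("B", 2): (0, +1, "A"),
}


def step(state, head, tape):
    """Apply one transition in place and return the new (state, head)."""
    write, move, next_state = RULES[(state, tape[head])]
    tape[head] = write
    return next_state, head + move


def run(steps, state="A", head=0):
    """Run for a fixed number of steps on an initially blank (all-0) tape."""
    tape = defaultdict(int)  # unvisited cells read as 0
    for _ in range(steps):
        state, head = step(state, head, tape)
    return state, head, dict(tape)
```

The whole machine is six table entries plus a read/write/move loop, which is the same loop you would be asking the model to carry out with the conversation as the tape.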
> Assuming the model can even follow your instructions the output is probabilistic so in the limit you can guarantee failure.
The output is deterministic if you set the temperature to zero, at which point it is trivial to verify the correct output for each of the six (state, symbol) combinations of a minimal 2-state 3-symbol Turing machine.
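A sketch of what that check could look like, assuming the same kind of illustrative transition table: `model_step` is a hypothetical stand-in for whatever code sends one (state, symbol) pair to the model and parses its reply, so no real API is implied here.

```python
# Illustrative (assumed) 2-state, 3-symbol transition table.
# (state, symbol read) -> (symbol to write, head move, next state)
RULES = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (2, -1, "A"),
    ("A", 2): (1, -1, "A"),
    ("B", 0): (2, -1, "A"),
    ("B", 1): (2, +1, "B"),
    ("B", 2): (0, +1, "A"),
}


def expected_cases():
    """One expected result per (state, symbol) pair: six in total."""
    return [
        {"state": s, "read": sym, "write": w, "move": m, "next": n}
        for (s, sym), (w, m, n) in sorted(RULES.items())
    ]


def check(model_step):
    """Return the (state, symbol) pairs where model_step disagrees with RULES.

    model_step is a hypothetical callable taking (state, symbol) and
    returning (write, move, next_state) as parsed from the model's reply.
    """
    return [key for key, want in sorted(RULES.items()) if model_step(*key) != want]
```

An empty list from `check` means every single-step transition came back correct; anything else tells you exactly which of the six cases failed.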