Try it! The tools are publicly available. You might find that it's harder than you think. We are very sensitive to uncanny conversations. Analogue imitation and pitching your voice up or down are much easier to work with.
However, my point is that none of that matters. After all, deepfakes are only going to get easier, so it is only a matter of time before they are as cheap as you describe. My point is that imitating a voice has very little impact on the outcome of a phishing operation. Sure, it might not hurt, but other things determine whether a scam succeeds. Don't rely on impersonating a voice, especially since a trivial callback completely defeats it, no matter how many resources you put into it.
Which is also why none of these recent media stories make sense. And when investigated, none of them have held up to scrutiny, precisely as expected. I have not done this investigation myself, but look out for follow-up stories by respected bloggers and journalists.
Lots of people work on defending against these operations, and none of them spend any time on correctly identifying deepfakes, for a reason. Don't take my word for it, I am not in the business, but ask anyone who is whether they find these details believable.