I wouldn't trust a general LLM trained on Reddit and 4chan (literally!) to run `rm` on my machine. But I would absolutely trust a purpose-built model trained on hundreds or thousands of years of data for a specific task. Models trained for specific tasks on huge datasets can be very reliable, certainly more reliable than humans.
I wouldn't let GPT-5 drive my car, but I let FSD drive it every day. It's not perfect yet, but I definitely see a day very soon when it will be better than me or any other single human, with all the failings humans have.