Of course you still need to check real references; it's like talking to people, and people make mistakes too. GPT's mistakes are often obvious to humans, like writing equations or numbers wrong, and I only need to check other sources or use a calculator to catch them. But I only care about the intuitive/conceptual part anyway, which GPT does well.
It's funny, I tell people that ChatGPT 3.5 is like talking to someone you run into at an airport, while ChatGPT 4 is like talking to someone you run into at a library.