The key purpose of the reinforcement-learning stage (RLHF) is to make it socially acceptable to release the model to the general public without a PR nightmare like Microsoft Tay. That post-training does not make the model "deceptively smart"; it trades away a bit of capability to enforce alignment with certain restrictions. It degrades performance on some tasks: for example, the GPT-4 paper, while very light on other details, provides experimental evidence that this post-training significantly hurts the model's confidence calibration, which reduces its usefulness for tasks where you want to know how certain the model is.
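To make "confidence calibration" concrete: a well-calibrated model is right about 70% of the time when it reports 70% confidence. A standard way to quantify the mismatch is expected calibration error (ECE). Below is a minimal sketch of ECE with made-up toy numbers; it is not the exact procedure from the GPT-4 paper, just an illustration of what "worse calibration" means.

```python
# Minimal sketch: expected calibration error (ECE).
# A calibrated model's stated confidence matches its empirical accuracy;
# post-training tends to worsen this match (typically toward overconfidence).
# The data below is illustrative, not taken from the GPT-4 paper.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |confidence - accuracy| over equal-width confidence bins,
    weighted by the fraction of samples falling in each bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # include the right edge only in the last bin
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - acc)
    return ece

# Well-calibrated toy data: 75% confidence, 3 of 4 correct -> ECE 0
print(expected_calibration_error([0.75] * 4, [1, 1, 1, 0]))  # 0.0

# Overconfident toy data: 95% confidence, 2 of 4 correct -> ECE ~0.45
print(expected_calibration_error([0.95] * 4, [1, 0, 1, 0]))
```

A model whose post-training pushes reported confidences toward the extremes, without a matching change in accuracy, shows up directly as a larger ECE.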