How do you know which question should be answered with 'I dont know?'. There are obvious questions which have no answer, but if only those are in the dataset, the model will answer I dont know only for unreasonable questions.
To train this effectively you would need a dataset of questions which you know the model doesn't know. But if you have that... why not answer the question and put in the dataset so that the model will know ?
That's a bit imprecise, but I think it capture the idea of why 'I don't know' answers are harder to train.