It's also odd that the RMSE increases when a nonlinear term is added. In a least-squares fit, the in-sample RMSE can only decrease (or stay the same) when you add terms and nonlinearities; that's where overfitting comes from.
He didn't add the log-transform; he replaced the linear relationship with it. The two models are then no longer nested, so the RMSE can increase, and that is what you'd expect if a lot of the response values skew high, which appears to be the case from the diagram. (Log-transforming the response might help.)
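
To make the add-versus-replace distinction concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration (the data-generating process, the right-skewed noise, and the helper name `in_sample_rmse`); it is not the data or code from the original post. It fits ordinary least squares three ways: on x alone, on x plus log(x), and on log(x) alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
# Hypothetical data: linear in x, with right-skewed (exponential) noise.
y = 2.0 + 3.0 * x + rng.exponential(scale=2.0, size=n)


def in_sample_rmse(columns, y):
    """Fit OLS with an intercept plus the given columns; return in-sample RMSE."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(np.sqrt(np.mean(resid ** 2)))


rmse_linear = in_sample_rmse([x], y)               # y ~ x
rmse_added = in_sample_rmse([x, np.log(x)], y)     # y ~ x + log(x), nests the linear model
rmse_replaced = in_sample_rmse([np.log(x)], y)     # y ~ log(x), not nested in either

print(f"linear:           {rmse_linear:.3f}")
print(f"linear + log(x):  {rmse_added:.3f}")
print(f"log(x) only:      {rmse_replaced:.3f}")
```

Adding log(x) next to x cannot raise the in-sample RMSE, because the linear model is the special case where the log coefficient is zero. Swapping x for log(x) carries no such guarantee, and on this synthetic data it fits worse than the plain linear model.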