I was just using French as an example. Korean and Japanese both transfer very well. As well? Not sure about that.
As for the other post, the degraded performance is still highly nontrivial. Some of the results aren't actually poor, just worse.
Even the authors admit humans would see degraded performance on counterfactuals unless given "enough time to reason and revise", something they don't try to do with GPT-4.
Think about it. Do you genuinely believe you would score as accurately on a multiplication test taken in base 8?
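To make the base-8 point concrete, here's a quick Python sketch of schoolbook multiplication with carries done in base 8. The helper names are just mine for illustration, not anything from the paper. Even 7 × 7 stops being "49" and becomes "61" (6 × 8 + 1), so every memorized times-table fact and every carry has to be recomputed.

```python
def to_base(n: int, base: int = 8) -> str:
    """Render a non-negative integer as a digit string in `base` (2..10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(str(r))
    return "".join(reversed(digits))


def long_multiply(a: str, b: str, base: int = 8) -> str:
    """Schoolbook multiplication on digit strings, carrying in `base`.

    Assumes every digit in `a` and `b` is already valid for `base`.
    """
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            # Carry whenever the column reaches `base`, not 10.
            carry, result[i + j] = divmod(total, base)
        result[i + len(b)] += carry
    # Drop leading zeros, but keep a single "0" for a zero product.
    return "".join(str(d) for d in reversed(result)).lstrip("0") or "0"


if __name__ == "__main__":
    print(long_multiply("7", "7"))    # prints 61
    # Cross-check against Python's own base conversion.
    print(long_multiply("123", "45") == to_base(int("123", 8) * int("45", 8)))  # prints True
```

The procedure is exactly the one you learned in school. Only the table lookups and the carry threshold change, and that's precisely where a human's practiced intuition stops helping.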