It uses context to figure out that we're trying to convert something to something else, then it adds all those numbers up. Taking helium into consideration is no doubt interesting, but they've also polished that task, since it was the common critique of the first release, which got it so very wrong (and which I mentioned they had fixed). I'm not qualified to assess this part of the answer:
> "If the balloons displace more than 100g of air when filled with helium, then they would effectively weigh less than if they were left empty. If they displace exactly 100g of air, then the balloons would have the same weight as if they were left empty."
I don't know enough to understand how much 100g of helium is or how it behaves. And it doesn't try to explain it to me; it mentions it, then takes the easy route of assuming it's a trick question. What does that tell you? I guess there are similar discussions around and it's giving me the summary. Why doesn't it tell me how much air it displaces, and under what circumstances? Temperature etc. It should be easy if it's doing more than summarizing a simple discussion from a random forum. A conversion regex could do it.
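For reference, here is a rough sketch of the calculation that question is asking for. It assumes ideal-gas behaviour and a balloon whose helium sits at the same temperature and pressure as the surrounding air; both are simplifications, since a real balloon is slightly over-pressured. The function names and the default conditions (20 °C, 1 atm) are my own picks for illustration.

```python
# Sketch only: ideal gas law, balloon at ambient temperature and pressure (assumed).
R = 8.314462618        # molar gas constant, J/(mol*K)
M_HELIUM = 4.002602    # molar mass of helium, g/mol
M_AIR = 28.9647        # approximate mean molar mass of dry air, g/mol


def helium_volume_m3(mass_g, temp_k=293.15, pressure_pa=101325.0):
    """Volume occupied by mass_g grams of helium, from V = nRT / P."""
    n = mass_g / M_HELIUM
    return n * R * temp_k / pressure_pa


def displaced_air_g(helium_mass_g):
    """Mass of air displaced by that helium.

    At equal temperature and pressure, equal volumes of ideal gas hold
    equal numbers of moles, so temperature and pressure cancel out and
    only the ratio of molar masses remains.
    """
    n = helium_mass_g / M_HELIUM
    return n * M_AIR


def net_lift_g(helium_mass_g):
    """Displaced air minus the helium itself (balloon skin ignored)."""
    return displaced_air_g(helium_mass_g) - helium_mass_g


if __name__ == "__main__":
    print(f"volume of 100 g helium: {helium_volume_m3(100):.2f} m^3")
    print(f"air displaced:          {displaced_air_g(100):.0f} g")
    print(f"net lift:               {net_lift_g(100):.0f} g")
```

Under these assumptions, 100g of helium takes up roughly 0.6 cubic metres at room conditions and displaces roughly 720g of air. Temperature and pressure change the volume, but not the displaced mass, as long as the helium and the air are at the same conditions.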
This comment[1] has a very impressive example, but anything I'm qualified to assess has mostly been meh. If the fix is better training data, does that mean it's reasoning or regurgitating? The mistakes it makes are what tell me how it works, not the times it tricks me into thinking it's correct. To me it's a very well-polished search engine summary.