Because 'bandwidth' and 'extra Unicode spaces' are effectively irrelevant to the situation.
This is a very common psychological dilemma among engineers - we tend to fixate on 'sizes we can measure' and 'performance', when often those issues are not relevant.
It would be like adding a $500 gadget to your car that hangs out the back just to save 0.01 cents on fuel.
Emojis are turning into a mess - every time I grab user content these days, I have to scrub it for weird combinations of characters.
Worse: not only do the rendered images differ, but some editors compose emojis differently - so the same emoji can end up as a different number of characters.
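To make the 'different numbers of characters' point concrete, here is a rough Python sketch. The `counts` helper and the sample strings are my own illustration, not taken from any particular editor; it only shows how strings that each render as a single emoji vary in length depending on how they are composed and which unit you count.

```python
def counts(s: str) -> tuple:
    """Return (code points, UTF-16 code units, UTF-8 bytes) for s."""
    return (len(s), len(s.encode("utf-16-le")) // 2, len(s.encode("utf-8")))


# Each of these is normally displayed as ONE emoji (illustrative samples).
samples = {
    "heart, text form":   "\u2764",                # in the BMP
    "heart, emoji form":  "\u2764\ufe0f",          # + variation selector-16
    "thumbs up":          "\U0001F44D",            # outside the BMP
    "thumbs up, toned":   "\U0001F44D\U0001F3FD",  # + skin tone modifier
    # man + ZWJ + woman + ZWJ + girl, joined into one 'family' glyph
    "family":             "\U0001F468\u200D\U0001F469\u200D\U0001F467",
}

for name, s in samples.items():
    cp, u16, u8 = counts(s)
    print(f"{name:20} code points={cp} utf16 units={u16} utf8 bytes={u8}")
```

In this sample, something the user sees as 'one character' ranges from 1 to 8 UTF-16 units, which is why length checks and truncation on user content get messy.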
There should be 20 emojis - they should be in the BMP (the Basic Multilingual Plane, not the extended character set) - and that should be it.
For anything else, you send SVGs. You get the added benefit of having 'whatever you want'. With some common rules around image sizes etc., we could be OK. Thankfully, SVGs are generally small: much bigger than text, but still relatively small.