They could push boundaries and publish one trained on all of Microsoft's internal source code. For me, that would be a great demonstration that they believe the "it's fair use and not violating copyright on the training data" argument.
It's more likely they'd sue someone who used it to develop something that ate their lunch, claiming it infringed on one of the 'secret' Linux patents they sabre-rattle about every now and then.