MIDI is absolutely horrible for ML. It lacks essential information, such as articulation, that you need to make sense of music. It's popular because it's simple, but there is no way to understand music by just looking at MIDI.
I'm a hobbyist in this space (I'm a composer as well as a software engineer), and the current tools are all very poor. MusicXML is better than MIDI, and MEI [1] is better than MusicXML.
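To make the gap concrete, here is a rough sketch (Python, standard library only) that puts the same note side by side: a raw MIDI note-on message and a MusicXML note. The MusicXML fragment is hand-written for illustration and is not a complete score, and the `midi_fields` / `xml_fields` names are just mine.

```python
import xml.etree.ElementTree as ET

# A MIDI note-on message is three bytes: status (0x90 | channel), key, velocity.
note_on = bytes([0x90, 61, 100])
status, key, velocity = note_on
midi_fields = {"channel": status & 0x0F, "key": key, "velocity": velocity}

# Illustrative MusicXML fragment: one staccato C-sharp quarter note.
MUSICXML_NOTE = """
<note>
  <pitch><step>C</step><alter>1</alter><octave>4</octave></pitch>
  <duration>1</duration>
  <type>quarter</type>
  <notations><articulations><staccato/></articulations></notations>
</note>
"""

note = ET.fromstring(MUSICXML_NOTE)
xml_fields = {
    "step": note.findtext("pitch/step"),
    "alter": note.findtext("pitch/alter"),
    "octave": note.findtext("pitch/octave"),
    "type": note.findtext("type"),
    # Child tags of <articulations>, e.g. staccato, accent, tenuto.
    "articulations": [el.tag for el in note.find("notations/articulations")],
}

print(midi_fields)
print(xml_fields)
```

Key 61 in MIDI could be C-sharp or D-flat, and nothing in the message says the note is staccato; you would have to guess that from when the matching note-off arrives. The MusicXML note carries the spelling, the notated duration, and the articulation explicitly.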
The problem is that a minuscule amount of effort and money is spent on this field because music overall makes peanuts. It really doesn't justify training expensive ML models, which is unfortunate.
[1] https://music-encoding.org/about/