Very little coverage of Mistral and other open-weight models.
So this extends to terse, barked, under-contextualized commands like the one at the end ("Leaps forward then stands straight"), but also to looser, seemingly nonsensical statements like "native motions" or whatever.
The big tradeoff here is that it seems overly permissive. It would be very annoying to be talking about a third person and have your robot start dancing because it misparsed who the sentence was about.
Lots of amazing research is being spearheaded by people who speak English as a second language these days. I don’t think it detracts from the ideas in the minds of the target audience.
I wouldn’t be surprised if, some day soon, most applied AI research happens in Mandarin, the way most fundamental physics research once happened in German. I’ll have the opposite problem then. If I show ESL speakers some kindness now, maybe they’ll return the favor when I try to write papers in “broken” Chinese someday :)
If I were writing a paper about AI comprehension of Italian and wasn't confident speaking Italian, I would definitely ask an Italian speaker to check my examples for me.
Judging from the benchmarks against other methods, this looks quite natural so far. If it can be carried over into game development, it would save so much work.
If you want people to try to dodge, I'd make a component that attaches spherical triggers to some bones (say, the shoulders and pelvis), have the characters use boid/flocking-style evasion, and leave it to the physics solver to recover from there. Throw some crowds into one another and keep escalating until the stew looks sufficiently non-abominable.
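A minimal sketch of the boid-style part of that idea, in Python, assuming Y-up coordinates and trigger spheres whose centers are already updated to their bone positions each frame. `TriggerSphere`, `Agent`, and `evasion_steering` are hypothetical names, not any engine's API; a real engine would use its own trigger/overlap events instead of the brute-force pairwise check here. It implements classic boids "separation": each overlapping sphere pair pushes the agent away, weighted by how deeply the spheres overlap.

```python
import math
from dataclasses import dataclass

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def length(a: Vec) -> float:
    return math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2])


@dataclass
class TriggerSphere:
    bone: str       # e.g. "pelvis", "l_shoulder" (hypothetical bone names)
    center: Vec     # world-space position, assumed refreshed every frame
    radius: float


@dataclass
class Agent:
    name: str
    spheres: list[TriggerSphere]


def evasion_steering(agent: Agent, others: list[Agent], max_force: float = 1.0) -> Vec:
    """Boids-style separation: sum pushes away from every overlapping sphere pair."""
    steer: Vec = (0.0, 0.0, 0.0)
    for other in others:
        if other is agent:
            continue  # never evade yourself
        for mine in agent.spheres:
            for theirs in other.spheres:
                offset = sub(mine.center, theirs.center)
                dist = length(offset)
                reach = mine.radius + theirs.radius
                if dist >= reach or dist == 0.0:
                    continue  # no overlap (or degenerate: no defined direction)
                # Linear falloff: deeper overlap gives a stronger push (an assumption).
                weight = (reach - dist) / reach
                steer = add(steer, scale(offset, weight / dist))
    # Walking characters dodge sideways/backwards, not up: drop the Y component.
    steer = (steer[0], 0.0, steer[2])
    mag = length(steer)
    if mag > max_force:
        steer = scale(steer, max_force / mag)  # clamp like a boid's max steering force
    return steer


if __name__ == "__main__":
    a = Agent("a", [TriggerSphere("pelvis", (0.0, 1.0, 0.0), 0.5)])
    b = Agent("b", [TriggerSphere("pelvis", (0.6, 1.0, 0.0), 0.5)])
    print(evasion_steering(a, [a, b]))  # points away from b, along -X
```

The resulting vector would be blended into the character's desired velocity before the motion controller and physics solver take over. The all-pairs loop is fine for a test scene; for real crowds you'd likely want a spatial hash or the engine's broadphase.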
"Person is walking normally in a circle." turns into "Human is walking usually in a loop." At best, that's ungrammatical. At worst, it sounds like "usually" modifies "in a loop": that is, someone spends most of their time walking in a loop and some of their time walking in some other pattern.
"A human walks a quarter of a circle" turns into "A native motions a quarter of a loop". But "motions" as a verb can only refer to gesturing. I would expect to see someone waving their arm in a quarter circle.
But it probably doesn't matter. It sounds like the model's understanding of grammar (or at least its robustness to unusual sentence structures) is too weak for those nuances to even be relevant.
Reading a few comments below, people are asking what these fine wireframe guys/gals are good for. Lots, including being fed into a ControlNet as poses for image generation. Frame-to-frame stability of the rendered output is an ongoing, rapidly improving area of research, but these outputs look really nice and would fit well into a lot of text-to-animation workflows.
Can it be run in ComfyUI?