this is the part i'm talking about. i also think LLMs are very capable of detecting different types and levels of grammar, but they can't decide which ones should be filtered out to meet a certain goal. they need detailed instructions for that, and that's somewhat inefficient and causes issues like this right here.
we have done this song and dance many times with AI. it's the bitter lesson: you need a system that learns these things, you can't just give it a rule-based engine to patch the things it can't learn. that works in the short term, but leads to a dead end. we need something that has the "common sense" to see when grammar is fine versus when it's hindering communication, and this just isn't there yet. so it needs to be given detailed instructions to do so, which may or may not be sustainable.