Neither (current) static analysis nor testing is sufficient to score the commit on complexity.
As a trivial example (i.e. probably not something the LLM would do), if the code was a series of 25 if-then-else statements when it could have been a lookup table, no tool is going to flag that.
Now imagine the patterns an LLM would inject that non-junior devs would reject without even thinking. No test, nor any static analysis tool, is going to flag that non-essential complexity.
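To make the if-then-else versus lookup table example concrete, here is a minimal sketch in Python. The function names and the status-code mapping are hypothetical, chosen only for illustration, and the chain is cut down to a handful of branches rather than 25.

```python
# Hypothetical example: both functions map a status code to a label.
# Names and data are illustrative assumptions, not from any real codebase.

def status_label_branches(code: int) -> str:
    # The branch-per-case shape, shortened from 25 cases to 5.
    if code == 200:
        return "OK"
    elif code == 201:
        return "Created"
    elif code == 400:
        return "Bad Request"
    elif code == 404:
        return "Not Found"
    elif code == 500:
        return "Internal Server Error"
    else:
        return "Unknown"


# The same mapping expressed as data instead of control flow.
STATUS_LABELS = {
    200: "OK",
    201: "Created",
    400: "Bad Request",
    404: "Not Found",
    500: "Internal Server Error",
}


def status_label_table(code: int) -> str:
    # A single lookup with a default replaces the whole chain.
    return STATUS_LABELS.get(code, "Unknown")
```

Both versions return the same result for every input, so a test suite that passes for one passes for the other. The difference is only visible to someone reading the code.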