Even when I tried to ask it for stuff like refactoring a relatively simple rust file to be more idiomatic or organized, it consistently generated code that did not compile and was unable to fix the compile errors on 5 or 6 repromptings.
For what it's worth, a lot of SWE work technically trivial -- it makes this much quicker so there's obviously some value there, but if we're comparing it to a pair programmer, I would definitely fire a dev who had this sort of extremely limited complexity ceiling.
It really feels to me (just vibes, obviously not scientific) like it is good at interpolating between things in its training set, but is not really able to do anything more than that. Presumably this will get better over time.