Fundamentally, LLMs do not construct a consistent mental model of the codebase (this becomes obvious if you actually read LLM-generated code), and this is Bad for a lot of reasons. It's bad for long-term maintainability, it's bad for modelling the code and its behaviour as a system accurately, it's bad for testing and verifying it, and so on. Pretty much all of the tasks around program design require you to have that mental model.
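To make the "no consistent mental model" point concrete, here's a deliberately hypothetical sketch (all names invented) of the kind of thing you run into: two functions that could plausibly come out of the same session, each assuming a different shape for the "same" user record.

```python
def format_user(user):
    # Here the user is a dict with "name" and "email" keys...
    return f"{user['name']} <{user['email']}>"

def greet_user(user):
    # ...and here it's an object with .name and .email attributes.
    # Whichever representation the caller actually has, one of these breaks.
    return f"Hi {user.name}, we emailed {user.email}"

user = {"name": "Ada", "email": "ada@example.com"}
print(format_user(user))  # works
print(greet_user(user))   # AttributeError: 'dict' object has no attribute 'name'
```

A programmer holding one model of the data in their head doesn't produce this pair; a generator that only has to make each function locally plausible does.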
You can absolutely get an LLM to show you a mental model of the code, but nothing can guarantee that that's the model it's actually using. The evidence is in how they summarise documents, how inaccurate a lot of the documentation they generate is, and how inaccurate a lot of their code summaries are. Those would be accurate if the LLM were forming a mental model while it worked. It's a program that statistically generates plausible text. The fact that we got the program to do more than that in the first place is very interesting and can imply a lot of things, but at the end of the day, whatever you ask of it, it will generate text. There is no guarantee around the accuracy of that text, and there effectively never can be.
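To ground the "statistically generate plausible text" claim: at its core, generation is a loop that repeatedly samples the next token from a probability distribution conditioned on what came before. Here's a toy sketch with an entirely made-up lookup table standing in for the model; a real LLM computes the distribution with a neural network over the whole context, but the loop has the same shape.

```python
import random

# Toy stand-in for a language model: maps the previous token to a
# probability distribution over possible next tokens.
TOY_MODEL = {
    "the":    {"code": 0.5, "model": 0.3, "test": 0.2},
    "code":   {"is": 0.6, "passes": 0.4},
    "model":  {"is": 1.0},
    "test":   {"passes": 1.0},
    "is":     {"correct": 0.5, "plausible": 0.5},
    "passes": {"locally": 1.0},
}

def generate(token, steps=4):
    out = [token]
    for _ in range(steps):
        dist = TOY_MODEL.get(token)
        if dist is None:
            break
        # Sample the next token in proportion to its probability.
        # Nothing here checks the output against reality; it is
        # plausible by construction and nothing more.
        tokens, weights = zip(*dist.items())
        token = random.choices(tokens, weights=weights)[0]
        out.append(token)
    return " ".join(out)

print(generate("the"))  # e.g. "the code is plausible"
```

Note what's absent from the loop: any step that verifies the output. "The code is correct" and "the code is plausible" fall out of the same sampler with the same confidence.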