I'm just saying that we don't know how much effort went into making this, and we don't know whether it works.
The existence of a repository containing hundreds of files, thousands of SLOC and a folder full of tests tells us less today than it used to.
There's one thing in particular that I sometimes find quite astonishing. I don't know about this particular project, but some people use LLMs to generate both the implementation and the test cases.
What does that mean? The test cases are supposed to be the formal specification of our requirements. If we don't formally specify what we expect a tool to do, how do we know whether it has done what we expected, including in edge cases?