I see literally 0 evidence of this sort of trend.
Copilot is still BY FAR the best AI coding tool, and it’s just ok and hasn’t improved much since release IMO.
We don’t have infinite code to train models on; we’ve almost certainly trained GPT on just about everything on GitHub and probably GitLab too.
I’d be very surprised to see code assistants reach that level with our current AI methods.
Aider ranks the LLM engines, with claude-3.5-sonnet coming out on top, but doesn't test against Copilot.
I haven’t tried codestral yet tbf.
Also, LLM leaderboards are pretty useless. They’re either contrived tests or synthetic benchmarks which don’t reflect reality. We don’t have a good evaluation framework for LLMs yet.
The one you posted, for example, has models run through 133 small, specifically Python coding exercises from the exercism GitHub repo. Those are not representative of the tasks most SWEs deal with, AND it’s very likely that those models were “contaminated” by already being trained on those exact exercises (since they are open source).
That leaderboard is little better than marketing. Which makes sense since it’s from an AI code assistant company.
EDIT: and even on this extremely minor eval, the best the LLMs could do was ~78%. Not exactly what I’d call “good”.
Given that humans have agency and codebases do not, we eventually end up with Conway's Law in effect again as humans start to shape everything around themselves.
I think the author has problems understanding causation.
Take a baseball batter. Pitch to the batter a bunch of times, leaving all the balls to rest wherever they land. If we later move all the balls farther away, we don't expect the batter to hit farther.
I was about to mention the psychological barrier phenomenon, according to which, once a record is broken, scores of athletes surpass it in short order. This might actually be a case where awareness could have an impact, if the batter knows about the moved balls.
But then I realized that I never questioned this idea and was just believing something I heard at some point many years ago. Turns out, it's just another piece of "pop sci" trivia, aka misconstrued or wrong: https://www.scienceofrunning.com/2017/05/the-roger-bannister...