Developing with GitHub Copilot Agent Mode and MCP (opens in new tab)

(austen.info)

93 pointsmiltonlaxer0y ago66 comments

66 comments

27 comments · 8 top-level

skydhash11mo ago· 16 in thread

And again, the most convoluted setup for development with an example that fails to demonstrate why you should adopt such practice. It’s like doing a GDB demo with an hello world program. Or doing Linux From Scratch to show how you can browse the web.

The goal of software engineering is not to write code faster. Coding is itself a translation task (and a learning workflow, as you can’t keep everything in your head). What you want is the power of decision, and better decision can be made with better information. There’s nothing in the setup that helps with making decision.

There are roughly six steps in software engineering, done sequentially and iteratively. Requirements gathering to shape the problem, Analysis to understand it, Design to come up with a solution, Coding to implement it, Testing to verify the solution, and Maintenance to keep the solution working. We have methods and tooling that help with each, giving us relevant information based on important parameters that we need to decide upon.

LLMs are example generators. Give it a prompt and it will gives the answer that fits the conversation. It’s an echo chamber powered by a lossy version of the internet. Unlike my linting tool which will show me the error when there’s one and not when I tell it to.

ADDENDUM

It's like an ivory tower filled with yes-men and mirrors that always reply "you're the fairest of them all". My mind is already prone to lie to itself. What I need most is tooling that is not influenced by what I told it, or what others believe in. My browser is not influencing my note taking tool, telling it to note down the first two results it got from google. My editor is not telling the linter to sweep that error under a virtual rug. And QA does not care that I've implemented the most advanced abstraction if the software does not fit the specs.

jcelerier11mo ago

> The goal of software engineering is not to write code faster

That just really depends on your situation. Here's a case I had just last week: we had artists in residency who suddenly showed up with a new, expensive camera that didn't have any easy to use driver but requires the use of their huge and bulky custom SDK.

Claude whipped a basic working c++ proprietary-camera-sdk-to-open-video-sharing-protocol in, what, 2 minutes? From the first go with a basic prompt? Without that it'd have been at least a couple days of development, likely a day just to go through the humongous docs -- except I had at most two hours to put on this. And I already have experience doing exactly this, having written software that involves realsenses, orbbec, leapmotion, Kinect, and all forms of weird cameras that require the use of their c++ SDK.

So the artists would just not be able to do their residency the way they wanted because they only have 3 days on-site to work too.

Or I'd have spent two days for some code that is very likely to only ever being used once, as part of this residency.

Thus in my line of work, being able to output code that works, faster than humans, is absolutely game changer - this situation I'm describing is not the exception, it's pretty much a weekly occurrence.

skydhash11mo ago

> Claude whipped a basic working c++ proprietary-camera-sdk-to-open-video-sharing-protocol in, what, 2 minutes? From the first go with a basic prompt? Without that it'd have been at least a couple days of development, likely a day just to go through the humongous docs

That's basically what I said. They are example generators. Their creators have not published the source of the data that goes in their training so we can assume that everything that is accessible from the web (and now from places that use their tools) was used.

So if you're already know the domain to provide the right keywords, and can judge the output to see if it's good enough, it's going to be fine. Especially, as you've said, it's something that you're used to do. But do you need the setup mentioned in TFA?

Most software engineering tasks involved more than getting some basic prototype working. After the 80% work done by the prototype, there's the other 80% to have reliable code. With LLMs, you're stuck with the first 80%, and that already require someone experienced to get there.

1 more reply

leetrout11mo ago

Similar anecdata:

I was writing some automated infra tests with Terraform and Terratest and I wanted to deploy to my infra. My tests are compiled into a binary and shipped to ECS Fargate as an image to run.

Instead of doing docker in docker to pull and push my images and before googling for an existing lib for managing images directly I asked Claude to write code to pull the layer tarballs from docker hub and push them to my ECR. It did so flawlessly and even knew how to correctly auth to dockerhub with their token exchange on the first try.

I glanced at the code and surmised it would have taken me an hour or two to write and test as I read the docs on the APIs.

I am sure there is a lib somewhere that does this but even that would have likely taken more time than the code gen I got.

1 more reply

alterom11mo ago

Oh, so Claude in this case was a bandaid over a communication problem (the artists not getting the memo about not suddenly showing up with new equipment that you have to support, with no prior discussion, warning, or heads-up).

It absolutely is a game changer.

Now the game for you is to deal with whatever equipment they throw at you, because nobody is going to bother consulting you in advance.

Just use AI, bro.

Good luck next time they show up with gear that Claude can't help you with. Say, because there's no API in the first place, and it's just incompatible the existing flow.

>So the artists would just not be able to do their residency the way they wanted because they only have 3 days on-site to work too.

That, to me, sounds like the good outcome for everyone involved.

It would have been their problem, which they were perfectly capable of solving by suddenly showing up with supported equipment on the job site.

Wanting you to deal with their "suddenly showing up" is not the right thing to want.

If want that, they shouldn't be able to do the residency the way they want.

Saying this as a performing musician: verifying that my gear will work at the venue before the performance is my responsibility, not the sound tech's. Ain't their job to have the right cables or power supplies. I can't fathom showing up with a setup and simply demanding to make it work.

IDK what kind of divas you work with, but what you described is a solid example of a situation when the best tool is saying "no", not using Claude.

The fact that it's a weekly occurrence is an organizational issue, not a software one.

And please — please don't use a chatbot to resolve that one either.

2 more replies

stpedgwdgfhgdd11mo ago

“The goal of software engineering is not to write code faster”

Writing proper code including tests and refactorings takes substantial time.

It is definitely worth it to do this faster, if only to get faster feedback to go back to the first phase; requirements and analysis.

I have experienced this myself, using CC it took me a few hours less to realise i was on the wrong track.

1 more reply

mtkd11mo ago

Unnecessarily critical take on a quality write-up

Much of the criticism of AI on HN feels driven by devs who have not fully ingested what is going with MCP, tools etc. right now as not looked deeper than making API calls to an LLM

troupo11mo ago

This is the crypto discussion again.

"All our critics are clueless morons who haven't realised the one true meaning of things".

Have you once considered that critics have tried these tools in all these combinations and found them lacking in more ways than one?

1 more reply

danielbln11mo ago

OP's comment also seems to be firmly stuck in 2023 when you'd prompt ChatGPT or whatever. The fact that LLMs today, when strapped into an agentic harness, can do or help with all of these things (ideation, architecture, use linters, validate code, evaluate outputs, and a million other things) seems to elude them.

1 more reply

risyachka11mo ago

>> is going with MCP, tools etc.

all these are just tools. there is nothing more to it. there is no etc.

kasey_junk11mo ago

I use llms for each of those steps and modeling agent workflows following them has been very successful for me.

I think I’ve become disgruntled with the anti-llm crowd because every objection seems to boil down to “you are doing software engineering wrong” or “you have just described a workflow that is worse than the default”.

Stop for a minute and start from a different premise. There are people out there who know how to deliver software well, have been doing it for decades and find this tooling immensely productivity enhancing. Presume they know as much as you about the industry and have been just as successful doing it.

This person took the time to very specifically outline their workflow and steps in a clear and repeatable way. Rather than trying it and giving feedback in the same specific way you just said they have no idea what they are doing.

Try imagining that they do and it’s you who are not getting the message and see if you get your a different place.

skydhash11mo ago

Criticism is not refutation. It's identifying flaws (subjectively or objectively) . I'm all for it if you can show me that those flaws don't exist or are inconsequential.

Workflows are personal and the only one who can judge them are the one who is paying for the work. At most, we can compare them in order to improve our own personal workflow.

My feedback is maybe not clear enough. But here are the main points:

- Too complicated in regards to the example provided, with the actual benefits for the complication not explained.

- Not a great methodology because the answer to the queries are tainted by the query. Like testing for alcohol by putting the liquid in a bottle of vodka. When I search for something that is not there, I expect "no results" or an error message. Not a mirage.

- The process of getting information, making decisions, and then acting is corrupted by putting it only at some irrelevant moments: Before even knowing anything; When presented with a restricted list of options with no understanding of the factors that play in the restriction; and after the work is done.

hedgehog11mo ago

I've used Copilot a bit and found it helpful for both coding and maintenance. My setup is pretty basic, and I only use it in places where the task is tedious and I am confident reviewing the diff or other output is sufficient. Things like:

"Refactor: We are replacing FlogSnarble with FloozBazzle. Review the example usage below and replace all usage across the codebase. <put an example>"

"In the browser console I see the error below. The table headers are also squished to the left while the row contents are squished to the right. Propose a fix. <pasted log and stack trace>."

"Restructure to early exit style and return an optional rather than use exceptions."

"Consolidate sliceCheese and all similar cheese-related utility functions into one file. Include doc comments noting the original location for each function."

By construction the resulting changes pass tests, come with an explainer outlining what was changed and why, and are open in tabs in VS Code for review. Meanwhile I can spend the time reading docs, dealing with house keeping tasks, and improving the design of what I'm doing. Better output, less RSI.

skydhash11mo ago

The reason I tend not to use LLMs for these taks is that they are great for thinking moments. They're so mechanical that you tend to reflect instead. Also I use Vim and Emacs which are great for that type of works (fast navigation and good editing tools) and it's not as tedious as doing in editors like VS Code and Sublime (which are not great at editing). You can even concoct something with tmux, ripgrep/fzf, and nano that is better than VS Code at this.

mvanbaak11mo ago

I fear for the coming 2 to 3 generation of software engineers. Will they be able to handle problems if the AI is not available or is the source of the problem? Only time will tell.

mtkd11mo ago

Same was said about dejanews, stackoverflow etc. and intellisense

1 more reply

xena11mo ago

But skydhash, if you don't nuke the anthill how can you be sure the ants are dead? Nuke it from orbit, it's the only way to be sure!

luckystarr11mo ago· 2 in thread

Playwright MCP is intriguing. I'll definitely give it a run today. Anybody got any tipps or gotchas?

Kostarrr11mo ago

If they didn't change it, Playwright uses the aria (accessibility) representation for their MCP agent. It strongly depends on the web page whether or not that yields good results.

We at Octomind use a mix of augmented screenshots and page representation to guide the agent. If Playwright MCP doesnt work on you page, give our MCP a try. We have a free tier.

never_inline11mo ago

Can someone elucidate how using a full blown browser is improvisation over using say markitdown / pandoc / whatever? Given that most useful coding docs sites are static (made with sphinx or mkdocs or whatever)

WhitneyLand11mo ago· 1 in thread

What’s with Copilot “agent mode” anyway, how does it compare to using Claude Code or Gemini CLI?

zihotki11mo ago

I haven't used the latter two. The Claude Code experience is very well translatable to Copilot and they are similar in features. There were not many examples of Github Copilot advanced usage like described in the article when Agent mode was just released (around 2 months ago?). But I was able to fully utilize it using Claude Code prompts and examples.

Copilot has a very direct advantage in my eyes - it has a plenty of models available: Sonnet 3.7, 4, GPT, Gemini, etc. That's something you won't get with the latter two.

jonstewart11mo ago

Just yesterday I was reading a critique of MCP that specifically mentioned the GitHub MCP server as being harder to use (from model perspective) and requiring more tokens than having the agent execute git commands directly. I am surprised to see it listed here and also surprised to see two different web search servers and the time one. I would appreciate more detail from the author about the utility of each MCP server—overloading an agent with servers seems like it could be counterproductive.

jpalomaki11mo ago

Not sure how things are with Copilot, but with Claude Code a good alternative for MCP is in some cases old fashioned command line tools.

GitHub has gh, there's open source jira-cli, Cloudflare has wrangler and so on. No configuration needed, just mention on the agent doc that this kind of tool is available. Likely it will figure out the rest.

And if you have more complicated needs, then you can combine the commands, add some jq magic, put to package.json and tell agent to use npm run to execute it. Can be faster than doing it via multiple MCP calls.

skatanski11mo ago

Really cool article. Personally I think the really cool bit about MCP is that you can very easily write your own server which can talk to the db or call various APIs. That server can run locally and be used by GitHub Copilot for answering questions and executing tasks. I also find it useful in a tight corporate environment where it’s more difficult to get a dedicated LLM API key. You can easily do POCa with what every dev has access to.

mohsen111mo ago

I've had success using BrowserMCP

https://browsermcp.io

It really feels magical when the AI agent can browse and click around to understand the problem at hand

Also, sometimes an interactive command can stop agents from doing things. I wrote a small wrapper to always return so agents never stop from working

https://github.com/mohsen1/agentshell

greatgib11mo ago

All of that to generate crappy code...

j / k navigate · click thread line to collapse