The real opportunity with Agent Skills isn't just packaging prompts. It's providing a mechanism that enables a clean split: LLM as the control plane (planning, choosing tools, handling ambiguous steps) and code or sub-agents as the data/execution plane (fetching, parsing, transforming, simulating, or executing NL steps in a separate context).
This requires well-defined input/output contracts and a composition model. I opened a discussion on whether Agent Skills should support this kind of composability:
Also, with writing, working strictly from top to bottom has its disadvantages. It makes sense to emulate the human writing process and work in passes, fleshing sections out and, conversely, summarizing them.
Current LLMs can brute-force these things through emulation/observation/mimicry, but they aren't as good as doing it the right way. Not only would I like to see "skills" but also "processes," where you define a well-specified order in which tasks are accomplished in sequence. Repeatable templates. These would essentially include variables in the templates, set for replacement.
You can do this with Gemini commands and extensions.
https://cloud.google.com/blog/topics/developers-practitioner...
Of course this requires substantial buy-in from application owners (create the vocabulary) and from users (agree to expose and share the sentences they generate), but the results would be worth it.
Additionally, I can't even get Claude or Codex to reliably use the prompt and simple rules ("use this command to compile") in an AGENTS.md or whatever markdown file is required. Why would I assume they will reliably handle skills prompts spread about a codebase?
I've even seen tool usage deteriorate while it's thinking and self-commanding through its output to, say, read code from a file. Sometimes it uses tail, while other times it gets confused by the output and then writes a basic Python program to parse lines and strings from the same file, effectively producing the same output as before. How bizarre!
If AI were deterministic, what difference would a different AI model make?
IIUC their most recent arc focuses on prompt optimization [0], where you can optimize, using DSPy and an optimization algorithm called GEPA [1], with relative weights on different things like errors, token usage, and complexity.
[0] https://docs.boundaryml.com/guide/baml-advanced/prompt-optim... [1] https://github.com/gepa-ai/gepa?tab=readme-ov-file
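To make the "relative weights" idea concrete, here's a rough sketch of the kind of weighted objective such an optimizer might maximize. This is illustrative Python only, not the actual BAML/DSPy/GEPA API; all names are made up.

```python
# Hypothetical sketch of a weighted objective for prompt optimization:
# penalize failures, token cost, and prompt complexity with relative weights.

def score_candidate(errors: int, tokens_used: int, complexity: float,
                    w_err: float = 1.0, w_tok: float = 0.01, w_cpx: float = 0.1) -> float:
    """Higher is better."""
    return -(w_err * errors + w_tok * tokens_used + w_cpx * complexity)

# An optimizer (e.g. an evolutionary loop like GEPA) would propose prompt
# variants, evaluate each against a dev set, and keep the highest-scoring one.
candidates = {"variant_a": (2, 1800, 3.0), "variant_b": (1, 2400, 4.5)}
best = max(candidates, key=lambda name: score_candidate(*candidates[name]))
print(best)
```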
> Parsing a known HTML structure
In most cases, HTML structures that are being parsed aren't known. If they're known, you control them, and you don't need to parse them in the first place. If they're someone else's, who knows when they'll change, or under what condition they're different.
But really, I don't see the stuff you're talking about happening in prod for non-one-off use cases. I see LLMs used in prod exactly for data where you don't know ahead of time what its shape will be, and there's an enormous number of such cases. If the same logic is needed every time, of course you don't have an LLM execute that logic; you have the LLM write a deterministic script.
Skills essentially boil down to distributed parts of a main prompt. If you consider a state model, you can see the pattern: the task is the state, and combining the task's specific skills defines the current prompt augmentation. When the task changes, another prompt emerges.
In the end, it is the clear guidance of the Agent that is the deciding factor.
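As a toy illustration of that state model (all names here are made up, not any particular framework's API):

```python
# The active task (state) selects which skill snippets augment the base prompt.

SKILLS = {
    "pdf-extraction": "Use pdftotext first; fall back to OCR for scanned pages.",
    "code-review":    "Check error handling, naming, and test coverage.",
}

TASK_TO_SKILLS = {
    "summarize-report": ["pdf-extraction"],
    "review-pr":        ["code-review"],
}

def build_prompt(base_prompt: str, task: str) -> str:
    """Combine the base prompt with the skills relevant to the current task."""
    snippets = [SKILLS[name] for name in TASK_TO_SKILLS.get(task, [])]
    return "\n\n".join([base_prompt] + snippets)

print(build_prompt("You are a helpful agent.", "review-pr"))
```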
Transforming an arbitrary table is still hard, especially a table on a webpage or in a document. Sometimes I even struggle to find the right library. The effort doesn't seem worth it for a one-off transformation either. An LLM can be a great tool for such tasks.
MCP does three things conceptually: it lets you build a bridge between an agent and <something else>, it specifies a UI+API layer between the bridge and the LLM, and it formalizes the description of that bridge in a tool-calling format.
It's that UI+API layer that's the biggest pain in the ass, in my opinion. Sometimes you need it; for instance, if you wanted an agent to access your emails, a high quality MCP server that can't destroy your life through enthusiastic tool calling makes sense.
If, however, you have, say a CLI tool or simple API that's reasonably self documenting and you're willing to have it run, and/or if you need specific behavior with a different context setting, then a skill can just be a markdown file that explains what, how, why.
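For illustration, a minimal SKILL.md along those lines might look like this. The frontmatter fields mirror what I understand the published format to expect; the changelog tool itself is hypothetical.

```
---
name: changelog-tool
description: Generate a release changelog from git history. Use when the user asks for release notes or a changelog.
---

# Changelog tool

## What
Summarizes merged commits between two tags into a Markdown changelog.

## How
Run `scripts/changelog.sh <from-tag> <to-tag>` and group the output by
commit prefix (feat, fix, chore).

## Why
Keeps release notes consistent without hand-writing them each release.
```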
All the public MCP servers I've seen have been a disaster, with too many tools and tokens polluting the context. MCP is really most useful when you need tight integration with some other environment and can write a little custom wrapper to provide it.
I will say, when using MCP, be selective about which tools you enable. A lot of the time they come with, say, 30 tools and you only personally care about 5 of them. The other 25 are just rotting your context.
The durable pattern here isn't a specific file format. It's on-demand capability discovery: a small index with concise metadata so the model can find what's available, then pull details only when needed. That's a real improvement over tool calling and MCP's "preload all tools up front" approach, and it mirrors how humans work. Even as models bake more know-how into their weights, novel capabilities will always be created faster than retraining cycles. And even if context becomes unlimited, preloading everything up front remains wasteful when most of it is irrelevant to the task at hand.
So even if "Skills" gets replaced, discoverability and progressive disclosure likely survive.
The problem isn’t having a standard way for agents to branch out. The problem is that AI is the new Javascript web framework: there’s nothing wrong with frameworks, but when everyone and their son are writing a new framework and half those frameworks barely work, you end up with a buggy, fragmented ecosystem.
I get why this happens. Startups want VC money, established companies then want to appear relevant, and software engineers and students feel pressured to prove they're hireable. You end up with one giant pissing contest where half the players likely see the ridiculousness of the situation but have little choice other than to join the party.
We'll see how many of these are around in a few years.
The agent loop architectural pattern (and that’s the relevant bit) is going to continue to matter. There will be new patterns for sure, but tool calling plus while loop (which is all an “agent” is) is powerful and highly general.
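That loop is small enough to sketch in a few lines. `call_model` and `TOOLS` below are placeholders, not a real SDK:

```python
# Tool calling plus a while loop: the whole "agent" pattern in one sketch.

def run_agent(user_goal: str, call_model, TOOLS: dict, max_steps: int = 20):
    messages = [{"role": "user", "content": user_goal}]
    for _ in range(max_steps):
        reply = call_model(messages)          # returns text and/or requested tool calls
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply.get("tool_calls"):       # no more tools requested: we're done
            return reply["content"]
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return "step limit reached"
```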
Right now models have roughly all of the written knowledge available to mankind, minus some obscure held-out private archives and so on. They have excellent skills and a general ability to construct plausible sequences of actions to accomplish work, but we need to hold their hands to really get decent performance across a wide range of activities. Skills, agent frameworks, and MCP carve out different domains of that problem. Successful solutions provide training data for future models, which might then be generalized, or used to create a vast mountain of synthetic data following successful patterns, making the next generation of models incredibly useful for a huge number of tasks by default.
It might also be possible that by studying the problem and identifying where mode collapse and training issues prevent the right sort of generalization, they could tweak the architecture and solve the deficiency through normal training runs, thereby discarding the need for all the bespoke, artisanal agent specifications.
So basically a reusable prompt, like the previous commenter asked for?
This may all be very wrong, though, as it's mostly conjecture from the little I've worked with skills.
This lets you trigger a skill with '/foo' in a way that resembles the way you'd use the command line.
Claude Code is very good at using well-defined skills without a command, though, but in scenarios where there is some nuance between similar skills, they are useful.
BUT what makes them powerful is that you can include code with the skill package.
Like I have a skill that uses a Go program to traverse the AST of a Go project to find different issues in it.
You COULD just prompt it but then the LLM would have to dig around using find and grep. Now it runs a single executable which outputs an LLM optimised clump of text for processing.
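The commenter's tool is in Go; as a rough analogue of the same idea, here's a sketch in Python using the standard ast module that emits a compact, LLM-friendly report instead of making the model grep around. The specific check is just an example.

```python
# Walk a project's AST and emit a terse report the LLM can consume directly.

import ast
from pathlib import Path

def report(project_root: str) -> str:
    findings = []
    for path in Path(project_root).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            # example check: public functions without a docstring
            if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
                if ast.get_docstring(node) is None:
                    findings.append(f"{path}:{node.lineno} {node.name}: missing docstring")
    return "\n".join(findings) or "no issues found"

if __name__ == "__main__":
    print(report("."))
```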
Inversely, you can persist/summarize a larger bit of context into a skill, so a new agent session can easily pull it in.
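A minimal sketch of that persist-to-skill idea; the directory layout and frontmatter fields here are illustrative, not guaranteed to match any particular client's conventions.

```python
# Dump a session summary into a skill folder so a fresh agent session can
# discover it from the index and pull it in on demand.

from pathlib import Path

def persist_as_skill(name: str, description: str, summary: str,
                     root: str = ".claude/skills") -> None:
    skill_dir = Path(root) / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    (skill_dir / "SKILL.md").write_text(
        f"---\nname: {name}\ndescription: {description}\n---\n\n{summary}\n"
    )

persist_as_skill(
    "billing-module-notes",
    "Context about the billing module refactor. Use when touching billing code.",
    "Key decisions: invoices are immutable; totals are recomputed from line items.",
)
```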
So yes, it's just turtles, sorry, prompts all the way down.
https://github.com/alganet/skills/blob/main/skills/left-padd...
Either way, that’s hilarious. Well done.
<conspiracy_mode> maybe all of them were designed to occupy the full context window of earlier GPT models </conspiracy_mode>
Apart from Google Inc., I have not seen a single "AI company" propose an RFC that was reviewed by the IETF and became a proper internet standard. [0]
"MCP" was one of the worst so-called "standards" ever built since the JWT was proposed. So I do not take Anthropic seriously when they create so-called "open standards" especially when the reference implementation is in Javascript or TypeScript.
> I have not seen a single "AI company" propose an RFC that was reviewed by the IETF and became a proper internet standard.
Why would the IETF have anything to do with LLM/agent standards? This seems like a category error. They also don’t ratify web standards, for example.
Like the Deno vs npm package ecosystems that didn't work together for many years.
There are multiple intermixed and inconsistent concepts out in the wild: AGENTS.md vs CLAUDE.md vs .github/instructions; skills vs commands; and so on.
When I work on a project, do all the files align? If I work in an org, where developers have agent choice, how many of these instructions and skills "distros" do I need to put (pollute?) my repo with?
It is not healthy when you have an obsession this bad, seriously. Seek help.
Although Skills are just md files, it's good to see them "donate" it.
Their goal seems to be simple: focus on coding and improving it. They've found a great niche there, and hopefully a revenue-generating business.
OpenAI, on the other hand, doesn't give me the same vibes; they don't seem very focused. They're playing catch-up with both Google's models and Anthropic.
Apple has shortcuts, but they haven’t propped it up like a standard that other people can use.
By contrast, this is something you can use even if you have nothing to do with Claude, and the tools you create will be compatible with the wider ecosystem.
Many, many MCPs could and should just be a skill instead.
Paper & applications published here: https://earthpilot.ai/metaskills/
```
web-artifacts-builder

Suite of tools for creating elaborate, multi-component claude.ai HTML artifacts using modern frontend web technologies (React, Tailwind CSS, shadcn/ui). Use for complex artifacts requiring state management, routing, or shadcn/ui components - not for simple single-file HTML/JSX artifacts.
```
Say I want to build a landing page with some relatively static content. I don't know it yet, but it's just going to be Bootstrap CSS, no SPA/React(ish); it'll be fine as a templated server-side thing. But I don't know how to express this in words. Could the skill _evolve_ based on what my preferences are and what is possible for a relative novice to grok and construct?
This is a simple example, but it could extend to say using sqlite+litestream instead of postgres or using Gradient boosted trees instead of an expensive transformer based classifier.
---
persona: hacker
description: logical, talks about computers a lot, enjoys coffee, somewhat snarky and arrogant
---
<more details here>

1. For an experienced Claude Code user, you can already build such an agent persona quite trivially by using the /agents settings.
2. It doesn't actually replace agents. Most people I know use pre-defined agents for some tasks, but they still want the ability to create ad-hoc agents for specific needs. Your standard, by requiring them to write markdown files, does not solve this ad-hoc issue.
3. It does not seem very "viral" or income-generating. I know this is premature at this point, but without charging users for the standard, is it reasonable to expect to make money off of this?
"you're absolutely right!"
Please tell us how you REALLY feel about JavaScript.
And of course Claude Code has custom slash commands which are also very similar.
Getting a lot of whiplash from all these specifications that are hastily put together and then quickly forgotten.
Other than that it appears MCP prompts end up as slash commands provided by an MCP Server (instead of client side command definitions).
But the actual knowledge that is encoded in skills/commands/mcp prompts is very similar.
But skills don't really solve the problem. Turning that workaround into a standard feels strange. Standardizing a patch isn't something I'd expect from Anthropic; it's unclear what their endgame is here.
The value of standardizing skills is that the skills you define work with any agentic tool. It doesn't matter how simple they are; if they don't work easily, they have no use.
You need a practical and efficient way to give the LLM your context. Just as every organization has its own standards, best practices, and architectures that should be documented, because new developers don't know these upfront, LLMs also need your context.
An LLM is not an all-knowing brain; it's a plan-do-check-act text-processing machine.
Marketing. That defines pretty much everything Anthropic does beyond frontier model training. They're the same people producing sensationalized research headlines about LLMs trying to blackmail folks in order to prevent being deleted.
This is not the first time; perhaps an expectation adjustment is in order. This is also the same company that has an exec telling people on his Discord (15 minutes of fame recently) that Claude has emotions.
I think that they often do solve the problem, just maybe they have some other side effects/trade offs.
The best one we have thought of so far.
It has been published as an open specification.
Whether it is a standard isn't for them to declare.
Could one make a copyleft type license such that the generated code must be licensed free and open and under the same license? How enforceable are licenses on these skills anyway, if one can take in the whole skill with an agent and generate a legally distinct but functionally close variant?
It does code execution in an Apple container if your Skill requires any code execution.
It also proves the point that Skills are basically repackaged MCPs (if you look into my code).
For example, you can't have a directory named "Stripe-Skills" that will give you a breakdown of last week's revenue (unless you write into the skill how to connect to Stripe and get that information). So most remote, existing services are better used as MCPs (essentially APIs).
These two solutions look, feel, and smell like the same thing. Are they the same thing?
Any OpenCode users out there have any hot or nuanced takes?
It is functionally a skill. I suppose once Antigravity supports skills, I will make it one officially.
I'm authoring equivalent in CUE, and assimilating "standard" provider ones into CUE on the fly so my agent can work with all the shenanigans out there.
npx ai-agent-skills install frontend-design
20 of the most-starred Claude skills ever, now open across Claude Code, Cursor, Amp, VS Code: anywhere that supports the spec. Would love some feedback on it.
github.com/skillcreatorai/Ai-Agent-Skills
There's no real benefit to the MCP protocol over a regular API with a published "client" that a local LLM can invoke. The only downside is you'd have to pull this client first.
I am using local "skill" as reference to an executable function, not specifically Claude Skills.
If the LLM/Agent executes tools via code in a sandbox (which is what things are moving towards), all LLM tools can be simply defined as regular functions that have the flexibility to do anything.
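A hedged sketch of that idea, with two toy functions standing in for tools and no real sandboxing (a production setup would isolate the execution environment):

```python
# Tools are plain functions; the model writes ordinary code against them
# instead of emitting a JSON tool-call schema, and the host executes it.

def add(a: float, b: float) -> float:
    return a + b

def word_count(text: str) -> int:
    return len(text.split())

SANDBOX_GLOBALS = {"add": add, "word_count": word_count}

# Stand-in for code the model generated; composes two "tools" in one step.
model_generated_code = "result = add(word_count('hello agent world'), 1)"

scope: dict = {}
exec(model_generated_code, SANDBOX_GLOBALS, scope)  # real systems sandbox this
print(scope["result"])
```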
I seriously doubt MCP will exist in any form a few years from now
It's a much better system in my experience.