So cool seeing these come to life.
I forked Anthropic's MCP at the time to use it in the browser, but it was just too much trouble, and I wanted to wait for something like WebMCP to appear before fiddling with it more.
Planning on dusting off the DAW and releasing it very soon.
With that, I'm definitely looking forward to models producing good music analytically, rather than by pattern matching as we see in specialized audio/music-gen models.
In the case of my DAW, I went even more fundamental: I created a node-based visual UI and gave the agent the ability to program new modules using the Web Audio API, plus a selection of stock instruments and effects to choose from. Modules are editable after instantiation, and a UI is automatically generated for each module based on its parameters, inputs, and outputs. The agent could spawn and wire things up, do sound design, that sort of thing.
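To make the "auto-generated UI from parameters" idea concrete, here's a minimal sketch. The names (`makeModule`, `uiControlsFor`) are hypothetical, not my DAW's actual API; the point is just that a module declares its params/inputs/outputs and the host derives controls from that declaration:

```javascript
// A module is a plain descriptor: the host inspects it to build UI.
function makeModule(spec) {
  return {
    name: spec.name,
    inputs: spec.inputs ?? [],
    outputs: spec.outputs ?? [],
    params: spec.params ?? {},
    // In the browser this would construct Web Audio nodes,
    // e.g. (ctx) => new BiquadFilterNode(ctx, { type: "lowpass" }).
    instantiate: spec.instantiate ?? null,
  };
}

// Derive one UI control descriptor per declared parameter.
function uiControlsFor(mod) {
  return Object.entries(mod.params).map(([id, p]) => ({
    moduleName: mod.name,
    id,
    widget: p.min !== undefined && p.max !== undefined ? "slider" : "number",
    min: p.min,
    max: p.max,
    value: p.default,
  }));
}

// Example: a lowpass filter module an agent might generate.
const filter = makeModule({
  name: "Lowpass",
  inputs: ["in"],
  outputs: ["out"],
  params: {
    cutoff: { min: 20, max: 20000, default: 800 },
    resonance: { min: 0, max: 30, default: 1 },
  },
});

const controls = uiControlsFor(filter); // two slider descriptors
```

The nice property is that the agent never touches UI code: it only emits the descriptor and the DSP, and the host renders whatever the params imply.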
I also recently tried Gemini 3.1 Pro out on audio, and you should give it a spin if you haven't yet. It's actually the first model I've seen that can really talk about music in terms of frequency and time with great accuracy. It can break down songs by instrumentation, composition, sound design, arrangement, etc.
Its philosophical take on the music itself isn't always great, but it has precision, and at a high level you can see where things are headed. Some of its advice was definitely valid and actionable. I want to plug it into my DAW or Ableton MCP and see what happens. It might actually be able to do real sound design. What I want is not just to ask for a melody, but to be able to say things like "let's throw a Reese bass in there" or "sidechain everything under the kick" and have the model know what I'm talking about. So not just music theory, etc., but sound design as well.
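For anyone unfamiliar with the sidechain example: the kick's level drives a gain reduction on everything else, so the mix "ducks" on each hit and recovers afterwards. A toy sample-by-sample envelope-follower sketch (purely illustrative; in Web Audio you'd wire a DynamicsCompressorNode or automate a GainNode's AudioParam instead):

```javascript
// Duck `signal` under `kick`: instant attack, exponential release.
function sidechainGain(kick, signal, { amount = 0.9, release = 0.05 } = {}) {
  let env = 0; // envelope following the kick's level
  return signal.map((s, i) => {
    const level = Math.abs(kick[i] ?? 0);
    // Jump up instantly when the kick hits, decay slowly afterwards.
    env = level > env ? level : env * (1 - release);
    return s * (1 - amount * Math.min(env, 1));
  });
}

// Pad at a constant level; kick hits on the first sample, then silence.
const kick = [1, 0, 0, 0, 0];
const pad = [0.5, 0.5, 0.5, 0.5, 0.5];
const ducked = sidechainGain(kick, pad);
// ducked[0] is heavily attenuated; later samples recover as env decays.
```

This is exactly the kind of operation I'd want a model to translate from plain English into node wiring.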
I'd love to chat about this more somewhere and cross-pollinate ideas if you're up for it, email's in my bio.
I find this approach to be more appealing than AI models that generate fully baked songs as waveforms. Give me something I can open in Logic and keep tweaking…