Have you heard of any attempts to bake MCP tool definitions into LoRA adapters? I've been wondering whether that's a viable approach: instead of putting them all in context, toggling a tool on or off would just be a matter of applying or removing the adapter weights. That seems more robust than putting "enable FooMCP", "disable FooMCP", etc. in the context, which I'd expect to trip up the LLM eventually. It would also avoid the full KV-cache rebuild you'd need if you actually removed FooMCP from the context prefix.
Depending on the use case, you could:

- insert the LoRA weights as their own layers at runtime (nothing to merge up front, but extra matmuls to compute per token),
- merge them into the existing layers (an initial delay to do the merge, but no runtime penalty afterward), or
- keep pre-merged models around for common cases (no perf penalty at all, but more storage to reserve).
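For concreteness, here's a minimal sketch of those three options using Hugging Face PEFT. The adapter path `foo-mcp-lora` and the base model are placeholders; it assumes you've already somehow trained an adapter that encodes FooMCP's tool definitions.

```python
# Sketch: three ways to deploy a hypothetical "foo-mcp-lora" adapter.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Option 1: keep the adapter as separate layers at runtime.
# Nothing to merge up front, but every token pays for the extra LoRA matmuls.
model = PeftModel.from_pretrained(base, "foo-mcp-lora", adapter_name="foo_mcp")

# Toggling "off" just skips the adapter; the context never changes:
with model.disable_adapter():
    pass  # generate here with FooMCP effectively disabled

# Option 2: merge the adapter into the base weights in place.
# One-time merge cost, then no per-token overhead; unmerge to toggle off.
model.merge_adapter()
# ... generate ...
model.unmerge_adapter()

# Option 3: bake a standalone checkpoint for common adapter combinations.
# No runtime penalty at all, at the cost of storing a full extra model.
merged = model.merge_and_unload()
merged.save_pretrained("llama-foo-mcp-merged")
```

One wrinkle for the KV-cache argument: the cache stores each layer's key/value projections, so if the adapter targets `k_proj`/`v_proj`, entries computed with the adapter off are stale once it's on. Restricting the LoRA `target_modules` to, say, `q_proj` and the MLP keeps the cache reusable across toggles.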