The current implementation uses llama.cpp GBNF grammars. More recent work (Outlines, XGrammar) suggests the sampling step could be sped up substantially by compiling grammars to FSTs and exploiting GPU parallelism.
[0] https://github.com/guidance-ai/llguidance

[1] https://github.com/guidance-ai/llguidance/blob/main/parser/s...

[2] https://github.com/ggerganov/llama.cpp/pull/10224

[3] https://github.com/guidance-ai/llgtrt
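To illustrate the core idea behind the FSM/FST approach: the grammar is compiled into a state machine over the *token* vocabulary, so at each decoding step the sampler only needs a cheap lookup of which tokens are legal from the current state, masking out everything else. The sketch below is a toy in Python with an invented vocabulary and hand-written FSM, not the actual API of Outlines, XGrammar, or llguidance:

```python
# Toy illustration of FSM-based constrained sampling: mask the
# model's logits so that only tokens that are legal transitions
# from the current grammar state can be chosen. Vocabulary and
# FSM here are invented for the example.

VOCAB = ["{", "}", '"a"', ":", "1", ","]

# Toy FSM accepting exactly {"a":1} -- state -> {token: next_state}.
FSM = {
    0: {"{": 1},
    1: {'"a"': 2},
    2: {":": 3},
    3: {"1": 4},
    4: {"}": 5},  # state 5 = accepting
}

def allowed_mask(state):
    """Boolean mask over the vocabulary: which tokens are legal next."""
    legal = FSM.get(state, {})
    return [tok in legal for tok in VOCAB]

def constrained_greedy(logits_per_step):
    """Greedy decode, but only over tokens the FSM allows."""
    state, out = 0, []
    for logits in logits_per_step:
        mask = allowed_mask(state)
        # Pick the highest-logit *legal* token.
        best = max(
            (i for i, ok in enumerate(mask) if ok),
            key=lambda i: logits[i],
        )
        tok = VOCAB[best]
        out.append(tok)
        state = FSM[state][tok]
    return "".join(out)

# Five steps of arbitrary "model" logits; the mask forces valid
# output even when the model would prefer an illegal token.
fake_logits = [[0.1, 2.0, 0.3, 0.0, 0.5, 0.9]] * 5
print(constrained_greedy(fake_logits))  # -> {"a":1}
```

The win over walking a CFG parser per token (as GBNF does) is that the per-state masks can be precomputed once at grammar-compile time, turning the per-step cost into a table lookup that also batches well on GPU.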
I really want to see support for additional grammar engines merged into llama.cpp, and I'm a big fan of the work you did on this.