First: fully interactive UI. This might sound unnecessary, but synthetic data is a creative and iterative process. It helps to review each step as you go, tweaking prompts. Are the topics right? Are the inputs realistic? Are the outputs reasonable? Once your prompts are dialed in, you can scale up the volume, but there's a creative iterative process to get there.
Second: we have many templates for common synthetic data gen use cases. For fine-tuning you want to focus on the breadth of realistic inputs. For "bug" evals you want to trigger specific error cases based on a description of the issue. For measuring evaluators/LLM judges you need a topic tree mixing passing and failing data. We also provide templates for common use cases: bias, maliciousness, toxicity, jailbreaking, etc. These are good to bootstrap the creative process above, but you can edit each to meet your needs.
It's a free app on GitHub. Docs and videos: https://docs.kiln.tech/docs/synthetic-data-generation
We are actually planning on moving to graphs now, which we are seeing better results with over trees, check it out if you also want to use them in kiln - but you might want to wait until we validate a little more and lift it out of experimental.
I think the key difference between the two since kiln adopted the same approach is the ability to generate reasoning / chain of thought and export to alpaca, chatml, etc - along with direct to unsloth.ai's formatting. I doubt we will have UI as its for running on backend systems and part of an ML pipeline along with being a library / SDK.
I might have taken some of the prompts and modified them. I didn't recognize the new name, do recognize the old one.
Edit:
- just confirmed. No code copied. Prompts were originally from the Pluto library, then modified by the library above, then modified again by me for Kiln.
- And just to clarify, Kiln has had supported for chain of thought, reasoning, and all major export formats (ChatML/Unsloth/OpenAI/Hugging Face). Plus API integrations with Together, Fireworks, OpenAI, Google Vertex.
People should try both. I just want to clear on the origins of the code/prompts, and the feature set.
GSM8K: https://huggingface.co/datasets/lukehinds/deepfabric-GSM8K-c...
also some others
infra failures reasoning / CoT: https://huggingface.co/datasets/lukehinds/deepfabric-devops-...
Medical (multi-turn): https://huggingface.co/datasets/lukehinds/deepfabric-7k-medi...
Programming challenges: https://huggingface.co/datasets/lukehinds/programming-challe...
If there is anything in particular you need, drop me a message or feel free to open an issue and I can create something for you.
you can raise and issue and I will certainly give it a go - or also reach me via the discord link on the main repo. Let's see what we can do.