1. We built the fastest API for GLM-5.2 (280 TPS) (baseten.co) | 6 | philipkiely | 5d ago | 0
4. How We Built the Fastest Kimi K2.5 on Artificial Analysis (baseten.co) | 3 | philipkiely | 4mo ago | 0
5. Nvidia Invests $150M in AI Inference Startup Baseten (wsj.com) | 1 | philipkiely | 5mo ago | 1
7. Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs (baseten.co) | 247 | philipkiely | 10mo ago | 175
8. How to build function calling and JSON mode for open-source and fine-tuned LLMs (baseten.co) | 1 | philipkiely | 1y ago | 0
9. How to double tokens per second for Llama 3 with Medusa (baseten.co) | 2 | philipkiely | 1y ago | 0
10. FP8: Efficient model inference with 8-bit floating point numbers (baseten.co) | 2 | philipkiely | 2y ago | 0
11. Three techniques to adapt LLMs for any use case (baseten.co) | 1 | philipkiely | 3y ago | 0
12. Serving four million Riffusion requests in two days (baseten.co) | 5 | philipkiely | 3y ago | 0
13. Show HN: Free Stable Diffusion 2.0 hosted interface (app.baseten.co) | 25 | philipkiely | 3y ago | 2
14. Try it yourself: Speech to text with Whisper (app.baseten.co) | 5 | philipkiely | 3y ago | 0
15. Deploying Stable Diffusion in Production Using Truss (baseten.co) | 3 | philipkiely | 3y ago | 0