philipkiely on Hacker News

1. We built the fastest API for GLM-5.2 (280 TPS) (baseten.co)
   6 points by philipkiely, 5d ago, 0 comments

2. The Math Behind TurboQuant (baseten.co)
   8 points by philipkiely, 3mo ago, 3 comments

3. Show HN: Inference Engineering (baseten.com)
   2 points by philipkiely, 4mo ago, 0 comments

4. How We Built the Fastest Kimi K2.5 on Artificial Analysis (baseten.co)
   3 points by philipkiely, 4mo ago, 0 comments

5. Nvidia Invests $150M in AI Inference Startup Baseten (wsj.com)
   1 point by philipkiely, 5mo ago, 1 comment

6. Baseten raises $150M Series D at $2.15B (fortune.com)
   2 points by philipkiely, 9mo ago, 1 comment

7. Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs (baseten.co)
   247 points by philipkiely, 10mo ago, 175 comments

8. How to build function calling and JSON mode for open-source and fine-tuned LLMs (baseten.co)
   1 point by philipkiely, 1y ago, 0 comments

9. How to double tokens per second for Llama 3 with Medusa (baseten.co)
   2 points by philipkiely, 1y ago, 0 comments

10. FP8: Efficient model inference with 8-bit floating point numbers (baseten.co)
    2 points by philipkiely, 2y ago, 0 comments

11. Three techniques to adapt LLMs for any use case (baseten.co)
    1 point by philipkiely, 3y ago, 0 comments

12. Serving four million Riffusion requests in two days (baseten.co)
    5 points by philipkiely, 3y ago, 0 comments

13. Show HN: Free Stable Diffusion 2.0 hosted interface (app.baseten.co)
    25 points by philipkiely, 3y ago, 2 comments

14. Try it yourself: Speech to text with Whisper (app.baseten.co)
    5 points by philipkiely, 3y ago, 0 comments

15. Deploying Stable Diffusion in Production Using Truss (baseten.co)
    3 points by philipkiely, 3y ago, 0 comments
