typpo on Hacker News

1

OpenAI frontier models and Codex are now available on AWS

(openai.com)

370 points | typpo | 24d ago | 131 comments

2

How to replicate the Claude Code attack with Promptfoo

(promptfoo.dev)

6 points | typpo | 7mo ago | 0 comments

3

Questions censored by DeepSeek

(promptfoo.dev)

384 points | typpo | 1y ago | 227 comments

4

Llama 3.2

(huggingface.co)

21 points | typpo | 1y ago | 0 comments

5

Automated jailbreaking techniques with DALL-E

(promptfoo.dev)

2 points | typpo | 1y ago | 0 comments

6

Show HN: Automated red teaming for your LLM app

(promptfoo.dev)

23 points | typpo | 2y ago | 2 comments

7

Benchmark Command R vs. GPT/Claude on your own data

(promptfoo.dev)

2 points | typpo | 2y ago | 0 comments

8

DBRX vs. Mixtral vs. GPT: create your own benchmark

(promptfoo.dev)

1 point | typpo | 2y ago | 0 comments

9

How to benchmark Gemini vs. GPT with your own data

(promptfoo.dev)

1 point | typpo | 2y ago | 0 comments

10

A collection of LLM evaluation tools

(ianww.com)

2 points | typpo | 2y ago | 1 comment

11

How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs

(promptfoo.dev)

16 points | typpo | 2y ago | 0 comments

12

Benchmark Llama 2 vs. GPT on your own data

(promptfoo.dev)

1 point | typpo | 2y ago | 0 comments

13

Show HN: CLI for testing and evaluating LLM prompts and outputs

(github.com)

2 points | typpo | 2y ago | 0 comments

14

An open-source framework for prompt engineering

(ianww.com)

3 points | typpo | 3y ago | 0 comments

15

Show HN: Promptfoo – CLI for testing & improving LLM prompt quality

(github.com)

14 points | typpo | 3y ago | 5 comments
