leonardtang on Hacker News

1

EvoForge: Evolutionary Harness Optimization

(twitter.com)

2 points by leonardtang, 2mo ago, 0 comments

2

Chinese Calligraphy Is a Frontier Task

(twitter.com)

1 point by leonardtang, 2mo ago, 0 comments

3

TournO: Tournament Optimization for Non-Verifiable RL

(github.com)

3 points by leonardtang, 3mo ago, 0 comments

4

j1-micro and j1-nano: Tiny (0.6B, 1.7B) and Mighty Reward Models

(github.com)

3 points by leonardtang, 1y ago, 0 comments

5

Verdict: A Library for Scaling Judge-Time Compute

(twitter.com)

3 points by leonardtang, 1y ago, 0 comments

6

Awesome-LLM-Judges

(github.com)

2 points by leonardtang, 1y ago, 0 comments

7

LLM Judges

(github.com)

2 points by leonardtang, 1y ago, 0 comments

8

Cascade: A fast, automated, multi-turn LLM jailbreaking method

(twitter.com)

2 points by leonardtang, 1y ago, 0 comments

9

RBAC RAG

(github.com)

1 point by leonardtang, 1y ago, 0 comments

10

RBAC RAG with MongoDB

(github.com)

2 points by leonardtang, 1y ago, 0 comments

11

Simple and Safe RAG with RBAC

(github.com)

2 points by leonardtang, 1y ago, 0 comments

12

Inducing LLM Hallucinations

(github.com)

2 points by leonardtang, 1y ago, 0 comments

13

Sphynx: Fuzz Testing Hallucination Detection Models

(github.com)

2 points by leonardtang, 1y ago, 0 comments

14

It's a bad day to be a language model

(github.com)

2 points by leonardtang, 2y ago, 1 comment

15

Thorn in a HaizeStack test for evaluating long-context adversarial robustness

(github.com)

19 points by leonardtang, 2y ago, 11 comments

