1. Cloning Bench: Evaluating AI Agents on Visual Website Cloning (github.com) | 2 points by shahules 2mo ago | 1 comment
2. PA bench: Evaluating web agents on real world personal assistant workflows (vibrantlabs.com) | 38 points by shahules 4mo ago | 9 comments
3. PA Bench: Evaluating Frontier Models on Multi-Tab PA Tasks (vibrantlabs.com) | 7 points by shahules 4mo ago | 1 comment
4. Show HN: Ragas – Open-source library for evaluating RAG pipelines (github.com) | 121 points by shahules 2y ago | 26 comments
5. Show HN: Ragas – Open-source library for evals and testing RAG systems (github.com) | 15 points by shahules 2y ago | 9 comments
6. Show HN: The rise of open source large language models (explodinggradients.com) | 5 points by shahules 3y ago | 0 comments
7. Show HN: GPT4 vs. GPT3: What you should know (explodinggradients.com) | 2 points by shahules 3y ago | 0 comments
8. Show HN: Open-source alternative to Adobe speech enhancer (github.com) | 3 points by shahules 3y ago | 0 comments