
0 points · mityamitya · 2y ago · 0 comments

Hi! We ran LSH filtering over the datasets to remove any code that might be similar to HumanEval samples.
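For readers unfamiliar with the technique, here is a minimal sketch of what MinHash-based LSH filtering against a benchmark can look like. It is not the procedure actually used for the model: the function names, the token 5-gram shingling, and the 64-hash / 16-band parameters are all assumptions picked for illustration.

```python
import hashlib
import re

NUM_PERM = 64  # assumed MinHash signature length
BANDS = 16     # assumed number of LSH bands (4 rows per band)
SHINGLE = 5    # assumed shingle size, in tokens


def shingles(code, n=SHINGLE):
    """Split code into a set of overlapping token n-grams."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < n:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def minhash(code):
    """MinHash signature: the minimum of each salted hash over all shingles."""
    items = [s.encode("utf-8") for s in shingles(code)]
    signature = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item, digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def band_keys(signature):
    """Cut a signature into bands; two documents that share any band collide."""
    rows = NUM_PERM // BANDS
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(BANDS)]


def filter_similar(corpus, benchmark):
    """Keep only corpus documents that share no LSH band with a benchmark sample."""
    buckets = set()
    for sample in benchmark:
        buckets.update(band_keys(minhash(sample)))
    return [
        doc for doc in corpus
        if not any(key in buckets for key in band_keys(minhash(doc)))
    ]
```

With these parameters a document is dropped when any band of 4 hashes matches a benchmark sample, which becomes likely once the shingle sets overlap heavily. More bands with fewer rows would filter more aggressively, at the cost of discarding more unrelated code.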


2 comments · 1 top-level

riku_iki · 2y ago · 1 in thread

So, we have to trust your procedure...

You can check whether the model reproduces the canonical solutions from HumanEval. I understand it's not ideal, but at least you can check it yourself.
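A rough sketch of that check, in case it helps. It assumes each problem is a dict with the `task_id`, `prompt`, and `canonical_solution` fields from the HumanEval dataset; `generate` is a hypothetical callable wrapping whatever model you are testing, and the 0.9 similarity threshold is an arbitrary choice. Note that a high score does not prove contamination, since short solutions can easily be written the same way independently.

```python
import difflib


def normalize(code):
    """Strip indentation and blank lines so formatting differences don't matter."""
    lines = (line.strip() for line in code.strip().splitlines())
    return "\n".join(line for line in lines if line)


def memorization_report(problems, generate, threshold=0.9):
    """Return (task_id, similarity) for completions close to the canonical solution.

    `generate` is a hypothetical function: prompt string in, completion string out.
    """
    flagged = []
    for problem in problems:
        completion = generate(problem["prompt"])
        ratio = difflib.SequenceMatcher(
            None,
            normalize(completion),
            normalize(problem["canonical_solution"]),
        ).ratio()
        if ratio >= threshold:
            flagged.append((problem["task_id"], ratio))
    return flagged
```

If most tasks get flagged, that is a hint the benchmark leaked into training data; if only a few trivial ones do, it says little either way.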

There are a bunch of other benchmarks too; check out the page https://huggingface.co/smallcloudai/Refact-1_6B-fim

Also, feel free to run any new benchmarks.
