Better HN

0 points · Our_Benefactors · 9mo ago · 0 comments

The data showed LLMs are better. This put the debate to rest. Now we are post-debate.

9 comments · 3 top-level

lmf4lol · 9mo ago · 4 in thread

Give me one seriously peer-reviewed study with proper controls, please.

I'll wait.

Our_Benefactors (OP) · 9mo ago

Go ahead and move the goalposts now... This took about 2 minutes of research to support the conclusions I know to be true. You can waste time as long as you choose in academia attempting to prove any point, while normal people make real contributions using LLMs.

### An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

We evaluate TESTPILOT using OpenAI’s gpt3.5-turbo LLM on 25 npm packages with a total of 1,684 API functions. The generated tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%. In contrast, the state-of-the-art feedback-directed JavaScript test generation technique, Nessie, achieves only 51.3% statement coverage and 25.6% branch coverage.

- *Link:* [An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation (arXiv)](https://arxiv.org/abs/2302.06527)

---

### Field Experiment – CodeFuse (12-week deployment)

- Productivity (measured by the number of lines of code produced) increased by 55% for the group using the LLM. Approximately one third of this increase was directly attributable to code generated by the LLM.
- *Link:* [CodeFuse: Generative AI for Code Productivity in the Workplace (BIS Working Paper 1208)](https://www.bis.org/publ/work1208.htm)

footy · 9mo ago

> This took about 2 minutes of research to support the conclusions I know to be true

This is a terrible way to do research!

1 more reply

capyba · 9mo ago

“Productivity (measured by the number of lines of code produced) increased”

The LLMs had better have written more code; they’re text generation machines!

In what world does this study prove that the LLM actually accomplished anything useful?

1 more reply

psunavy03 · 9mo ago

If you are seriously linking "productivity" to "lines of code produced," that says all I need to know about your credibility.

1 more reply

JohnFen · 9mo ago · 2 in thread

What data are you talking about? Why do you value it above the data showing the opposite?

snickerbockers · 9mo ago

It's superior data because it supports his expectations. His expectations are right because they are based on superior data. Checkmate Luddites.

Our_Benefactors (OP) · 9mo ago

Meanwhile, you have furnished zero data that supports your claims. Ho hum.

1 more reply

snickerbockers · 9mo ago

"the data"
