0 points · razorguymania · 2y ago · 0 comments

It's using vanilla Llama 2 from Meta with no fine-tuning. The point here is the speed and responsiveness of the underlying hardware and software.

chihuahua 2y ago

But if the quality of the response is poor, it's irrelevant that it was generated quickly. If it was using different data to generate higher quality responses, would that not slow it down?

tome 2y ago

nomel gave a good answer in a different thread

> This is not about the model, it’s about the relative speed improvement from the hardware, with this model as a demo.

To compare apples to apples look at the tokens per second of other systems running Llama 2 70B 4096. We're by far the fastest!

https://news.ycombinator.com/item?id=38742466

andygeorge 2y ago

Do you work there? Just curious

razorguymania (OP) 2y ago

yes

andygeorge 2y ago

Thanks!

andygeorge 2y ago

ah Llama 2 70B, no wonder
