geor9e · 10mo ago · 0 points

To run a model locally, they would need to release the weights to the public and their competitors. Those are flagship models.

They would also need to shrink them way down to even fit. And even then, generating tokens on an Apple neural chip would be waaaaaay slower than an HTTP request to a monster GPU in the sky. Local LLMs, in my experience, are either painfully dumb or painfully slow.

1 comment · 1 top-level

hu3 · 10mo ago

Hence the "come on".
