I literally run Mixtral 8x7B on my on-prem GPU cluster and have several fine-tunes of Mistral 7B (as I said in the previous post). I've also used mistral-medium.
GPT-4-turbo beats them all on benchmarks, human preference, and anything that isn't just biased vibes. My opinion, such as it is, is that GPT-4-turbo is by far the best.
I have no vested interest in it being the best; I'd actually prefer it weren't. But all the objective data points to it being the best, and most unbiased lived experience agrees, assuming broad model use rather than hyper-focused fine-tunes. I have Mistral 7B fine-tunes that beat GPT-4-turbo in very narrow domains, but that hardly counts.
As for the rest of your post, I honestly can't follow what's going on, so good luck with all that, I guess.