And the tweet is gone after a public outcry.
Now they claim that the 70B model performed worse than Llama 3.1 70B (and obviously worse than closed-source alternatives)[1].
Outstanding questions:
- What exactly did they "partially replicate"?
- Why were Redditors able to identify all the details (wrapped Claude, wrapped GPT-4o, initial prompt, details of a fine-tuned Llama 3.0, not 3.1) while ArtificialAnlys was not?
- Why, after the truth was revealed, do they still write "We are not clear", "We are not clear"?
[1] https://x.com/ArtificialAnlys/status/1832965630472995220