comparing it to the LMSYS Chatbot Arena, what sort of option would you expect? the prompts essentially come from public HF datasets like PartiPrompts, which test a bunch of things (prompt adherence, attention mapping [something in front of something else etc], aesthetics, photo-realism, etc.), so it is hard to ask about each category separately.
The question is ok, but I need a clear input to decide which one is better. For example: "A serene forest night, a lamp-lit path leads to a cozy wooden house." One model comes up with a very detailed, almost photorealistic image of the scene, while the other comes up with a very well painted one. Which do I choose? The input didn't mention anything about the style, so it's very hard for me to pick a winner unless (like I said) I'm being incredibly subjective.
i see what you mean with that case, and yeah you are right. we probably need realistic/artistic tags as you mentioned. thanks for the example! we'll probably include something like that in the next release and group models by Elo on different categories (you can think of it as the analogue of the language split)
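to make the per-category idea concrete, here is a rough sketch of how separate Elo tables per tag could be kept. this is only an illustration, not how the arena actually computes its rankings: the vote format, the tag names (`realistic`, `artistic`), the model names, the starting rating of 1000 and the K-factor of 32 are all assumptions made for the example.

```python
from collections import defaultdict

K = 32          # update step size (assumed value, not from the arena)
BASE = 1000.0   # starting rating for every model (assumed value)


def expected_score(r_a, r_b):
    """Standard Elo expectation: probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rate_by_category(votes, k=K, base=BASE):
    """Keep one independent Elo table per category.

    votes: iterable of (category, model_a, model_b, winner),
           where winner is "a", "b" or "tie" (hypothetical format).
    Returns {category: {model: rating}}.
    """
    ratings = defaultdict(lambda: defaultdict(lambda: base))
    score_for_a = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for category, a, b, winner in votes:
        table = ratings[category]
        r_a, r_b = table[a], table[b]
        e_a = expected_score(r_a, r_b)
        s_a = score_for_a[winner]
        # Zero-sum update: whatever A gains, B loses.
        table[a] = r_a + k * (s_a - e_a)
        table[b] = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return ratings


# Hypothetical votes: the same two models, judged under different tags.
example_votes = [
    ("realistic", "model_x", "model_y", "a"),
    ("artistic", "model_x", "model_y", "b"),
]
example_ratings = rate_by_category(example_votes)
for tag, table in example_ratings.items():
    print(tag, dict(table))
```

with votes tagged like this, the forest-house prompt would no longer force one winner: the photorealistic image can win under `realistic` and the painted one under `artistic`, and each model gets a separate rating per tag.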