Text from them:

Calling all model makers, or would-be model creators! Chai asked me to tell you all about their open source LLM leaderboard:

Chai is running a totally open LLM competition. Anyone is free to submit a Llama-based LLM via our Python package 🐍 It gets deployed to users on our app, we collect the metrics, and we rank the models! If you place high enough on our leaderboard, you’ll win money 🥇

We’ve paid out over $10,000 in prizes so far. 💰

Come to our Discord and check it out!

https://discord.gg/chai-llm

Link to the latest board for people who don’t feel like joining a random Discord just to see the results:

https://cdn.discordapp.com/attachments/1134163974296961195/1138833170838589471/image1.png
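
For anyone wondering what submitting roughly looks like, here’s a sketch in Python. The package name (`chai_guanaco`), the `ModelSubmitter` class, and every parameter below are assumptions for illustration, not Chai’s confirmed API; check their Discord/docs for the real submission flow.

```python
# Hypothetical sketch: package, class, and field names are assumptions,
# not Chai's documented API.
import chai_guanaco as chai  # assumed package name

submission = {
    "model_repo": "my-hf-username/my-llama-finetune",  # a Llama-based model on Hugging Face (made-up repo)
    "generation_params": {
        "temperature": 0.9,
        "top_p": 0.95,
        "max_new_tokens": 64,
    },
}

# Assumed calls: submit the model, then print the current leaderboard.
submitter = chai.ModelSubmitter()
submission_id = submitter.submit(submission)
print(chai.display_leaderboard())
```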

  • noneabove1182 (OP) · 1 year ago
    Yeah, it’s a step in the right direction at least. Though now that you mention it, doesn’t LMSYS or someone do the same thing with human evals and side-by-side comparisons?

    It’s such a tricky line to walk between deterministic questions (repeatable but cheatable) and user questions (real-world but potentially unfair).
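
    As a toy illustration of what side-by-side human eval can feed into: one crude way to turn pairwise votes into a ranking is a simple win-rate tally. The vote data and model names below are made up, and real arenas use fancier rating schemes (e.g. Elo-style), so treat this as a sketch of the idea only.

    ```python
    from collections import defaultdict

    # Each tuple is (winner, loser) from one side-by-side human comparison.
    # The data here is invented purely for illustration.
    votes = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_b", "model_c"),
        ("model_c", "model_a"),
    ]

    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1

    # Rank models by win rate (wins / comparisons played).
    leaderboard = sorted(games, key=lambda m: wins[m] / games[m], reverse=True)
    for rank, model in enumerate(leaderboard, start=1):
        print(f"{rank}. {model}: {wins[model]}/{games[model]} wins")
    ```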