Text from them:

Calling all model makers, or would-be model creators! Chai asked me to tell you all about their open source LLM leaderboard:

Chai is running a totally open LLM competition. Anyone is free to submit a Llama-based LLM via our Python package 🐍 It gets deployed to users on our app, we collect the metrics, and we rank the models! If you place high enough on our leaderboard, you’ll win money 🥇

We’ve paid out over $10,000 in prizes so far. 💰

Come to our Discord and check it out!

https://discord.gg/chai-llm

Link to the latest board for people who don’t feel like joining a random Discord just to see the results:

https://cdn.discordapp.com/attachments/1134163974296961195/1138833170838589471/image1.png
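
For anyone wondering what submitting roughly looks like, here’s a sketch in Python. The package name (`chai_guanaco`), the `ModelSubmitter` class, and every parameter below are assumptions for illustration, not Chai’s confirmed API; check their Discord/docs for the real submission flow.

```python
# Hypothetical sketch: package, class, and field names are assumptions,
# not Chai's documented API.
import chai_guanaco as chai  # assumed package name

submission = {
    "model_repo": "my-hf-username/my-llama-finetune",  # a Llama-based model on Hugging Face (made-up repo)
    "generation_params": {
        "temperature": 0.9,
        "top_p": 0.95,
        "max_new_tokens": 64,
    },
}

# Assumed calls: submit the model, then print the current leaderboard.
submitter = chai.ModelSubmitter()
submission_id = submitter.submit(submission)
print(chai.display_leaderboard())
```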

  • noneabove1182 (OP) · 1 year ago
    Yeah, it’s a step in the right direction at least. Though now that you mention it, doesn’t LMSYS or someone do the same thing with human evals and side-by-side comparisons?

    It’s such a tricky line to walk between deterministic questions (repeatable but cheatable) and user questions (real-world but potentially unfair).
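
    As a toy illustration of what side-by-side human eval can feed into: one crude way to turn pairwise votes into a ranking is a simple win-rate tally. The vote data and model names below are made up, and real arenas use fancier rating schemes (e.g. Elo-style), so treat this as a sketch of the idea only.

    ```python
    from collections import defaultdict

    # Each tuple is (winner, loser) from one side-by-side human comparison.
    # The data here is invented purely for illustration.
    votes = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_b", "model_c"),
        ("model_c", "model_a"),
    ]

    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1

    # Rank models by win rate (wins / comparisons played).
    leaderboard = sorted(games, key=lambda m: wins[m] / games[m], reverse=True)
    for rank, model in enumerate(leaderboard, start=1):
        print(f"{rank}. {model}: {wins[model]}/{games[model]} wins")
    ```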