LLM Leaderboard Explorer

Interactive visualization of merged leaderboard data

Welcome!

This application provides an interactive view of combined data from two leading LLM evaluation platforms:

  • LiveBench: A dynamic benchmark featuring monthly updated, contamination-free tasks with objective scoring across diverse domains.
  • LMSYS Chatbot Arena: Uses crowd-sourced human preferences (Elo ratings) to rank models based on conversation quality.
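Combining the two sources amounts to joining rows on the model name. The sketch below is only an illustration of that idea, not the app's actual code: the column names, the `merge_leaderboards` helper, and the `name_map` argument (which links an Arena name to its LiveBench counterpart when the two platforms label a model differently) are all assumptions.

```python
def normalize(name):
    """Lowercase and trim a model name so both sources share a join key."""
    return name.strip().lower()


def merge_leaderboards(livebench_rows, arena_rows, name_map=None):
    """Outer-join two lists of row dicts on the normalized model name.

    name_map is a hypothetical Arena-name -> LiveBench-name lookup.
    """
    aliases = {normalize(k): normalize(v) for k, v in (name_map or {}).items()}
    merged = {}
    for row in livebench_rows:
        merged[normalize(row["model"])] = dict(row)
    for row in arena_rows:
        key = normalize(row["model"])
        key = aliases.get(key, key)
        # Models that appear only in the Arena data get a new entry.
        target = merged.setdefault(key, {"model": row["model"]})
        # Copy Arena columns without overwriting the display name.
        target.update({k: v for k, v in row.items() if k != "model"})
    return list(merged.values())


# Made-up rows, for illustration only.
livebench = [
    {"model": "model-a", "Global Average": 61.5},
    {"model": "model-b", "Global Average": 48.0},
]
arena = [
    {"model": "Model-A-chat", "Arena Score": 1250},
    {"model": "model-c", "Arena Score": 1100},
]
merged = merge_leaderboards(livebench, arena, name_map={"Model-A-chat": "model-a"})
```

An outer join keeps models that appear in only one source, so a model may have LiveBench scores but no Arena Score, or the reverse.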

Key Data Points:

  • Performance Metrics (LiveBench): Includes the 'Global Average' score and per-capability scores such as 'Reasoning', 'Coding', and 'Mathematics'. Higher is generally better.
  • Community Stats (LMSYS): Features the 'Arena Score' (Elo rating) and the corresponding ranks. A higher score is better; a lower rank number is better (rank 1 is the top).
  • Model Details: Provides information like Organization, License, and Knowledge Cutoff date.
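To make the "higher score, lower rank" relationship concrete, here is a small sketch using the classic Elo expected-score formula. It is an illustration only: the function names are hypothetical, the ratings are made up, and the platform's actual rating computation may differ from textbook Elo.

```python
def expected_score(rating_a, rating_b):
    """Classic Elo estimate of the chance that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def ranks_from_scores(scores):
    """Assign rank 1 to the highest score, rank 2 to the next, and so on."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Made-up ratings, for illustration only.
scores = {"model-a": 1300, "model-b": 1100, "model-c": 1200}
ranks = ranks_from_scores(scores)
win_chance = expected_score(scores["model-a"], scores["model-b"])
```

Under this formula, two models with equal ratings each have an expected score of 0.5, and a 200-point lead corresponds to roughly a 76% expected preference rate.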

How to Use This App:

  • Filter & Search: Use the controls above the tabs to search for models or filter by organization and minimum 'Global Average' score.
  • Explore Tabs: View different slices of the data (Performance, Details, Community Stats, Mapping).
  • View Model Card: Click on any row in the tables (except in the Visualizations tab) to see a detailed card with all metrics for that model.
  • Visualize & Compare: Use the 'Visualizations' tab to compare top models on specific metrics (Bar Chart) or compare selected models across multiple dimensions (Radar Chart).
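
The filter controls and the top-models bar chart described above can be thought of as simple operations on the merged rows. The following is a minimal sketch under assumed column names ('model', 'Organization', 'Global Average'); `filter_models` and `top_n` are hypothetical helpers, not the app's own functions.

```python
def filter_models(rows, search="", organizations=None, min_global_average=0.0):
    """Keep rows matching the search text, organizations, and minimum score."""
    needle = search.strip().lower()
    kept = []
    for row in rows:
        if needle and needle not in row["model"].lower():
            continue
        if organizations and row.get("Organization") not in organizations:
            continue
        score = row.get("Global Average")
        if score is None:
            # Rows without a score are dropped only when a threshold is set.
            if min_global_average > 0:
                continue
        elif score < min_global_average:
            continue
        kept.append(row)
    return kept


def top_n(rows, metric, n=5):
    """Return the n rows with the highest value for the given metric."""
    scored = [r for r in rows if r.get(metric) is not None]
    return sorted(scored, key=lambda r: r[metric], reverse=True)[:n]


# Made-up rows, for illustration only.
rows = [
    {"model": "model-a", "Organization": "Org X", "Global Average": 61.5},
    {"model": "model-b", "Organization": "Org Y", "Global Average": 48.0},
    {"model": "model-c", "Organization": "Org X", "Global Average": 55.0},
]
org_x_strong = filter_models(rows, organizations={"Org X"}, min_global_average=50)
best_two = top_n(rows, "Global Average", n=2)
```

The search is a case-insensitive substring match here; the app may match differently.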
