LLM Leaderboard Explorer
Interactive visualization of merged leaderboard data
Welcome!
This application provides an interactive view of combined data from two leading LLM evaluation platforms:
- LiveBench: A dynamic benchmark featuring monthly updated, contamination-free tasks with objective scoring across diverse domains.
- LMSYS Chatbot Arena: Uses crowd-sourced human preferences (Elo ratings) to rank models based on conversation quality.
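For readers unfamiliar with Elo ratings, the following is a minimal sketch of the classic Elo model, in which a rating gap maps to an expected win probability. It is illustrative only: the scale constant 400 and the K-factor 32 are conventional defaults assumed here, the function names are hypothetical, and the Arena's actual rating computation may differ.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the classic Elo model."""
    # Assumption: the conventional 400-point logistic scale.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an assumed K-factor controlling how fast ratings move.
    """
    expected_a = elo_expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two equally rated models: each is expected to win half the time.
even_odds = elo_expected_score(1200, 1200)
# A human vote for A moves the two ratings apart symmetrically.
new_a, new_b = elo_update(1200, 1200, score_a=1.0)
```

Under this model only the rating difference matters, which is why a higher Arena Score reads directly as "more often preferred".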
Key Data Points:
- Performance Metrics (LiveBench): Includes 'Global Average' score and specific capability scores like 'Reasoning', 'Coding', 'Mathematics', etc. Higher is generally better.
- Community Stats (LMSYS): Features 'Arena Score' (Elo rating) and the corresponding ranks. A higher score is better; a lower rank number is better.
- Model Details: Provides information like Organization, License, and Knowledge Cutoff date.
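The data points above come from two sources that may name the same model differently. The sketch below shows one way such a merge could work, assuming a hand-maintained name mapping between the two leaderboards. All rows, field values, and the `merge_leaderboards` helper are hypothetical illustrations, not the app's actual code or real leaderboard data.

```python
# Hypothetical LiveBench rows (values are made up for illustration).
livebench_rows = [
    {"model": "model-a", "Global Average": 61.5, "Reasoning": 58.0, "Coding": 64.2},
    {"model": "model-b", "Global Average": 47.3, "Reasoning": 41.0, "Coding": 52.9},
]

# Hypothetical Arena rows (values are made up for illustration).
arena_rows = [
    {"model": "Model A (chat)", "Arena Score": 1250, "Rank": 3, "Organization": "Org X"},
]

# Assumed mapping: LiveBench model name -> Arena model name.
name_mapping = {"model-a": "Model A (chat)"}


def merge_leaderboards(livebench, arena, mapping):
    """Left-join Arena fields onto LiveBench rows via the name mapping."""
    arena_by_name = {row["model"]: row for row in arena}
    merged = []
    for lb_row in livebench:
        combined = dict(lb_row)  # copy so the input rows are not mutated
        arena_row = arena_by_name.get(mapping.get(lb_row["model"]), {})
        for key, value in arena_row.items():
            if key != "model":  # keep the LiveBench name as the canonical one
                combined[key] = value
        merged.append(combined)
    return merged


merged_rows = merge_leaderboards(livebench_rows, arena_rows, name_mapping)
```

A left join keeps every LiveBench model even when no Arena entry is mapped to it; such rows simply lack the community fields.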
How to Use This App:
- Filter & Search: Use the controls above the tabs to search for models or filter by organization and minimum 'Global Average' score.
- Explore Tabs: View different slices of the data (Performance, Details, Community Stats, Mapping).
- View Model Card: Click on any row in the tables (except in the Visualizations tab) to see a detailed card with all metrics for that model.
- Visualize & Compare: Use the 'Visualizations' tab to compare top models on specific metrics (Bar Chart) or compare selected models across multiple dimensions (Radar Chart).
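The filter controls and the bar chart's "top models" view described above can be sketched as two small functions. This is a minimal illustration under assumed field names ('model', 'Organization', 'Global Average'); the helpers `filter_models` and `top_models` and the sample rows are hypothetical, not the app's implementation.

```python
# Hypothetical merged rows (values are made up for illustration).
rows = [
    {"model": "model-a", "Organization": "Org X", "Global Average": 61.5, "Coding": 64.2},
    {"model": "model-b", "Organization": "Org Y", "Global Average": 47.3, "Coding": 52.9},
    {"model": "model-c", "Organization": "Org X", "Global Average": 55.0, "Coding": None},
]


def filter_models(rows, search="", organizations=None, min_global_average=0.0):
    """Keep rows matching the search text, organization set, and score floor."""
    needle = search.strip().lower()
    selected = []
    for row in rows:
        if needle and needle not in row["model"].lower():
            continue  # case-insensitive substring search on the model name
        if organizations and row.get("Organization") not in organizations:
            continue  # an empty or missing selection means "all organizations"
        if row.get("Global Average", 0.0) < min_global_average:
            continue
        selected.append(row)
    return selected


def top_models(rows, metric, n=10):
    """Return the n rows with the highest value for the given metric."""
    scored = [row for row in rows if row.get(metric) is not None]
    return sorted(scored, key=lambda row: row[metric], reverse=True)[:n]


org_x_above_50 = filter_models(rows, organizations={"Org X"}, min_global_average=50)
best_coders = top_models(rows, "Coding", n=2)
```

Rows with a missing metric are dropped before ranking so that they neither raise a comparison error nor appear as zero-height bars.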
Click a row for details.
Compare Top Models by Metric
Select Metric for Bar Chart