πŸ† CriticBench Subjective Leaderboard πŸ†

CriticBench evaluates LLMs along 4 critique dimensions on 9 widely used tasks, using responses of varying quality.

πŸ“ Notes

  1. Models labeled with 🌍 are API-based models; all others are open-source.
  2. Some models, such as Auto-J-13B and UltraCM-13B, are not optimized for the correction and comparison critique dimensions. Their scores on those dimensions are not recorded, and their overall scores are the average of the remaining dimensions.
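
The averaging rule in note 2 can be sketched as follows. This is a minimal illustration, not the leaderboard's actual scoring code: the function name `overall_score`, the dimension keys `feedback` and `meta_feedback`, and the numbers are hypothetical, and an unrecorded score is assumed to be represented as `None`.

```python
from statistics import mean
from typing import Mapping, Optional


def overall_score(scores: Mapping[str, Optional[float]]) -> float:
    """Average only the dimensions that have a recorded score.

    Assumption: an unrecorded dimension is stored as None.
    """
    recorded = [s for s in scores.values() if s is not None]
    if not recorded:
        raise ValueError("no recorded dimension scores to average")
    return mean(recorded)


# Hypothetical scores for a model not optimized for correction/comparison.
# The dimension names and values are made up for illustration.
example = {
    "feedback": 60.0,
    "correction": None,   # not recorded
    "comparison": None,   # not recorded
    "meta_feedback": 40.0,
}
print(overall_score(example))
```

Under this rule, an overall score for such a model averages fewer dimensions than one for a model scored on all four, which is worth keeping in mind when comparing the two.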

