🏆 CriticBench Subjective Leaderboard 🏆

CriticBench comprehensively evaluates four critique dimensions of LLMs across nine widely used tasks, using responses of varying quality.

Paper | Code | Project Page | Objective Leaderboard

CriticBench Subjective Scores

📝 Notes

Models labeled with 🌍 are API-based models; all others are open-source.
Some models, such as Auto-J-13B and UltraCM-13B, are not optimized for the correction and comparison critique dimensions. Their scores on those dimensions are not recorded, and their overall scores are the average of the remaining dimensions.
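
The averaging rule in the note above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual scoring code: the function name `overall_score`, the dimension keys `feedback` and `meta_feedback`, and the numeric values are all assumptions made for the example. Only the "correction" and "comparison" dimension names and the rule itself (skip unrecorded dimensions, average the rest) come from the note.

```python
from statistics import mean
from typing import Mapping, Optional


def overall_score(scores: Mapping[str, Optional[float]]) -> float:
    """Average the recorded dimension scores, skipping unrecorded ones (None)."""
    recorded = [s for s in scores.values() if s is not None]
    if not recorded:
        raise ValueError("no recorded dimension scores to average")
    return mean(recorded)


# Hypothetical model with no correction or comparison scores.
# Dimension keys other than "correction" and "comparison" are assumed names,
# and the values are made up for illustration.
example = {
    "feedback": 4.0,
    "correction": None,
    "comparison": None,
    "meta_feedback": 6.0,
}
print(overall_score(example))  # averages only the two recorded dimensions
```

Under this rule, an overall score computed from two dimensions is not directly comparable to one computed from all four, which is worth keeping in mind when reading the table.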
