Models labeled with π are API-based, while all others are open-source.
Some models, such as Auto-J-13B and UltraCM-13B, are not optimized for the correction and comparison critique dimensions. Their scores on these dimensions are not recorded, and their overall scores are the average over the remaining dimensions.
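
The averaging rule for models with unrecorded dimensions can be illustrated with a minimal sketch. It assumes the overall score is an unweighted mean over whichever dimensions have a recorded score. The function name `overall_score`, the placeholder dimension names `dimension_a` and `dimension_b`, and the numbers are all hypothetical and are not taken from the table.

```python
from statistics import mean
from typing import Mapping, Optional


def overall_score(scores: Mapping[str, Optional[float]]) -> float:
    """Average only the dimensions that have a recorded score.

    A value of None marks a dimension whose score is not recorded.
    Assumption: the overall score is an unweighted mean.
    """
    recorded = [s for s in scores.values() if s is not None]
    if not recorded:
        raise ValueError("no recorded dimension scores")
    return mean(recorded)


# Hypothetical scores for illustration only (not values from the paper).
example = {
    "dimension_a": 60.0,   # placeholder name for a recorded dimension
    "dimension_b": 40.0,   # placeholder name for a recorded dimension
    "correction": None,    # not recorded for this model
    "comparison": None,    # not recorded for this model
}

print(overall_score(example))  # 50.0
```

Under this rule, an overall score computed over fewer dimensions is not directly comparable to one computed over all dimensions, which is worth keeping in mind when reading the overall column.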