Artificial Intelligence

Out of context: Reply #1632

2,717 Responses

********
-2
Models in Aidan Bench
"Aidan Bench
Some models feel competent despite under-scoring on benchmarks like MMLU, GPQA, MATH, or NIAH.
Aidan Bench rewards:
- Creativity
- Reliability
- Contextual attention
- Instruction following
Aidan Bench is weakly correlated with Lmsys, has no score ceiling, and aligns with real-world open-ended use."
********