Artificial Intelligence
Artificial Intelligence
Out of context: Reply #2437
- Started
- Last post
- 2,651 Responses
- yuekit0
For all its discourse-shaping power, METR’s benchmark is ridden with methodological shortcomings, ranging from testing models against contrived, unrealistic software engineering tasks — something its authors have, to their credit, been upfront about — to staking their results on an exceedingly small, biased sample of their peers — something they have been less upfront about. Its memetic energy is a concerning sign of how easily flawed research can win even expert assent, so long as it provides a sheen of rigor to a widely accepted narrative.
