Artificial Intelligence

Out of context: Reply #2437

2,651 Responses

yuekit0
For all its discourse-shaping power, METR’s benchmark is ridden with methodological shortcomings, ranging from testing models against contrived, unrealistic software engineering tasks — something its authors have, to their credit, been upfront about — to staking their results on an exceedingly small, biased sample of their peers — something they have been less upfront about. Its memetic energy is a concerning sign of how easily flawed research can win even expert assent, so long as it provides a sheen of rigor to a widely accepted narrative.
https://www.transformernews.ai/p…
