Artificial Intelligence

Reply #2437

  • 2,651 Responses
  • yuekit0

    For all its discourse-shaping power, METR’s benchmark is ridden with methodological shortcomings, ranging from testing models against contrived, unrealistic software engineering tasks — something its authors have, to their credit, been upfront about — to staking their results on an exceedingly small, biased sample of their peers — something they have been less upfront about. Its memetic energy is a concerning sign of how easily flawed research can win even expert assent, so long as it provides a sheen of rigor to a widely accepted narrative.

    https://www.transformernews.ai/p…
