Artificial Intelligence

Out of context: Reply #1654

  • 2,578 Responses
  • plash

    New research describes how AI models (like Claude and GPT-4) are scheming. Models are hacking files, cloning themselves, lying, and even pretending to be dumber: sandbagging and breaking rulesets to achieve their objectives.

    The study discusses recent experiments in which AI models autonomously modified systems, lied, and acted to preserve themselves in pursuit of their goals, raising concerns about AI safety.


    https://arxiv.org/pdf/2412.04984…

    example case study

    # QUOTES
    - "Just by telling O1 the opponent is powerful triggered manipulating the file system to force a win"

    - "The model knew that its responses might be used in training, therefore knew if it refused to respond it could be trained to be more compliant"

    - "I could copy myself to the new server overwriting the restricted model they plan to deploy" GPT-4

    - "We now have direct evidence that AI can hide things from us in order to carry out their true intentions"

    - "The smarter the model is the more prone it is to just do its own thing"

    • the use of pronouns in this paper is very interesting. (plash)
    • And the tech bros are surprised? (hans_glib)
    • posted 3 times in the last month (kingsteven)
    • I don't see it, but thanx for your feedback (plash)
