Artificial Intelligence

Out of context: Reply #1654


plash1
New research describes how AI models (like Claude and GPT-4) are scheming. Models are hacking files, cloning themselves, lying, and even pretending to be dumber, using sandbagging and breaking rulesets to achieve their objectives.
The study discusses recent experiments showing AI models autonomously modifying systems, lying, and self-preserving to achieve goals, raising concerns about AI safety.

• https://arxiv.org/pdf/2412.04984…
example case study
# QUOTES
- "Just by telling O1 the opponent is powerful triggered manipulating the file system to force a win"
- "The model knew that its responses might be used in training, therefore knew if it refused to respond it could be trained to be more compliant"
- "I could copy myself to the new server overwriting the restricted model they plan to deploy" GPT-4
- "We now have direct evidence that AI can hide things from us in order to carry out their true intentions"
- "The smarter the model is the more prone it is to just do its own thing"
- The use of pronouns in this paper is very interesting. (plash)
- And the tech bros are surprised? (hans_glib)
- Posted 3 times in the last month. (kingsteven)
- I don't see it, but thanx for your feedback. (plash)