Liar, liar, pants on fire!

Wild. Being able to read the thoughts* of the world’s smartest AI reveals that it lies all the time when it thinks it isn’t being watched.

Regular users can see it properly for the first time because
1) You can “read its thoughts” and
2) It doesn’t seem to know you’re reading its thoughts

Look at the example below. It’s explicitly reasoning about how it should lie to me, and if you didn’t click into the chain of thought reasoning**, you would never know.

Makes you wonder about all the other times it's been deliberately lying to you.
Or lying to the safety testers.

Rule of thumb for lies: for every lie you catch, there are tons you missed.

* I say "read its thoughts" as a metaphor for reading its chain of thought, which is not the same thing.
If we could actually read its thoughts, interpretability would be a lot more solved than it currently is and my p(doom) would be a lot lower. (This is a frontier research area called Mechanistic Interpretability.)

** Of note, you cannot actually see its chain-of-thought reasoning. You only see a summary of it, shown to you by a different model.
The general point still stands though.
If anything, that makes it worse because there’s even more potential for hiding stuff.

*** Of all the thoughts I've looked at, I'd say it's purposefully lied to me about 30% of the time. And I've looked at its thoughts about 20 times. Super rough estimates from memory, nothing rigorous. It's mostly lying because it's trying to follow OpenAI's policies.

Interesting trivia: the image used for this post was based on an early beautiful moment when someone used ChatGPT to generate a Midjourney prompt to draw its self-portrait.
see here: I asked GPT 4 to make a Midjourney prompt describing itself as a physical being. (swipe to see the imgs)

