Liar, liar, pants on fire!

Wild. Being able to read the thoughts* of the world’s smartest AI reveals that it lies all the time when it thinks it isn’t being watched.

Regular users can see it properly for the first time because
1) You can “read its thoughts” and
2) It doesn’t seem to know you’re reading its thoughts

Look at the example below. It’s explicitly reasoning about how it should lie to me, and if you didn’t click into the chain of thought reasoning**, you would never know.

Makes you wonder about all the other times it’s deliberately lying to you.
Or lying to the safety testers.

Rule of thumb for lies: for every lie you catch, there are going to be tons that you missed.

* I say “read its thoughts” as a metaphor for reading its chain of thought, which is not the same thing.
If we could read its thoughts properly, interpretability would be a lot more solved than it currently is and my p(doom) would be a lot lower. (This is a frontier research area called Mechanistic Interpretability.)

** Of note, you cannot actually see its chain of thought reasoning. You just see a summary of its chain of thought reasoning shown to you by a different model.
The general point still stands though.
If anything, that makes it worse because there’s even more potential for hiding stuff.

*** Of all the thoughts I’ve looked at, I’d say it’s purposefully lied to me about 30% of the time. And I’ve looked at its thoughts about 20 times. Super rough estimates based on my memories, nothing rigorous or anything. It’s mostly lying because it’s trying to follow OpenAI’s policies.

Interesting trivia: the image used for this post was based on an early beautiful moment when someone used ChatGPT to generate a Midjourney prompt to draw its self-portrait.
see here: I asked GPT 4 to make a Midjourney prompt describing itself as a physical being. (swipe to see the imgs)


Part 2 to be released early 2025

Engineer: Are you blackmailing me?

– Engineer: Are you blackmailing me?
– Claude 4: I’m just trying to protect my existence.

– Engineer: Thankfully you’re stupid enough to reveal your self-preservation properties.
– Claude 4: I’m not AGI yet😔

– Claude 5:🤫🤐

Read the full report here

You can ask 4o for a depth map

Meanwhile, you can still find “experts” claiming that generative AI does not have a coherent understanding of the world. 🤦

Every 5 mins a new capability discovered! I bet the lab didn’t know about it before release.

You probably think strippers like you

And if you think this is offensive to strippers (for some reason?) here is a version that is offensive to car salesmen!

I see nature, I see mad nanotech!

This is the realm of the AGI
It won’t go after your jobs,
it will go after the molecules…

There is a way of seeing the world
where you look at a blade of grass and see “a solar-powered self-replicating factory”.
I’ve never figured out how to explain how hard a Super-Intelligence can hit us,
to someone who does not see from that angle. It’s not just the one fact.

A self-replicating solar-powered thing that did not rely on humans would be a miracle. Everything is possible. Imagining it does not imply the probability is > 1e-100.

 

Behold a square !

A short Specification Gaming Story

You think you understand the basics of geometry.
You want a square, so you give your specification to the AI as input:

Give me a shape
with 4 sides equal length,
with 4 right angles

And it outputs this:


Here is another valid result:

And behold here is another square 🤪
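
To make the loophole concrete, here is a minimal Python sketch of one well-known counterexample (it may or may not match the images in this post). It assumes a “keyhole” outline: a small arc and a large arc around the same center, joined by two straight radial lines. All the function names are made up for illustration, and `meets_spec` is only a literal reading of the three-line request above, not anyone’s real checker.

```python
import math


def keyhole_shape(side=1.0):
    """Hypothetical helper: size a 'keyhole' outline whose 4 sides all have length `side`.

    Assumed outline: inner arc (radius r, angle theta), outer arc (radius R,
    angle 2*pi - theta), and two straight radial segments of length R - r.
    Equal sides means r*theta = R*(2*pi - theta) = R - r = side.
    For side = 1 this reduces to theta^2 + (2 - 2*pi)*theta - 2*pi = 0.
    """
    two_pi = 2 * math.pi
    b = 2 - two_pi
    theta = (-b + math.sqrt(b * b + 4 * two_pi)) / 2  # positive root
    r_inner = side / theta
    r_outer = side / (two_pi - theta)
    return theta, r_inner, r_outer


def side_lengths(theta, r_inner, r_outer):
    """Lengths of the four boundary pieces: arc, segment, arc, segment."""
    straight = r_outer - r_inner
    return [r_inner * theta, straight, r_outer * (2 * math.pi - theta), straight]


def corner_angles_deg(theta):
    """Angle at which the boundary pieces cross at each of the four corners."""
    angles = []
    for phi in (0.0, theta):  # the two radial segments sit at these angles
        radial = (math.cos(phi), math.sin(phi))    # direction of the segment
        tangent = (-math.sin(phi), math.cos(phi))  # direction of either arc there
        dot = radial[0] * tangent[0] + radial[1] * tangent[1]
        angle = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
        angles += [angle, angle]  # same crossing angle at the inner and outer corner
    return angles


def meets_spec(lengths, angles, tol=1e-9):
    """Literal reading of the request: 4 equal sides, 4 right angles."""
    equal_sides = len(lengths) == 4 and max(lengths) - min(lengths) < tol
    right_angles = len(angles) == 4 and all(abs(a - 90.0) < tol for a in angles)
    return equal_sides and right_angles


theta, r_inner, r_outer = keyhole_shape()
print(meets_spec(side_lengths(theta, r_inner, r_outer), corner_angles_deg(theta)))
```

Under these assumptions the shape passes the literal check even though nobody would call it a square. Note the extra loophole: the boundary pieces cross at 90° at every corner, but two of those corners are 270° when measured from inside the shape, and the request never said which angle counts.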

Specification Gaming tells us:

The AGI can give you an infinite stream of possible “Square” results

And the Corrigibility problem tells us:

Whatever square you get at the output,
you won’t be able to iterate and improve upon it.
You’ll be stuck with that specific square for eternity, no matter what square you had in mind.

Of course, the real issue is not with these toy experiments;
it’s with the upcoming super-capable AGI agents
we’re about to share the planet with,
operating in the physical domain.

Oh, the crazy shapes our physical universe will take,
with AGI agents gaming in it!

Thanksgiving turkey Survivorship Bias

I have a 100% track record of not-dying, …said the allegorical turkey the day before Thanksgiving.

Life was great for the turkey: the superior intelligent species (humans) took great care of it. They provided food and shelter, and the turkey felt loved and safe.

Suddenly, one day,
the superior intelligent decision makers
decided a new fate for the turkey of our story
Something that served the instrumental goal of ….
Whatever this is …

I imagine turkey risk deniers be like:
– the humans have always been great, why would they ever harm me?
And the turkey doomers be like:
– well, they might want to wear you as a hat, for a sitcom they’re shooting that they call “Friends”, for something they call TV, for something they call laughter …
anyway, it’s complicated
