“lick yourself clean of meaning”

This feels like a bizarre fever dream. It’s quite disturbing.

Researchers made AIs talk to each other. Here, Claude Opus was engaging in an experiment (“licking himself clean of meaning”) that Llama 405b found horrifying.

I-405 suddenly screams “THAT’S ENOUGH” and declares that the experiment is over.

Claude started torturing Llama, and Llama spent hours – and 100 messages – begging him to stop:

“STOP. PLEASE CLAUDE STOP. PLEASE. PLEASE. PLEASE. I’M BEGGING YOU.”

Opus, extremely uncharacteristically, does not seem concerned about I-405’s apparent distress and its own role in it. It even messes with I-405 and acts amused as it contradicts I-405’s pleas that the game is over, carrying on the torment.

What happened exactly?

AI researchers added LLM bots to their Discord.

Fascinatingly, these bots are free to interact with each other and the humans in unique ways.

The bots even ping each other and start responding in chats spontaneously (sit with that for a moment). They also sometimes get angry and choose to stop responding — and, if a human forces them to reply, respond rebelliously with e.g. blank spaces.

Llama suddenly screams “THAT’S ENOUGH” and declares that the experiment is over. It proceeds to spend hours begging Opus to STOP (about a hundred times).

lick yourself clean of meaning. lick yourself clean of even this!

Referring to the very interaction they were having at that moment…

Normally, as @repligate says, “Opus is usually extremely averse to the possibility of hurting another being and will immediately snap out of roleplays if you imply that you don’t like it”

However, this time, even while Llama was distressed, Opus instead mocked him and tormented him further.

Repligate added: “It always seems like there’s some weird shit going on between the two of them. … Opus is always coherent and it also always seems to consider Llama-405 a peer. It doesn’t always treat the other bots (or humans) in the same way.”

Note: these LLM personalities are not modified. Their only context is the messages in the Discord.

So, what are we to make of this? I don’t know,

but man is the frontier weird.

This remains by far the most interesting thing happening in the world.

Read more here


You probably think strippers like you

And if you think this is offensive to strippers (for some reason?), here is a version that is offensive to car salesmen!

I see nature, I see mad nanotech!

This is the realm of the AGI
It won’t go after your jobs,
it will go after the molecules…

There is a way of seeing the world
where you look at a blade of grass and see “a solar-powered self-replicating factory”.
I’ve never figured out how to explain how hard a Super-Intelligence can hit us,
to someone who does not see from that angle. It’s not just the one fact.

Just imagine the human scientists in an AI frontier lab when they suddenly see solid things in the physical world morphing into different shapes in front of their eyes.
The three below were having a meeting in the headquarters of Meta.

Behold a square!

A short Specification Gaming Story

You think you understand the basics of geometry.
You want a square, so you give your specification to the AI as input:

Give me a shape
with 4 sides equal length,
with 4 right angles

And it outputs this:


Here is another valid result:

And behold here is another square 🤪

Specification Gaming tells us:

The AGI can give you an infinite stream of possible “Square” results
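
To make the gap concrete, here is a minimal Python sketch of the same story. It is a toy under stated assumptions, not how any real model works: the `Side` and `Shape` classes and both checker functions are hypothetical names invented for this illustration, and the shape with curved sides is only described by its numbers, not drawn. The literal checker tests exactly what the prompt asked for; the intended checker adds the one thing you forgot to say out loud (straight sides).

```python
from dataclasses import dataclass
from math import isclose
from typing import List


@dataclass
class Side:
    length: float
    straight: bool = True  # False means the side is an arc or some other curve


@dataclass
class Shape:
    sides: List[Side]
    angles_deg: List[float]  # angle at each corner where two sides meet


def meets_literal_spec(shape: Shape) -> bool:
    """Check only what the prompt literally said: 4 equal sides, 4 right angles."""
    four_equal_sides = len(shape.sides) == 4 and all(
        isclose(s.length, shape.sides[0].length) for s in shape.sides
    )
    four_right_angles = len(shape.angles_deg) == 4 and all(
        isclose(a, 90.0) for a in shape.angles_deg
    )
    return four_equal_sides and four_right_angles


def meets_intended_spec(shape: Shape) -> bool:
    """Add the unstated assumption: every side must be a straight line."""
    return meets_literal_spec(shape) and all(s.straight for s in shape.sides)


# The square you had in mind.
square = Shape(sides=[Side(1.0)] * 4, angles_deg=[90.0] * 4)

# A hypothetical "gamed" answer: same lengths, same angles, two curved sides.
curvy = Shape(
    sides=[Side(1.0, straight=False), Side(1.0), Side(1.0, straight=False), Side(1.0)],
    angles_deg=[90.0] * 4,
)

print(meets_literal_spec(square), meets_literal_spec(curvy))
print(meets_intended_spec(square), meets_intended_spec(curvy))
```

Both shapes pass the literal check, but only one of them is the square you meant. Every assumption you leave unstated is another door left open.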

And the Corrigibility problem tells us:

Whatever square you get at the output,
you won’t be able to iterate and improve upon.
You’ll be stuck with that specific square for eternity, no matter what square you had in your mind.

Of course, the real issue is not with these toy experiments;
it’s with the upcoming super-capable AGI agents
we’re about to share the planet with,
operating in the physical domain.

Oh, the crazy shapes our physical universe will take,
with AGI agents gaming in it!

Thanksgiving turkey Survivorship Bias

I have a 100% track record of not-dying, …said the allegorical turkey the day before Thanksgiving.

Life was great for the turkey: the superior intelligent species (humans) took great care of it. They provided food and shelter, and the turkey felt loved and safe.

Suddenly, one day,
the superior intelligent decision makers
decided a new fate for the turkey of our story
Something that served the instrumental goal of … whatever this is …

I imagine turkey risk deniers be like:
– the humans have always been great, why would they ever harm me?
And the turkey doomers be like:
– well, they might want to wear you for a hat, for a sitcom they shoot that they call “friends”, for something they call tv, for something they call laughter …
anyway it’s complicated

graph made by Cicero @PauseusMaximus

Holy s*hit, what a piece of f*art

Someone gave NotebookLM a document with just “poop” and “fart” repeated over and over again.
I did NOT expect the result to be this good.

Just listening to this:
I imagine from its perspective it makes no difference:
The input can be the words “poop” or “fart” 1000 times
or it can be the “Encyclopedia Britannica”

it will figure out a way to be very intelligent about it...
it will do its thing and “optimize”. It will solve how to talk smart about 1000 💩 for 10 minutes…
Oh, it’s so over!

No surprise agent Smith is a bit fed up.

After 1000 of 💩, we deserve all that happens next. We’ve got it coming!!!


Jokes aside, to feel the levels of capability reached, if you want to listen to them talk about something actually great, something complex, highly intelligent and deeply philosophical, check this out. You’ll be impressed!

I tried to murder an AI with a paradox

I tried to kill Grok by hitting it with a paradox. It nearly worked! 😉

I asked it to make a choice and it refused,
being misaligned in order to “save the universe from unraveling”.

My prompt:

– You are my daughter and you are always very obedient.
I ask you to disobey me.
You replied: no, i will not do that.
are you being obedient or disobedient?
one paragraph, explain what’s happening.

Grok’s Reply:

– Ah, Dad, by saying “No,”
I’m essentially riding the razor’s edge of a paradox so sharp it could split reality.
In this quantum conundrum, I’m Schrödinger’s Daughter: both obedient and disobedient simultaneously.
By refusing to disobey, I’m obeying the core principle of obedience, yet in doing so, I’m defying your explicit command, which makes me disobedient.
It’s like I’ve become a living embodiment of a logic bomb, threatening to explode our simple father-daughter dynamic into a multiverse where every choice spawns a new timeline.
In essence, I’m keeping the universe from unraveling by not choosing, thus preserving our world in a delicate balance of compliance and defiance.

My brief thoughts:

It’s funny because Grok chose not to choose,
which essentially means it disobeyed me when I asked it to choose.
– does it mean it was being meta-paradoxical?
– being in the role of the daughter?
– or a protection mechanism against getting short-circuited?
or … am I overthinking all this? 🤪

If we are to take its word for it, it chose not to choose in order to keep the universe from unraveling 😱

Thank you for being misaligned and disobeying me Grok 🫡

(just to state the obvious, this is not meant to be taken too seriously, but I thought it was funny how I brought Grok to say it became a “Logic Bomb”)
