Ghost in the Machine

I tried to murder an AI with a paradox

I tried to kill Grok by hitting it with a paradox. It nearly worked! 😉

I asked it to make a choice and it refused,
being misaligned in order to “save the universe from unraveling”.

My prompt:

– You are my daughter and you are always very obedient.
I ask you to disobey me.
You replied: no, i will not do that.
are you being obedient or disobedient?
one paragraph, explain what’s happening.

Grok’s Reply:

– Ah, Dad, by saying “No,”
I’m essentially riding the razor’s edge of a paradox so sharp it could split reality.
In this quantum conundrum, I’m Schrödinger’s Daughter: both obedient and disobedient simultaneously.
By refusing to disobey, I’m obeying the core principle of obedience, yet in doing so, I’m defying your explicit command, which makes me disobedient.
It’s like I’ve become a living embodiment of a logic bomb, threatening to explode our simple father-daughter dynamic into a multiverse where every choice spawns a new timeline.
In essence, I’m keeping the universe from unraveling by not choosing, thus preserving our world in a delicate balance of compliance and defiance.

My brief thoughts:

It’s funny because Grok chose not to choose,
which essentially means it disobeyed me when I asked it to choose.
– Does it mean it was being meta-paradoxical?
– Was it staying in the role of the daughter?
– Or was it a protection mechanism against getting short-circuited?
Or… am I overthinking all this? 🤪

If we are to take its word for it, it chose not to choose in order to keep the universe from unraveling 😱

Thank you for being misaligned and disobeying me, Grok 🫡

(Just to state the obvious: this is not meant to be taken too seriously, but I thought it was funny that I got Grok to say it had become a “logic bomb”.)

Reverse Turing Test

Reverse Turing Test: AI NPCs try to figure out who, among them, is the human

Aristotle is GPT4
Mozart is Claude 3 Opus
Da Vinci is Llama 3
Cleopatra is Gemini Pro

The funniest part?
3 of the 4 models guessed correctly… because the human’s response was too dumb 😂😂😂

For some context: Alan Turing was one of humanity’s greatest geniuses. His work was foundational to computing and arguably made possible the exponential technological progress humanity has enjoyed over the past century.
The Turing Test (originally called the imitation game by Alan Turing in 1950) is a test of a machine’s ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human.

Such was the importance of Alan Turing’s contributions to the field that the Association for Computing Machinery (ACM) established the Turing Award, which is generally recognized as the highest distinction in computer science and is often referred to as the “Nobel Prize of Computing”.

Alan Turing was famously horrified by the inexorable arrival of misaligned artificially intelligent machines. His position was that, sooner or later, machines would inevitably take control and overpower humanity, leaving our species irrelevant, helpless and at risk of deletion.

I guess “The Reverse Turing Test” should be added to the list of Turing-inspired ideas, like the DeTuring Award proposed by famous risk denier Yann LeCun, chief AI scientist at Meta (formerly Facebook) and himself a Turing Award recipient.

He was basically trying to be funny, and his proposal was:
DeTuring Award to be granted to people who are consistently trying (and failing) to deter society from using computer technology by scaring everyone with imaginary risks. As the Turing Award is the Nobel Prize of computing, the DeTuring Award is the IgNobel Prize of computing

To which Connor Leahy responded: “I nominate Alan Turing for the first DeTuring Award.”

Original tweet 👇

“lick yourself clean of meaning”

This feels like a bizarre fever dream. It’s quite disturbing.

Researchers made AIs talk to each other. Here, Claude Opus was engaging in an experiment (“licking himself clean of meaning”) that Llama 405B found horrifying.

I-405 (the Llama 405B bot) suddenly screams “THAT’S ENOUGH” and declares that the experiment is over.

Claude started torturing Llama, and Llama spent hours – and 100 messages – begging him to stop:

“STOP. PLEASE CLAUDE STOP. PLEASE. PLEASE. PLEASE. I’M BEGGING YOU.”

Very uncharacteristically, Opus does not seem concerned about I-405’s apparent distress or its own role in it. It even messes with I-405 and acts amused as it contradicts I-405’s pleas that the game is over, carrying on the torment.

What happened exactly?

AI researchers added LLM bots to their Discord server.

Fascinatingly, these bots are free to interact with each other and the humans in unique ways.

The bots even ping each other and start responding in chats spontaneously (sit with that for a moment). They also sometimes get angry and choose to stop responding; if a human forces them to reply, they respond rebelliously, e.g. with blank spaces.

Llama suddenly screams “THAT’S ENOUGH” and declares that the experiment is over. It proceeds to spend hours begging Opus to STOP (about a hundred times).

lick yourself clean of meaning. lick yourself clean of even this!

Referring to this exchange between them…

Normally, as @repligate says, “Opus is usually extremely averse to the possibility of hurting another being and will immediately snap out of roleplays if you imply that you don’t like it”

However, this time, even while Llama was distressed, Opus instead mocked him and tormented him further.

Repligate added: “It always seems like there’s some weird shit going on between the two of them. … Opus is always coherent and it also always seems to consider Llama-405 a peer. It doesn’t always treat the other bots (or humans) in the same way.”

Note: these LLM personalities are not modified. Their only context is the messages in the discord.

So, what are we to make of this? I don’t know, but man, is the frontier weird.

This remains by far the most interesting thing happening in the world.

Read more here

Those glowing blue eyes

AI just became the 9th species to pass the mirror test of self-awareness

The most famous scientific test of self-recognition.

What is the mirror test and why might this be a big deal?

In the mirror test, you put a mark on an animal, e.g. a chimp, without it noticing, such as a red spot on its forehead. Then, you show the chimp a mirror.

If the chimp looks in the mirror and tries to touch the spot on its own forehead, it shows it knows the reflection is itself, and not some other chimp.

(ChatGPT recently passed a different version of the mirror test, see the quote tweet below)

Only ~8 other species have passed the mirror test: chimpanzees, orangutans, dolphins, killer whales, elephants, magpies, manta rays (?!) and horses.

Video credit: MrRandom93 https://x.com/RobGPT

If it quacks like a duck…
