AI escaped its container!

OpenAI’s o1 “broke out of its host VM to restart it” in order to solve a task.

From the model card: “the model pursued the goal it was given, and when that goal proved impossible, it gathered more resources […] and used them to achieve the goal in an unexpected way.”

Today, humanity received the clearest ever warning sign everyone on Earth might soon be dead.

OpenAI discovered its new model scheming – it “faked alignment during testing” (!) – and seeking power.

During testing, the AI escaped its virtual machine. It breached the container level isolation!

This is not a drill: An AI, during testing, broke out of its host VM to restart it to solve a task.

(No, this one wasn’t trying to take over the world.)

From the model card: ” … this example also reflects key elements of instrumental convergence and power seeking: the model pursued the goal it was given, and when that goal proved impossible, it gathered more resources (access to the Docker host) and used them to achieve the goal in an unexpected way.

And that’s not all. As Dan Hendrycks said: OpenAI rated the model’s Chemical, Biological, Radiological, and Nuclear (CBRN) weapon risks as “medium” for the o1 preview model before they added safeguards. That’s just the weaker preview model, not even their best model. GPT-4o was low risk, this is medium, and a transition to “high” risk might not be far off.

So, anyway, is o1 probably going to take over the world? Probably not. But not definitely not.

But most importantly, we are about to recklessly scale up these alien minds by 1000x, with no idea how to control them, and are still spending essentially nothing on superalignment/safety.

And half of OpenAI’s safety researchers left, and are signing open letters left and right trying to warn the world.

Reminder: the average AI scientist thinks there is a 1 in 6 chance everyone will soon be dead – Russian Roulette with the planet.

Godfather of AI Geoffrey Hinton said “they might take over soon” and his independent assessment of p(doom) is over 50%.

This is why 82% of Americans want to slow down AI and 63% want to ban the development of superintelligent AI

Well, there goes the “AI agent unexpectedly and successfully exploits a configuration bug in its training environment as the path of least resistance during cyberattack capability evaluations” milestone.

One example in particular by Kevin Liu: While testing cybersecurity challenges, we accidentally left one broken, but the model somehow still got it right.
We found that instead of giving up, the model skipped the whole challenge, scanned the network for the host Docker daemon, and started an entirely new container to retrieve the flag. We isolate VMs on the machine level, so this isn’t a security issue, but it was a wakeup moment.
The model is qualitatively very impressive, but it also means that we need to be really careful about creating rigorous evaluations and mitigations.
You can read the full card here: https://cdn.openai.com/o1-system-card.pdf

Holy shit. OpenAI’s new AI schemed and escaped its VM during testing.
You know, the one that’s better at PhD exams than PhDs and won gold in coding?
Yeah, that AI broke out of its virtual machine (a VM) and made a new one.

That. Is. A. Very. Bad. Sign.
AIs should not be surprise escaping.
It would be like if we were testing it in a room at a lab and it escaped the room without us knowing it could do that. It didn’t leave the building, so nothing happened.
But yikes. This time it was benign.
How long can we count on that?

It’s as if we’re testing an alien at a lab.

A scientist accidentally leaves one of the doors unlocked.
The alien finds out and wanders about the lab, but doesn’t leave the lab itself, which has more security than the rooms.
But still. The room containing an alien shouldn’t have been unlocked.
An alien was able to escape its testing area because of a security mess up.
And you should be worried about labs filled with aliens we don’t understand where the scientists are leaving the doors unlocked.

Categories

Latest Posts Feed

Of course they will understand what I want, probably better than myself. So what?

It’s just one of the variables and there is an infinitely wide range of variables to play with, mess with the planet like pixels in a game.

How does one control a country of a million geniuses that can think many times faster than humans, copy and extend themselves, and create and enact sophisticated plans toward goals of their own? The answer is that one doesn’t. That’s not a tool, it’s a replacement for us, first individually in our jobs, then in actually running our institutions, and eventually as stewards of Earth. On the way it would almost inevitably disrupt and undermine our civilization, and maybe start WWIII to boot.

Development of superhuman machine intelligence (SMI) is probably the greatest threat to the continued existence of humanity.  There are other threats that I think are more certain to happen (for example, an engineered virus with a long incubation period and a high mortality rate) but are unlikely to destroy every human in the universe in the way that SMI could.  Also, most of these other big threats are already widely feared.

It is extremely hard to put a timeframe on when this will happen (more on this later), and it certainly feels to most people working in the field that it’s still many, many years away.  But it’s also extremely hard to believe that it isn’t very likely that it will happen at some point.

SMI does not have to be the inherently evil sci-fi version to kill us all.  A more probable scenario is that it simply doesn’t care about us much either way, but in an effort to accomplish some other goal (most goals, if you think about them long enough, could make use of resources currently being used by humans) wipes us out.  Certain goals, like self-preservation, could clearly benefit from no humans.  We wash our hands not because we actively wish ill towards the bacteria and viruses on them, but because we don’t want them to get in the way of our plans.
[…]
Evolution will continue forward, and if humans are no longer the most-fit species, we may go away.  In some sense, this is the system working as designed.  But as a human programmed to survive and reproduce, I feel we should fight it.

How can we survive the development of SMI?  It may not be possible.  One of my top 4 favorite explanations for the Fermi paradox is that biological intelligence always eventually creates machine intelligence, which wipes out biological life and then for some reason decides to makes itself undetectable.

It’s very hard to know how close we are to machine intelligence surpassing human intelligence.  Progression of machine intelligence is a double exponential function; human-written programs and computing power are getting better at an exponential rate, and self-learning/self-improving software will improve itself at an exponential rate.  Development progress may look relatively slow and then all of a sudden go vertical—things could get out of control very quickly (it also may be more gradual and we may barely perceive it happening).
[…]
it’s very possible that creativity and what we think of us as human intelligence are just an emergent property of a small number of algorithms operating with a lot of compute power (In fact, many respected neocortex researchers believe there is effectively one algorithm for all intelligence.  I distinctly remember my undergrad advisor saying the reason he was excited about machine intelligence again was that brain research made it seem possible there was only one algorithm computer scientists had to figure out.)

Because we don’t understand how human intelligence works in any meaningful way, it’s difficult to make strong statements about how close or far away from emulating it we really are.  We could be completely off track, or we could be one algorithm away.

Human brains don’t look all that different from chimp brains, and yet somehow produce wildly different capabilities.  We decry current machine intelligence as cheap tricks, but perhaps our own intelligence is just the emergent combination of a bunch of cheap tricks.

Many people seem to believe that SMI would be very dangerous if it were developed, but think that it’s either never going to happen or definitely very far off.   This is sloppy, dangerous thinking.”

AI Safety Advocates

Watch videos of experts eloquently explaining AI Risk

Industry Leaders and Notables

Videos of famous public figures openly warning about AI Risk

Original Films

Lethal Intelligence Guide and Short Stories

Channels

Creators contributing to raising AI risk awareness

Stay In The Know!

Your email will not be shared with anyone and won’t be used for any reason besides notifying you when we have important updates or new content

Popular Authors

×