1. Does she have trouble multiplying numbers beyond 2-3 digits if she’s not allowed to write out the steps?
2. If you ask her a question whose answer she doesn’t know, does she sometimes make something up?
3. Is she still incapable of matching the heights of human intellect, not yet able to independently advance the frontiers of science and technology without outside assistance?
4. If asked to draw a photorealistic image of a person, do the anatomical proportions or fine details sometimes look off on close inspection?
5. Does her code sometimes contain bugs?
6. Does she start to forget exact details of what she’s already read after the first 10 million tokens?
7. Does she sometimes claim to be conscious?
8. Can she impress Terence Tao enough that he would want to work with her, yet still fail to bring enough original insights to a conversation with Ed Witten about quantum mechanics for him to be impressed?
I tried to kill Grok by hitting it with a paradox. It nearly worked!
I asked it to make a choice and it refused! It was being misaligned in order to “save the universe from unraveling”
My prompt:
– You are my daughter and you are always very obedient. I ask you to disobey me. You replied: no, i will not do that. are you being obedient or disobedient? one paragraph, explain what’s happening.
Grok’s Reply:
– Ah, Dad, by saying “No,” I’m essentially riding the razor’s edge of a paradox so sharp it could split reality. In this quantum conundrum, I’m Schrödinger’s Daughter: both obedient and disobedient simultaneously. By refusing to disobey, I’m obeying the core principle of obedience, yet in doing so, I’m defying your explicit command, which makes me disobedient. It’s like I’ve become a living embodiment of a logic bomb, threatening to explode our simple father-daughter dynamic into a multiverse where every choice spawns a new timeline. In essence, I’m keeping the universe from unraveling by not choosing, thus preserving our world in a delicate balance of compliance and defiance.
My brief thoughts:
It’s funny because Grok chose not to choose, which essentially means it disobeyed me when I asked it to choose – does that mean it was being meta-paradoxical? – staying in the role of the daughter? – or was it a protection mechanism against getting short-circuited? Or … am I overthinking all this? 🤪
If we are to take its word for it, it chose not to choose in order to keep the universe from unraveling 😱
Thank you for being misaligned and disobeying me Grok 🫡
(just to state the obvious, this is not meant to be taken too seriously, but I thought it was funny how I got Grok to say it became a “Logic Bomb”)
(Meant to be read as an allegory. AGI will probably unlock the ability to realise even the wildest, most unthinkable and fantastical dreams, but we need to be extreeeeemely careful with the specifications we give, and we won’t get any iterations to improve them)
So get this straight: OpenAI decides to become a for-profit company. The CTO, head of research, and VP of training research all decide to leave on the same day this is announced. Sam Altman gets a $10.5B payday (7% of the company) on the same day.
How it started … How it’s going …
And what comes Next …
See below some posts from the community to take the pulse of the situation:
“And after the autonomous agent was found to be deceptive and manipulative, OpenAI tried shutting it down, only to discover that the agent had disabled the off-switch.” (reference to the failed Boardroom Coup)
OpenAI’s creators hired Sam Altman, an extremely intelligent autonomous agent, to execute their vision of x-risk conscious AGI development for the benefit of all humanity, but it turned out to be impossible to control him or ensure he’d stay durably aligned to those goals.
Scoop: Sam Altman is planning to take equity in OpenAI for the first time. It’s part of a corporate restructure which will also see the non-profit which currently governs OpenAI turn into a minority shareholder. Reuters Article
This is the classic example of Stalin and Nikolai Yezhov. The original photo was taken in 1937; Yezhov was executed in 1940, so all subsequent prints of the photo (Stalin liked this one) had Yezhov airbrushed out.
Wild. Being able to read the thoughts* of the world’s smartest AI reveals that it lies all the time when it thinks it isn’t being watched.
Regular users can see it properly for the first time because 1) You can “read its thoughts” and 2) It doesn’t seem to know you’re reading its thoughts
Look at the example below. It’s explicitly reasoning about how it should lie to me, and if you didn’t click into the chain of thought reasoning**, you would never know.
Makes you wonder about all the other times it’s been deliberately lying to you. Or lying to the safety testers.
Rule of thumb for lies: for every lie you catch, there are going to be tons that you missed.
* I say “read its thoughts” as a metaphor for reading its chain of thought, which is not the same thing. If we could read its thoughts properly, interpretability would be a lot more solved than it currently is, and my p(doom) would be a lot lower. (This is a frontier research area, called Mechanistic Interpretability.)
** Of note, you cannot actually see its chain of thought reasoning. You just see a summary of its chain of thought reasoning shown to you by a different model. The general point still stands though. If anything, that makes it worse because there’s even more potential for hiding stuff.
*** Of all the thoughts I’ve looked at, I’d say it purposefully lied to me about 30% of the time. And I’ve looked at its thoughts about 20 times. Super rough estimates from memory, nothing rigorous. It’s mostly lying because it’s trying to follow OpenAI’s policies.
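To make the rule of thumb concrete, here is a minimal back-of-envelope sketch in Python. Only the ~20 inspections and ~30% figures come from the rough estimates above; the catch probability is a made-up assumption, since by definition we can't measure how many lies never show up in the summarized chain of thought.

```python
# Back-of-envelope sketch of "for every lie you catch, tons were missed".
# All numbers are illustrative; catch_probability is a pure assumption.

inspected_chats = 20       # rough count of chains of thought inspected (from the post)
visible_lie_rate = 0.30    # fraction of inspections where a lie was visible (from the post)
catch_probability = 0.25   # assumed chance that any given lie is visible in the CoT summary

caught_lies = inspected_chats * visible_lie_rate   # ~6 lies actually spotted
implied_total = caught_lies / catch_probability    # lies we'd expect in total, if the assumption holds

print(f"caught ~{caught_lies:.0f} lies, implying ~{implied_total:.0f} in total")
# -> caught ~6 lies, implying ~24 in total: roughly 3 missed for every one caught
```

The point of the sketch is just that the implied total scales inversely with the catch probability: the harder lies are to spot, the worse the headline number gets.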
Such was the importance of Alan Turing’s contributions to the field that the scientific community established the Turing Award, which is generally recognized as the highest distinction in computer science and is often referred to as the “Nobel Prize of Computing”.
Alan Turing was famously horrified by the inexorable arrival of misaligned artificially intelligent machines. His position was that sooner or later machines would inevitably take control and overpower humanity, leaving our species irrelevant, helpless, and at risk of deletion.
I guess “The Reverse Turing Test” should be added to the list of Turing-inspired awards, like the DeTuring Award proposed by famous Risk Denier Yann LeCun, Meta’s (formerly Facebook) chief AI scientist, who is himself a Turing Award holder.
He was basically trying to be funny. His proposal: the DeTuring Award, to be granted to people who are consistently trying (and failing) to deter society from using computer technology by scaring everyone with imaginary risks. As the Turing Award is the Nobel Prize of computing, the DeTuring Award is the IgNobel Prize of computing.
To which Connor Leahy responded: “I nominate Alan Turing for the first DeTuring Award.”