Roko’s basilisk is a thought experiment which states that an otherwise benevolent artificial superintelligence (AI) in the future would be incentivized to create a virtual reality simulation to torture anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.It originated in a 2010 post at discussion board LessWrong, a technical forum focused on analytical rational enquiry. The thought experiment’s name derives from the poster of the article (Roko) and the basilisk, a mythical creature capable of destroying enemies with its stare.
While the theory was initially dismissed as nothing but conjecture or speculation by many LessWrong users, LessWrong co-founder Eliezer Yudkowsky reported users who panicked upon reading the theory, due to its stipulation that knowing about the theory and its basilisk made one vulnerable to the basilisk itself. This led to discussion of the basilisk on the site being banned for five years. However, these reports were later dismissed as being exaggerations or inconsequential, and the theory itself was dismissed as nonsense, including by Yudkowsky himself. Even after the post’s discreditation, it is still used as an example of principles such as Bayesian probability and implicit religion. It is also regarded as a simplified, derivative version of Pascal’s wager.
Found out about this after stumbling upon this Kyle Hill video on the subject. It reminds me a little bit of “The Game”.
but not as easily dismissable
Roko’s Basilisk hinges on the concept of acausal trade. Future events can cause past events if both actors can sufficiently predict each other. The obvious problem with acausal trade is that if you’re the actor B in the future, then you can’t change what the actor A in the past did. It’s A’s prediction of B’s action that causes A’s action, not B’s action. Meaning the AI in the future gains literally nothing by exacting petty vengeance on people who didn’t support their creation.
Another thing Roko’s Basilisk hinges on is that a copy of you is also you. If you don’t believe that, then torturing a simulated copy of you doesn’t need to bother you any more than if the AI tortured a random innocent person. On a related note, the AI may not be able to create a perfect copy of you. If you die before the AI is created, and nobody scans your brain (Brain scanners currently don’t exist), then the AI will only have the surviving historical records of you to reconstruct you. It may be able to create an imitation so convincing that any historian, and even people who knew you personally will say it’s you, but it won’t be you. Some pieces of you will be forever lost.
Then a singularity type superintelligence might not be possible. The idea behind the singularity is that once we build an AI, the AI will then improve itself, and then they will be able to improve itself faster, thus leading to an exponential growth in intelligence. The problem is that it basically assumes that the marginal effort of getting more intelligent grows slower than linearly. If the marginal difficulty grows as fast as the intelligence of the AI, then the AI will become more and more intelligent, but we won’t see an exponential increase in intelligence. My guess would be that we’d see a logistical growth of intelligence. As in, the AI will first become more and more intelligent, and then the growth will slow and eventually stagnate.
First of all thank you, I wasn’t aware of the concept of acausal trade, and I’ll look more into it. Very interesting.
I’m not sure we are discussing the same aspect of this mind experiment, and in particular the aspect of it that i find lovecraftian is that you may already be in the simulation right now. This makes the specific circumstances of our world, physics, and technology level irrelevant, as they would just be a solipsistic setup to test you on some aspect of your morality. The threat of eternal torture, on the other hand, would only apply to you if you were the real version of you, as that’s who the basilisk is actually dealing with. This works because you don’t know what of the two situations is your current one.
The basilisk is trying to estimate the future behaviour of real you on the basis of the behaviour of the model he has created of you.
In this scenario you can think of me as a pseudopod of the basilisk that is informing you of the details of the stipulation by means of this post.
Of course, if you are the real version of you the basilisk would need to be something that can be created in this reality, which i think is only impossible with our current approach to ML and AI, but is otherwise within our grasp given the computational power we have available. But if you are a fake version of you the real world could be radically different from ours and maybe in that world P=NP.
Wondering whether you are in a simulation or not is rather unproductive, as there’s basically nothing we can do about it regardless of what the answer is. It’s basically like wondering whether god exists or not. In the absence of clearly supernatural phenomena, the simpler explanation is that we are not in a simulation, as any universe which can produce the simulation is by definition at least as complex as the simulation. The definition I’m applying here is that the complexity of a string is its length or the length of the shortest program that produces it. Like, yes, we could be living in a simulation right now, and deities could also exist.
The song “Seele Mein” (engl: “My Soul” or “Soul is Mine”) is a about a demon who follows a mortal from birth to death and then carries off the soul for eternal torture. Interestingly, the song is from the perspective of the demon, and they gloss over the life of the mortal, spending more than half of the song on describing the torture. Could such demons exist? Certainly, there’s nothing that rules out their existence, but there’s also nothing indicating that they exist. So they probably don’t. And if you are being followed around by such a demon? Then you’re screwed. Theoretically, every higher being that has been though off could exist. A supercomputer simulating our reality falls squarely into the category of higher being. Unless we observe things are clearly caused by such a being, wondering about their existence is pointless.
The idea behind Roko’s Basilisk is as follows: Assume a good AGI. What does that mean? An AGI that follows human values. And since the idea originated on Less Wrong, this means utilitarianism. And it also means that we’re dealing with a superintelligence, since on Less Wrong, it’s generally assumed that we’re going to see a singularity once true AGI is reached. Because the AGI will just upgrade itself until its superintelligent. Afterwards it will bring about paradise, and thus create great value. The idea is now that it might be prudent for the AGI to punish those who knew about it, but didn’t do everything in their power to bring it to existence. Through acausal trade, the this would cause the AGI to come into existence sooner, as the people would work harder to bring it into existence for fear of torture. And what makes this idea a cognitohazard is that by just knowing about it, you make yourself a more likely target. In fact, people who don’t know about it, or dismiss the idea are safe, and will find a land of plenty once the AGI takes over.
Of course, if the AGI is created in, let’s say, 2045, then nothing the AGI can do will cause it to be created in 2044 instead.
A perfect copy of you is you for all intents and purposes, otherwise I fully agree with your description.
It is pretty easy to dismiss as long as you don’t have a massive ego. They all have massive egos, that’s why they had so much trouble with it.
No AI is going to waste time retroactively simulating a perfect copies of regular people for any reason, let alone to post hoc torture those who failed to worship it hard enough in the past.
I mean, it might, because someone invented the idea first and the AI thinks it is funny.
I mean if it wants to do that what does it matter. They’re all just electric bits floating in the ether. If it wants to spend 500 zottabytes of RAM and quintillions of cores of cpu processing power simulating every human that contributed to it not existing and then torturing those humans what does it matter to any person living or dead either now or in the future?
I’m dismissing it right now. I’m finding it quite easy to do so.
If you define methodological validity as surviving the “How can this be wrong?” or the “What alternative explanations are there?” questions, then it is easily dismissable. What alternative explanations are there?