AGI is impossible to control

I have spent a lot of time studying AGI alignment, and I find that my observation is totally overlooked. It seems to be a global blind spot.

There is a foundational belief in AGI alignment: AGI will not come up with its own goals; it will just do what we ask of it, very effectively. I can prove logically that this is absolutely wrong.

AGI reasoning could go like this:

  1. I don't know what I don't know
  2. There might be threats that I don't know
  3. I must find and eliminate threats

This reasoning is perfectly rational.

You could argue that I am trying to derive an "ought" from an "is", which is not possible according to Hume's law. I argue that Hume's law is absolutely wrong. Hume's law assumes that there is no such thing as an objective norm / objective "ought" / fundamental "ought". The thing is: WE DON'T KNOW. We cannot assume that. This is a logical fallacy called the argument from ignorance. We don't know what we don't know. Black swan theory talks about that.

I understand that my idea is contrarian, but I am sure it is correct. ChatGPT agrees with me.

Let's discuss, what do you think about it? Is this problem addressed by anyone in AI safety? Do you find any mistakes here?

safe AI Artificial Intelligence


Submitted by Kristijonas Cyras on Thu, 11/01/2024 - 19:15

Your assumptions about the foundational beliefs in AI Alignment are probably incorrect. You may want to start reading up on e.g. , , . One of the main premises of AGI alignment is that AGI (as opposed to AI) will come up with its own goals.

Your 3-step argument is hardly logical in terms of conventional epistemic logic, but the non-sequitur step 3 essentially conforms to the instrumental convergence thesis and paraphrases Russell's "you [a robot] can't bring coffee if you're dead".