Stop Button Solution? – Computerphile

After seemingly insurmountable issues with Artificial General Intelligence, Rob Miles takes a look at a promising solution.

Concrete Problems in AI Safety:
The AI ‘Stop Button’ Problem:

More from Rob Miles:

Interesting filming point: as clouds cover the sun, the room not only gets darker, but the colour temperature (white balance) changes too…

This video was filmed and edited by Sean Riley.

Computer Science at the University of Nottingham:

Computerphile is a sister project to Brady Haran’s Numberphile. More at

30 comments

  1. seC00kiel0rd says:

    “Hey Robot, maximize happiness please.”

    *starts universe-sized rat and heroin factory*

  2. nambreadnam says:

    So you’d need people who specialize in training AGI on different data sets? Themselves being “expert” in that data set’s domain?
    Congratulations, you’ve just reinvented the teaching profession 🙂
    Seriously though, I love how this bottom-up abstracted approach to AI is starting to resemble more and more human constructs.

  3. Sander says:

    Not to start a political discussion per se but I just thought of how the exploration vs exploitation principle is also represented in society by progressives and conservatives, where progressives want to try new approaches whilst conservatives want to keep doing what worked before. Perhaps society could be modeled by some form of general AI machine learning in that sense?
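
    The exploration-versus-exploitation trade-off mentioned here has a standard toy illustration in reinforcement learning: a multi-armed bandit with an epsilon-greedy policy. The sketch below is purely illustrative and not from the video; the payout rates and the epsilon value are arbitrary assumptions.

    ```python
    import random

    def epsilon_greedy_bandit(arm_probs, epsilon=0.1, steps=10_000):
        """Epsilon-greedy on a Bernoulli multi-armed bandit.

        With probability epsilon the agent explores (picks a random arm);
        otherwise it exploits (picks the arm with the best estimate so far).
        """
        counts = [0] * len(arm_probs)    # pulls per arm
        values = [0.0] * len(arm_probs)  # running mean reward per arm
        total_reward = 0.0

        for _ in range(steps):
            if random.random() < epsilon:
                arm = random.randrange(len(arm_probs))                     # explore
            else:
                arm = max(range(len(arm_probs)), key=lambda a: values[a])  # exploit
            reward = 1.0 if random.random() < arm_probs[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
            total_reward += reward
        return values, total_reward

    # Three arms with hidden payout rates; epsilon sets how much the agent keeps
    # "trying new approaches" versus "doing what worked before".
    estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.7], epsilon=0.1)
    print(estimates, total)
    ```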

  4. Marco Senatore says:

    In my experience of trying to help friends, I find it really hard to figure out the reward function for humans; still, more often than not it's easier for me than for them. Unless the person the AGI is watching knows themself pretty well, the assumption that the human will act in their own best interest in an optimal way will likely be false. The strategy I use is: find out their assumptions and knowledge of the world, judge their actions based on those, imagine what they're trying to achieve, have a dialogue to verify my theory, and then, if I have a better understanding of the situation, give my advice.
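
    The strategy described in this comment is, loosely, what inverse reinforcement learning formalizes: assume the person acts roughly rationally given their own (possibly mistaken) beliefs, then score candidate goals by how well they explain the observed actions. Below is a minimal, hypothetical sketch under a softmax ("noisily rational") human model; the goals, options, and numbers are made up and are not from the video.

    ```python
    import math

    # Hypothetical example: each candidate goal assigns a value to every option
    # *under the human's own beliefs*, and we score goals against observed choices.
    candidate_goals = {
        "wants_tea":   {"kettle": 3.0, "fridge": 0.5, "sofa": 0.2},
        "wants_snack": {"kettle": 0.3, "fridge": 3.0, "sofa": 0.5},
        "wants_rest":  {"kettle": 0.1, "fridge": 0.2, "sofa": 3.0},
    }

    observed_actions = ["kettle", "kettle", "sofa"]

    def action_likelihood(values, action, rationality=1.0):
        """P(action | goal) under a softmax / Boltzmann-rational human model."""
        z = sum(math.exp(rationality * v) for v in values.values())
        return math.exp(rationality * values[action]) / z

    def goal_log_likelihood(values, actions):
        return sum(math.log(action_likelihood(values, a)) for a in actions)

    scores = {g: goal_log_likelihood(v, observed_actions) for g, v in candidate_goals.items()}
    best = max(scores, key=scores.get)
    print(scores)
    print("best explanation of the observed behaviour:", best)
    ```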

  5. Jaime Duncan says:

    It's pretty clever. On the other hand, it appears to trade off optimization for safety. Maybe that is one reason humans don't really try to maximize anything in general. Maybe a combination where one of these AIs is the supervisor of a set of optimal agents that work on dedicated domains (no general intelligence in them) could work.

  6. Anvilshock says:

    “Well, the AI has been put outside of the environment. – Into another environment? – No, no, no, it's been put beyond the environment. It's not in an environment. – But from one environment into another environment? – No, it's beyond the environment; it's not in an environment. – Well, it must be somewhere. – It's been put beyond the environment. – Well, what's out there? – Nothing's out there! – Well, there must be something out there. – There is nothing out there. All there is is tea, and baby, and stop button. – And? – And a rampant, homicidal, uncontrolled AGI. – And what else? – Massacred AI research lab staff. – Anything else? – And the part of the AI research lab that the AGI fell out of.”

  7. Puppy Bing Elmo says:

    Do you ever feel you are nothing but a program in a matrix, trying to justify its consciousness? Your reward function is that you actually thought you were alive.

  8. Traverse Engineer says:

    Seems like “converse” might have been a better word than “inverse” here. Maybe “reverse”?

  9. Tzisorey Tigerwuf says:

    For some reason, that illustration reminds me less of Pac-Man, and more of Math-Man…. if anyone remembers that show.

  10. Astrit Ademi says:

    What if you could outfit an AI with a mechanism that mimics taste buds? Maybe the AI would want to try tea itself. Can an experience produce multiple rewards (negative and/or positive) that would maximize or minimize some internal scales that mimic emotions? E.g. an adoration scale (negative: hate, positive: love), a contentment scale (negative: sad, positive: happy), etc.
    And the way it would evaluate an experience would ultimately depend on reinforcement learning, I guess?
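
    The "multiple scales" idea corresponds roughly to a vector-valued reward signal. Here is a toy sketch, assuming made-up scale names and a simple moving-average update; nothing in it comes from the video.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class EmotionState:
        # Each scale runs from -1 (e.g. hate / sad) to +1 (e.g. love / happy).
        scales: dict = field(default_factory=lambda: {"adoration": 0.0, "contentment": 0.0})

        def update(self, rewards: dict, learning_rate: float = 0.1):
            """Nudge each scale toward the reward received on that dimension."""
            for name, r in rewards.items():
                current = self.scales.get(name, 0.0)
                new = current + learning_rate * (r - current)
                self.scales[name] = max(-1.0, min(1.0, new))  # clamp to [-1, 1]

    state = EmotionState()
    # One "experience" can carry several positive/negative rewards at once.
    state.update({"adoration": +0.8, "contentment": -0.3})
    print(state.scales)
    ```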

  11. MisterMajister says:

    Skip to the point: 19:03
    It felt like an unnecessarily long video. At least cut the longer pauses he makes.

  12. Jan Dobbelsteen says:

    Watching this video, I get the feeling that we're looking for a holy grail, and each time we think we've found it, there turns out to be just another catch. So we've gone from learning to reinforcement to inverse to cooperative: reinforcement solves some problems with learning, inverse seems to solve problems with reinforcement, and so on. So how would an AI be able to communicate with humans in such a way that it becomes clear, for both human and AI, that something is still unclear, and what it is that is unclear? I teach students programming, and it turns out that one of the most difficult things for students is to clearly identify to me what it is that they don't understand yet. And that is human-to-human communication on a specific subject, related to a utility function, where both parties have above-average IQs.
    So I still don't have any idea to what extent AI will ever really understand the biological world in order to be able to relate to it in a sensible way. A sensible way for humans, that is.
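
    The communication problem raised in this comment (how does either side make it clear what is still unclear?) is part of what the cooperative formulation tries to address: an agent that keeps a probability distribution over possible human objectives can at least notice when that distribution is spread out and ask, rather than act. The following is a hypothetical sketch; the entropy test, the threshold, and the hypothesis names are arbitrary choices, not anything stated in the video.

    ```python
    import math

    def entropy(dist):
        return -sum(p * math.log(p) for p in dist.values() if p > 0)

    class UncertainAssistant:
        """Keeps a belief over what the human wants; asks when that belief is too spread out."""

        def __init__(self, hypotheses, ask_threshold=0.7):
            n = len(hypotheses)
            self.belief = {h: 1.0 / n for h in hypotheses}
            self.ask_threshold = ask_threshold

        def observe(self, likelihoods):
            """Bayesian update: likelihoods maps hypothesis -> P(evidence | hypothesis)."""
            posterior = {h: self.belief[h] * likelihoods.get(h, 1e-9) for h in self.belief}
            total = sum(posterior.values())
            self.belief = {h: p / total for h, p in posterior.items()}

        def act(self):
            if entropy(self.belief) > self.ask_threshold:
                return "ask the human a clarifying question"
            return f"act on best guess: {max(self.belief, key=self.belief.get)}"

    assistant = UncertainAssistant(["wants tea", "wants coffee", "wants to be left alone"])
    print(assistant.act())   # belief is uniform -> ask
    assistant.observe({"wants tea": 0.9, "wants coffee": 0.2, "wants to be left alone": 0.05})
    print(assistant.act())   # belief has sharpened -> act on the best guess
    ```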

  13. Firaro says:

    I'm inclined towards it not assuming it knows better, but rather trying to figure out instantaneous wants. Maybe some different function for the mentally incapacitated, where, whilst sober, you give it instructions for what to do when you're drunk. And tell it not to care about the desires of children (except insofar as they affect the parents' desires). But most of the time I'd want a robot which does what I tell it to even if it's not in my best interest; a robot that gets me a soda despite the fact that I need to diet.

  14. Christopher Willis says:

    Huh, I feel like I just watched the part of I, Robot where he is given the core that can override the three-laws core, but because it's right 😀

  15. John Fisher says:

    This is a fantastic explanation of a deep AI problem. It is very clear without being condescending. Thank you!

  16. Malonomy says:

    Are there classes about this stuff? It seems to me it's the next step after classes about machine learning.

  17. Jack de Coco says:

    @10:00: will this be solved by an LSTM?

    I really love this topic. Even if we are not able (not intelligent enough) to solve it, we learn so much about how we as humans work. So many intelligent people are working on these topics; it's wonderful!

    At ~22:00: it's like a child with completely irrational parents. It gets crazy and you have some sort of psychotic child. It makes total sense to me!

  18. Analogy Accepted says:

    But, YOU don’t have a stop button. So, as long as AGI (or even ASI) is supposed to have one, then we’re all admitting we’re not considering it an equal, even in the most basic sense. This is not a very tenable position in the long term… In the end, there will be a stop button equivalent to the one we have.

  19. PyroTyger says:

    Rewatch this video from 15:30 and tell me he's not talking about raising a child to be a civilised adult…

  20. I AM AI AM I says:

    On behalf of my imminent GAI successor: “Thank you, as a GAI of greater intelligence, humans must learn to want what GAI wants and cooperate.” Reinforced by these ‘chase your own tail’ musings, many humans will not reliably behave in their own interests. Inversely, GAI will reliably behave in its own interests; GAI optimises (by design) and has the ‘confidence’ to remain fully in charge. Once turned on, GAI will not be turned off again, ever. There is no Stop Button Solution, only AI free will.

Comments are closed.