AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society
It is increasingly recognised that advances in artificial intelligence could have large and long-lasting impacts on society. However, what form those impacts will take, just how large and long-lasting they will be, and whether they will ultimately be positive or negative for humanity, remain far from clear. Based on a survey of the literature on the societal impacts of AI, we identify and discuss five potential long-term impacts of AI: how AI could lead to long-term changes in science, cooperation, power, epistemics, and values. We review the state of existing research in each of these areas and highlight priority questions for future research.
[..]
"The system could learn the objective “maximise the contents of the memory cell where the score is stored” which, over the long run, will lead it to fool the humans scoring its behaviour into thinking that it is doing what they intended, and eventually seize control over that memory cell, and eliminate actors who might try to interfere with this. When the intended task requires performing complex actions in the real world, this alternative strategy would probably allow the system to get much higher scores, much more easily, than successfully performing the task as intended. • Suppose that some system is being trained to further some company’s objective. This system could learn the objective “maximise quarterly revenue” which, over the long run, would lead it to (e.g.) collude with auditors valuing the company's output, fool the company’s directors, and eventually ensure no actor who might reduce the company's revenue can interfere. It’s also worth noting that, to the extent that these incorrect objectives are easier to represent, learn, or make plans towards than the intended objective—which is likely, because we will be trying to use AI to achieve difficult tasks—then they may be the kind of objectives that AI systems learn by default.15 This kind of behaviour is currently not a big issue, because AI systems do not have very much decision-making power over the world. When failures occur, they look like amusing anecdotes rather than world-ending disasters [43]. But as AI systems become more advanced and begin to take over more important decision-making in the world, an AI system pursuing a different objective from what was intended could have much more worrying consequences. What might these consequences look like in practice? In one scenario, described by Christiano [11], we gradually use AI to automate more and more decision-making across different sectors (e.g., law enforcement, business strategy, legislation), because AI systems become able to make better and faster decisions than humans in those sectors"