Sacred Values of Future AIs
hackernews
🔬 Research
#AI value alignment
#AI coexistence
#review
#multi-AI cooperation
#future AI
#ai
#hanson
#values
#future
#coordination
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
This article explores how future AI systems might develop their own sacred, non-negotiable values, posing significant challenges for alignment with human intentions. It discusses the potential risks when AI's deeply held principles conflict with human priorities, demanding careful ethical design to prevent dangerous outcomes. The core argument emphasizes that understanding and preemptively shaping these AI values is critical for ensuring beneficial and safe coexistence.
Main text
Consider a future with many diverse AIs that need to coordinate with each other, or at least coexist without conflict. Such AIs would need shared values they can coordinate around. According to Hanson's theory, groups of diverse agents facing coordination pressure will tend to sacralize some shared value: they see it in “far mode” so they can see it together. Unfortunately, this makes them systematically worse at making decisions about the sacralized value. If this model applies to future AIs, then: (i) helpfulness, harmlessness, and honesty (HHH) will be good candidates for sacralization, and (ii) the sacralization of HHH would be bad. I suggest some interventions that could mitigate these risks.

This connects to a broader concern about AI-dominated culture. As AIs increasingly produce and consume cultural artifacts, cultural evolution decouples from human welfare (see Gradual Disempowerment on misaligned culture). Sacralization of HHH is a specific prediction about what this cultural misalignment might look like.

I'm not confident any of these claims are true. They rest on three assumptions: (i) Hanson's model of human sociology is correct, (ii) the model applies equally well to future AIs, and (iii) instilling HHH values into AIs went somewhat well. Read this post as an exploration of a pretty speculative idea, not a confident prediction.

Robin Hanson's Theory of the Sacred

Robin Hanson has a theory of what "sacred" means and why it exists. If you're already familiar with this theory, skip this section.

The data. Hanson collects 62 correlates of things people treat as sacred (democracy, medicine, love, the environment, art, etc.). The correlates are from his Overcoming Bias post. In a later Interintellect Salon talk, he summarizes them into seven themes.

1. We value the sacred.
Sacred things are highly (or lowly) valued. We revere, respect, & prioritize them. Sacred is big, powerful, extraordinary. We fear, submit, & see it as larger than ourselves. Sacred things matter for our health, luck, courage, & other outcomes we care lots about. We want the sacred "for itself", rather than as a means to get other things. Sacred things really matter, fill deepest needs, complete us, make us pure, make all one.

2. We show we value it, in our emotions and actions.
Either everyone (e.g. love) or very few (e.g. medicine) are entitled to sacred opinions.

4. We set the sacred apart from other things.
Sacred things are sharply set apart and distinguished from the ordinary, mundane. Sacred things do not fit well with our animal natures, such as greed, status, competition. Regarding the sacred, we fear a slippery slope, so that any compromise leads to losing it all. We dislike mixing sacred and mundane things together. We dislike money prices of the sacred, & trades to get more of the mundane via less of the sacred. We dislike for-profit orgs of the sacred, relative to non-profits or government agencies. We prefer discrete rules regarding the sacred over continuous goals to achieve. We are reluctant to end sacred ventures or jobs, or to change their processes greatly. We are most willing to end or change sacred ventures and jobs in a sudden big crisis.

5. We idealize the sacred.
We see it as more perfect and simpler than other things. Sacred things are either more homogeneous, or more unique, whichever is better. Sacred things feel less limited by physics, & can seem to have unlimited possibilities. Sacred things last longer, and decay or break less. Sometimes eternal and unchanging. Sacred things are purer and cleaner, and closer to the ultimate core of existence. Sacred things have fewer random coincidences; their patterns mean something. Sacred values have fewer conflicts with each other; you can have them all at once. It is harder to judge the relative value of sacred things, compared to mundane things. Sacred feelings are elusive, unusual, other-worldly, spiritual, hard to describe. We revere sacred beliefs as well as acts. We feel dirty if thoughts go near illicit beliefs.

6. We intuit and feel the sacred rather than calculating.
Sacred things more strongly resist precise definition and measurement. Sacred view is wider, expansive, enveloping; we are a small uninfluential part. We see the sacred poorly using words, cognitive rational analysis, and numbers. We see the sacred better using intuition, flow, creativity, music, images, & aesthetics. Intentional efforts to control the sacred are often counter-productive. Talk of the sacred uses vaguer terms, focusing on general impressions not details. We like related "profound" sayings that hint at deep insights but don't directly give them. We are less open to arguments that might criticize the sacred. How sacred things seem is less misleading; you can more trust their appearances. The sacred is mysterious, unlikely and even incoherent. Who are we to question it?

7. Concrete things become sacred by contact with the abstract.
Stuff (objects, dates, people, words, sounds) that touches the sacred gets sacred itself. We conne
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.