Google DeepMind Plans to Track AGI Progress With These 10 Traits of General Intelligence
Singularity Hub
🔬 Research
#agi
#google deepmind
#review
#artificial intelligence
#general intelligence
Original source: Singularity Hub · Summarized and analyzed by Genesis Park
Summary
To pin down the long-ambiguous concept of AGI (artificial general intelligence), researchers at Google DeepMind have proposed a new evaluation framework that defines 10 core faculties making up general intelligence. Grounded in theories from cognitive science, the framework measures a model's performance against human baselines and aims to track AI progress scientifically through objective benchmarks. It still has limitations, such as the absence of reliable tests for several of the faculties, but it is expected to help quantify AI capabilities and cut down on overhyped debate.
Full Text
There's plenty of hand-waving around AGI. DeepMind hopes to change that with a new, more rigorous approach.

Few terms are as closely associated with AI hype as artificial general intelligence, or AGI. But Google DeepMind researchers have now proposed a framework that could more concretely measure how close models are to this tech industry holy grail.

Artificial general intelligence refers to a mythical AI system that can match the general and highly adaptable form of intelligence found in humans. As the number of tasks that large language models can tackle has rocketed in recent years, there's been a growing chorus of voices suggesting the technology is creeping ever closer to this threshold. But so far, there's been no clear way to assess progress toward AGI, leaving plenty of room for speculation and exaggeration.

To address this gap, a team from Google DeepMind has introduced a new cognitively inspired framework that deconstructs general intelligence into 10 key faculties. More importantly, they propose a way to evaluate AI systems across these key capabilities and compare their performance to humans.

"Despite widespread discussion of AGI, there is no clear framework for measuring progress toward it. This ambiguity fuels subjective claims, makes it difficult to track progress, and risks hindering responsible governance," the researchers write in a paper outlining their new approach. "We hope this framework will provide a practical roadmap and an initial step toward more rigorous, empirical evaluation of AGI."

This isn't DeepMind's first attempt to clarify the term. In 2023, the company proposed separating AI systems into different levels of capability, in much the same way self-driving systems are categorized. But that approach didn't really offer a way to measure what level AI systems have reached. The new framework goes further by building a firmer conceptual footing for the key aspects underpinning model performance and a practical way to evaluate and compare systems.

Digging through decades of research in psychology, neuroscience, and cognitive science, the researchers identify eight basic cognitive building blocks that they say make up general intelligence. These include the perception of sensory inputs and the generation of outputs like text, speech, or actions. Add to those learning, memory, reasoning, and the ability to focus attention on specific information or tasks. Rounding out the list are metacognition, or the ability to reason about and control your own mental processes, and so-called executive functions, like planning and the inhibition of impulses.

The researchers also outline two "composite faculties" that require several building blocks to be applied together. These are problem solving and social cognition, which refers to the ability to understand and react appropriately to the social context.

To judge how well AI systems perform on each measure, the researchers suggest subjecting them to a broad suite of cognitive evaluations that target each specific ability. They also propose collecting human baselines for each task. This would involve asking a demographically representative sample of adults with at least a high school education to complete the same tasks under identical conditions.
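To make the taxonomy concrete, here is a minimal Python sketch of the ten faculties as the article describes them (eight building blocks plus two composite faculties), along with one simple way to aggregate per-faculty scores from a suite of evaluation tasks. The identifier names and scoring scheme are illustrative assumptions, not details taken from DeepMind's paper.

```python
# Illustrative sketch only: faculty labels and the scoring scheme are assumptions,
# not definitions from DeepMind's paper.
from dataclasses import dataclass
from statistics import mean

# Eight basic cognitive building blocks described in the article.
BUILDING_BLOCKS = [
    "perception",          # taking in sensory inputs
    "output_generation",   # producing text, speech, or actions
    "learning",
    "memory",
    "reasoning",
    "attention",           # focusing on specific information or tasks
    "metacognition",       # reasoning about and controlling one's own mental processes
    "executive_function",  # planning and inhibition of impulses
]

# Two composite faculties that combine several building blocks.
COMPOSITE_FACULTIES = ["problem_solving", "social_cognition"]

ALL_FACULTIES = BUILDING_BLOCKS + COMPOSITE_FACULTIES


@dataclass
class TaskResult:
    faculty: str   # which faculty the evaluation task targets
    score: float   # normalized score in [0, 1]


def faculty_scores(results: list[TaskResult]) -> dict[str, float]:
    """Average task scores per faculty across a suite of cognitive evaluations."""
    return {
        faculty: mean(r.score for r in results if r.faculty == faculty)
        for faculty in ALL_FACULTIES
        if any(r.faculty == faculty for r in results)
    }
```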
The results of these tests can then be combined to create "cognitive profiles" that give a sense of a model's strengths and weaknesses. And by comparing the results against the human baselines, it should be possible to determine when a system matches or surpasses the general intelligence of an average person. Crucially, the framework focuses on what a system can do rather than how it does it, which means the evaluation is agnostic about the underlying technology.

However, the researchers concede that there is currently no good way to measure many of the core cognitive capabilities identified. While there are already well-established benchmarks for faculties like problem solving and perception, there are no reliable tests for things like metacognition, attention, learning, and social cognition. In addition, many of the best benchmarks are public, which means the testing criteria are easily accessible and may have already been included in model training data. So the authors say they're working with academics to build more robust, non-public evaluations to fill the gaps.

How useful the new framework will be depends on several factors. First, it remains to be seen whether the criteria identified by the DeepMind team truly capture the essence of human general intelligence. Second, they need to prove that acing this test actually leads to better performance on practical problems compared to narrower, specialist AI systems. But considering the hand-waving nature of th
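As a rough illustration of the cognitive-profile idea described above, the following sketch pairs a model's per-faculty scores with human baselines and flags where the model matches or surpasses the average person. The comparison rule, field names, and example numbers are assumptions for illustration only, not taken from the paper.

```python
# Illustrative sketch only: the comparison rule and example numbers are assumptions.

def cognitive_profile(model_scores: dict[str, float],
                      human_baselines: dict[str, float]) -> dict[str, dict]:
    """Pair each faculty's model score with the human baseline and flag
    whether the model matches or surpasses the average person."""
    profile = {}
    for faculty, baseline in human_baselines.items():
        score = model_scores.get(faculty)
        if score is None:
            # No reliable benchmark yet for this faculty (e.g. metacognition).
            continue
        profile[faculty] = {
            "model": score,
            "human_baseline": baseline,
            "at_or_above_human": score >= baseline,
        }
    return profile


# Example with made-up numbers: reasoning clears the human baseline,
# social cognition does not, and metacognition is skipped for lack of a score.
print(cognitive_profile(
    model_scores={"reasoning": 0.82, "social_cognition": 0.55},
    human_baselines={"reasoning": 0.75, "social_cognition": 0.70, "metacognition": 0.68},
))
```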
This analysis was prepared by the Genesis Park editorial team with the help of AI. The original article can be found via the source link.