Aligning Machine and Human Visual Representations across Abstraction Levels

要約

ディープニューラルネットワークは、視覚タスクにおける人間の行動のモデルなど、幅広いアプリケーションで成功を収めています。
ただし、ニューラルネットワークのトレーニングと人間の学習は根本的な点で異なり、ニューラルネットワークは人間ほど堅牢に一般化できないことが多く、その基礎となる表現の類似性について疑問が生じます。
現代の学習システムがより人間らしい行動を示すためには何が欠けているのでしょうか?
私たちは、視覚モデルと人間の間の重要な不一致を強調します。人間の概念的知識は細かいスケールから大まかなスケールの区別まで階層的に組織されているのに対し、モデル表現はこれらすべての抽象化レベルを正確に捉えているわけではありません。
この不整合に対処するために、まず人間の判断を模倣するように教師モデルをトレーニングし、次に人間のような構造をその表現から事前トレーニングされた最先端の視覚基盤モデルに転送します。
これらの人間に合わせたモデルは、複数レベルの意味的抽象化にまたがる人間の判断の新しいデータセットを含む、幅広い類似タスクにわたって人間の行動と不確実性をより正確に近似します。
また、さまざまな機械学習タスクのパフォーマンスも向上し、一般化と分散外の堅牢性が向上します。
したがって、ニューラルネットワークに人間の知識を追加すると、人間の認知とより一貫性があり、より実用的に役立つ、両方の長所を備えた表現が得られ、より堅牢で解釈可能な、人間に似た人工知能システムへの道が開かれます。

要約(オリジナル)

Deep neural networks have achieved success across a wide range of applications, including as models of human behavior in vision tasks. However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do, raising questions regarding the similarity of their underlying representations. What is missing for modern learning systems to exhibit more human-like behavior? We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction. To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-like structure from its representations into pretrained state-of-the-art vision foundation models. These human-aligned models more accurately approximate human behavior and uncertainty across a wide range of similarity tasks, including a new dataset of human judgments spanning multiple levels of semantic abstractions. They also perform better on a diverse set of machine learning tasks, increasing generalization and out-of-distribution robustness. Thus, infusing neural networks with additional human knowledge yields a best-of-both-worlds representation that is both more consistent with human cognition and more practically useful, thus paving the way toward more robust, interpretable, and human-like artificial intelligence systems.

arxiv情報

著者	Lukas Muttenthaler,Klaus Greff,Frieda Born,Bernhard Spitzer,Simon Kornblith,Michael C. Mozer,Klaus-Robert Müller,Thomas Unterthiner,Andrew K. Lampinen
発行日	2024-09-16 13:22:16+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

Aligning Machine and Human Visual Representations across Abstraction Levels

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー