Affogato: Learning Open-Vocabulary Affordance Grounding with Automated Data Generation at Scale

Summary

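Affordance grounding, which localizes object regions based on natural language descriptions of interactions, is a critical challenge for enabling intelligent agents to understand and interact with their environments.
However, the task remains difficult because it requires fine-grained part-level localization, must cope with the ambiguity arising from multiple valid interaction regions, and suffers from a scarcity of large-scale datasets.
In this work, we introduce Affogato, a large-scale benchmark of 150K instances annotated with open-vocabulary text descriptions and corresponding 3D affordance heatmaps across a diverse set of objects and interactions.
Building on this benchmark, we develop simple yet effective vision-language models that leverage a pretrained part-aware vision backbone and a text-conditional heatmap decoder.
Models trained on the Affogato dataset achieve promising performance on existing 2D and 3D benchmarks and, notably, prove effective in open-vocabulary cross-domain generalization.
The Affogato dataset is publicly available at https://huggingface.co/datasets/project-affogato/affogato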

Summary (original)

Affordance grounding (localizing object regions based on natural language descriptions of interactions) is a critical challenge for enabling intelligent agents to understand and interact with their environments. However, this task remains challenging due to the need for fine-grained part-level localization, the ambiguity arising from multiple valid interaction regions, and the scarcity of large-scale datasets. In this work, we introduce Affogato, a large-scale benchmark comprising 150K instances, annotated with open-vocabulary text descriptions and corresponding 3D affordance heatmaps across a diverse set of objects and interactions. Building on this benchmark, we develop simple yet effective vision-language models that leverage pretrained part-aware vision backbones and a text-conditional heatmap decoder. Our models trained with the Affogato dataset achieve promising performance on the existing 2D and 3D benchmarks, and notably, exhibit effectiveness in open-vocabulary cross-domain generalization. The Affogato dataset is publicly available at https://huggingface.co/datasets/project-affogato/affogato
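The abstract describes the model only at a high level: a pretrained part-aware vision backbone feeding a text-conditional heatmap decoder. As a rough illustration of what a "text-conditional heatmap" can mean, the sketch below scores each location (a 2D image patch or a 3D point) by cosine similarity against a text embedding and passes the result through a sigmoid. This is not the Affogato decoder. The function names, the fixed `scale` and `bias` values, and the assumption that visual and text features already share one embedding space are all hypothetical.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Return v scaled to unit L2 norm (raises on a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def text_conditional_heatmap(
    features: Sequence[Sequence[float]],
    text_emb: Sequence[float],
    scale: float = 10.0,  # hypothetical constant; a trained model would learn it
    bias: float = -5.0,   # hypothetical constant; a trained model would learn it
) -> List[float]:
    """Score every location (2D patch or 3D point) against one text query.

    features: one feature vector per location, assumed (hypothetically) to
              live in the same embedding space as text_emb.
    Returns one affordance score in (0, 1) per location.
    """
    t = _unit(text_emb)
    heat = []
    for f in features:
        # Cosine similarity between this location's feature and the text query
        cos = sum(a * b for a, b in zip(_unit(f), t))
        # Independent sigmoid per location, so several regions can score high at once
        heat.append(1.0 / (1.0 + math.exp(-(scale * cos + bias))))
    return heat


# Toy usage: four locations with 3-D features and one query embedding
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.1, 0.0]
heat = text_conditional_heatmap(feats, query)
print([round(h, 3) for h in heat])
```

Using an independent sigmoid per location, rather than a softmax across locations, lets more than one region receive a high score; in the toy call, locations 0 and 2 both score above 0.9. That choice loosely mirrors the abstract's point that a single interaction can have multiple valid regions.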

arXiv information

Authors	Junha Lee, Eunha Park, Chunghyun Park, Dahyun Kang, Minsu Cho
Published	2025-06-13 17:57:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
