Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications

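Summary

Vision-language models have played an important role in extracting meaningful features for a variety of robotic applications.
Among them, Contrastive Language-Image Pretraining (CLIP) is widely used in robotic tasks that require both vision and natural language understanding.
However, CLIP was trained only on static images paired with text prompts, and it has not yet been fully adapted to robotic tasks that involve dynamic actions.
In this paper, we introduce Robotic-CLIP, which enhances robotic perception capabilities.
We first collect and label large-scale action data, and then build Robotic-CLIP by fine-tuning CLIP with contrastive learning on 309,433 videos (about 7.4 million frames) of action data.
By leveraging action data, Robotic-CLIP inherits CLIP's strong image performance while gaining the ability to understand actions in robotic contexts.
Intensive experiments show that Robotic-CLIP outperforms other CLIP-based models across a range of language-driven robotic tasks.
We also demonstrate the practical effectiveness of Robotic-CLIP in real-world grasping applications.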
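
The summary does not spell out the training objective beyond "contrastive learning". As a rough illustration only, the sketch below shows the symmetric contrastive (InfoNCE) loss that CLIP-style models commonly use, applied to a batch of paired frame and text embeddings. The function names, the NumPy implementation, and the temperature value of 0.07 are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(frame_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of matched (frame, text) pairs.

    Row i of frame_emb is assumed to be paired with row i of text_emb.
    All names and the default temperature are hypothetical choices.
    """
    f = l2_normalize(np.asarray(frame_emb, dtype=np.float64))
    t = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    logits = f @ t.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])         # matched pairs sit on the diagonal
    loss_f2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # frame -> text
    loss_t2f = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> frame
    return 0.5 * (loss_f2t + loss_t2f)


# Toy usage: correctly paired embeddings score a much lower loss than mismatched ones.
frames = np.eye(4)
aligned = clip_contrastive_loss(frames, np.eye(4))
shuffled = clip_contrastive_loss(frames, np.roll(np.eye(4), 1, axis=0))
```

Minimizing this loss pulls each frame embedding toward its own caption and away from the other captions in the batch, which is the general mechanism by which fine-tuning on action-labelled video frames could teach an encoder about actions.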

Summary (original)

Vision language models have played a key role in extracting meaningful features for various robotic applications. Among these, Contrastive Language-Image Pretraining (CLIP) is widely used in robotic tasks that require both vision and natural language understanding. However, CLIP was trained solely on static images paired with text prompts and has not yet been fully adapted for robotic tasks involving dynamic actions. In this paper, we introduce Robotic-CLIP to enhance robotic perception capabilities. We first gather and label large-scale action data, and then build our Robotic-CLIP by fine-tuning CLIP on 309,433 videos (~7.4 million frames) of action data using contrastive learning. By leveraging action data, Robotic-CLIP inherits CLIP’s strong image performance while gaining the ability to understand actions in robotic contexts. Intensive experiments show that our Robotic-CLIP outperforms other CLIP-based models across various language-driven robotic tasks. Additionally, we demonstrate the practical effectiveness of Robotic-CLIP in real-world grasping applications.

arXiv information

Authors	Nghia Nguyen, Minh Nhat Vu, Tung D. Ta, Baoru Huang, Thieu Vo, Ngan Le, Anh Nguyen
Published	2024-09-26 10:56:35+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
