Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

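Summary

The ability to associate touch with other modalities has major implications for both humans and computational systems.
However, multimodal learning with touch remains challenging due to the expensive data collection process and non-standardized sensor outputs.
We introduce UniTouch, a unified tactile model for vision-based touch sensors that is connected to multiple modalities, including vision, language, and sound.
This is achieved by aligning UniTouch embeddings to pretrained image embeddings that are already associated with a variety of other modalities.
We further propose learnable sensor-specific tokens, which allow the model to learn from a set of heterogeneous tactile sensors simultaneously.
UniTouch can perform a variety of touch sensing tasks in the zero-shot setting, from robot grasp prediction to touch image question answering.
To the best of our knowledge, UniTouch is the first model to demonstrate such capabilities.
Project page: https://cfeng16.github.io/UniTouch/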

Summary (original)

The ability to associate touch with other modalities has huge implications for humans and computational systems. However, multimodal learning with touch remains challenging due to the expensive data collection process and non-standardized sensor outputs. We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound. We achieve this by aligning our UniTouch embeddings to pretrained image embeddings already associated with a variety of other modalities. We further propose learnable sensor-specific tokens, allowing the model to learn from a set of heterogeneous tactile sensors, all at the same time. UniTouch is capable of conducting various touch sensing tasks in the zero-shot setting, from robot grasping prediction to touch image question answering. To the best of our knowledge, UniTouch is the first to demonstrate such capabilities. Project page: https://cfeng16.github.io/UniTouch/
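The abstract says UniTouch aligns its touch embeddings to pretrained image embeddings and prepends learnable sensor-specific tokens, but it does not spell out the objective or architecture. Below is a minimal sketch, assuming a CLIP-style symmetric InfoNCE contrastive loss between touch and image embeddings and assuming each sensor token is simply prepended to the tactile token sequence. The function names, shapes, the temperature of 0.07, and the mean-pool "encoder" are all hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each vector to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def add_sensor_token(patch_tokens, sensor_ids, sensor_token_table):
    """Prepend one sensor-specific token to each sample's tactile token sequence.

    patch_tokens:       (B, N, D) tactile patch embeddings
    sensor_ids:         (B,) integer sensor-type index for each sample
    sensor_token_table: (S, D) one vector per sensor type (learnable in training)
    """
    sensor_tokens = sensor_token_table[sensor_ids][:, None, :]  # (B, 1, D)
    return np.concatenate([sensor_tokens, patch_tokens], axis=1)  # (B, N+1, D)


def toy_touch_encoder(tokens, W):
    """Hypothetical stand-in for a real tactile encoder: mean-pool, then project."""
    return tokens.mean(axis=1) @ W  # (B, D_out)


def info_nce(touch_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of touch_emb should match row i of image_emb."""
    t = l2_normalize(touch_emb)
    v = l2_normalize(image_emb)
    logits = t @ v.T / temperature  # (B, B); diagonal holds the matched pairs

    def mean_diag_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the touch->image and image->touch directions.
    return 0.5 * (mean_diag_cross_entropy(logits) + mean_diag_cross_entropy(logits.T))


# Toy usage with random data (no real sensors or pretrained models involved).
rng = np.random.default_rng(0)
B, N, D, S = 4, 16, 32, 3                    # batch, patches, width, sensor types
patches = rng.normal(size=(B, N, D))
sensor_ids = np.array([0, 1, 2, 0])          # samples drawn from three sensor types
sensor_table = rng.normal(size=(S, D)) * 0.02
W = rng.normal(size=(D, D)) / np.sqrt(D)
image_emb = rng.normal(size=(B, D))          # stands in for pretrained image embeddings

tokens = add_sensor_token(patches, sensor_ids, sensor_table)
touch_emb = toy_touch_encoder(tokens, W)
loss = info_nce(touch_emb, image_emb)
print(tokens.shape, float(loss))
```

In this sketch, only the touch side (the encoder weights and the sensor token table) would be trained, on the assumption that the pretrained image embeddings stay fixed so that touch inherits their existing links to language and sound. Sharing one encoder across sensors while giving each sensor its own token is one simple way to let heterogeneous sensors be learned at the same time.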

arXiv information

Authors	Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong
Published	2024-01-31 18:59:57+00:00

Provider, services used

arxiv.jp, Google
