IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition

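Abstract

Robotic assistive feeding holds significant promise for improving the quality of life of individuals with eating disabilities.
However, acquiring diverse food items under varying conditions and generalizing to unseen foods presents unique challenges.
Existing methods that rely on surface-level geometric information (e.g., bounding box and pose) derived from visual cues (e.g., color, shape, and texture) often lack adaptability and robustness, especially when foods share similar physical properties but differ in visual appearance.
We employ imitation learning (IL) to learn a policy for food acquisition.
Existing methods use IL or Reinforcement Learning (RL) to learn a policy based on off-the-shelf image encoders such as ResNet-50.
However, such representations are not robust and struggle to generalize across diverse acquisition scenarios.
To address these limitations, we propose a novel approach, IMRL (Integrated Multi-Dimensional Representation Learning), which integrates visual, physical, temporal, and geometric representations to enhance the robustness and generalizability of IL for food acquisition.
Our approach captures food types and physical properties (e.g., solid, semi-solid, granular, liquid, and mixture), models the temporal dynamics of acquisition actions, and introduces geometric information to determine optimal scooping points and assess bowl fullness.
IMRL enables IL to adaptively adjust scooping strategies based on context, improving the robot's capability to handle diverse food acquisition scenarios.
Experiments on a real robot demonstrate the robustness and adaptability of our approach across various foods and bowl configurations, including zero-shot generalization to unseen settings.
Our approach achieves an improvement of up to 35% in success rate compared with the best-performing baseline.
More details can be found on our website https://ruiiu.github.io/imrl.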

Abstract (original)

Robotic assistive feeding holds significant promise for improving the quality of life for individuals with eating disabilities. However, acquiring diverse food items under varying conditions and generalizing to unseen food presents unique challenges. Existing methods that rely on surface-level geometric information (e.g., bounding box and pose) derived from visual cues (e.g., color, shape, and texture) often lack adaptability and robustness, especially when foods share similar physical properties but differ in visual appearance. We employ imitation learning (IL) to learn a policy for food acquisition. Existing methods employ IL or Reinforcement Learning (RL) to learn a policy based on off-the-shelf image encoders such as ResNet-50. However, such representations are not robust and struggle to generalize across diverse acquisition scenarios. To address these limitations, we propose a novel approach, IMRL (Integrated Multi-Dimensional Representation Learning), which integrates visual, physical, temporal, and geometric representations to enhance the robustness and generalizability of IL for food acquisition. Our approach captures food types and physical properties (e.g., solid, semi-solid, granular, liquid, and mixture), models temporal dynamics of acquisition actions, and introduces geometric information to determine optimal scooping points and assess bowl fullness. IMRL enables IL to adaptively adjust scooping strategies based on context, improving the robot’s capability to handle diverse food acquisition scenarios. Experiments on a real robot demonstrate our approach’s robustness and adaptability across various foods and bowl configurations, including zero-shot generalization to unseen settings. Our approach achieves an improvement of up to $35\%$ in success rate compared with the best-performing baseline. More details can be found on our website https://ruiiu.github.io/imrl.
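
The abstract does not spell out how the four representations are combined, so the following is only a minimal sketch of one plausible reading: each representation is a feature vector, the vectors are concatenated, and a policy head trained by imitation maps the fused vector to an action. Every name here (`Observation`, `fuse`, `linear_policy`, `behavior_cloning_loss`), the use of plain concatenation, the linear head, and the squared-error loss are assumptions made for illustration, not the authors' architecture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical container for the four representation vectors."""
    visual: List[float]      # e.g. food-type features
    physical: List[float]    # e.g. solid / granular / liquid features
    temporal: List[float]    # e.g. summary of recent acquisition steps
    geometric: List[float]   # e.g. scooping point and bowl fullness


def fuse(obs: Observation) -> List[float]:
    """Fuse the representations by simple concatenation (an assumption)."""
    return obs.visual + obs.physical + obs.temporal + obs.geometric


def linear_policy(weights: List[List[float]], bias: List[float],
                  features: List[float]) -> List[float]:
    """Map fused features to an action with one linear layer."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def behavior_cloning_loss(predicted: List[float],
                          expert: List[float]) -> float:
    """Mean squared error between the policy action and the expert action."""
    return sum((p - e) ** 2 for p, e in zip(predicted, expert)) / len(expert)


# Toy example with made-up numbers.
obs = Observation(visual=[1.0, 2.0], physical=[0.0],
                  temporal=[0.5], geometric=[3.0, 4.0])
features = fuse(obs)
weights = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 1.0]
action = linear_policy(weights, bias, features)
loss = behavior_cloning_loss(action, expert=[1.0, 3.0])
```

In an imitation-learning setup of this kind, the loss would be minimized over expert demonstrations; a real system would replace the linear head and hand-written vectors with learned encoders.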

arXiv information

Authors	Rui Liu, Zahiruddin Mahammad, Amisha Bhaskar, Pratap Tokekar
Published	2025-03-18 15:32:55+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
