"cs.CV" Category Archive

LiveXiv — A Multi-Modal Live Benchmark Based on Arxiv Papers Content

Posted on October 16, 2024 by jarxiv

Abstract: Large-scale training of multi-modal models on data scraped from the Web …

Categories: cs.CV

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

Posted on October 16, 2024 by jarxiv

Abstract: With recent advances in multi-modal large language models, both image understanding and generation …

Categories: cs.CV

Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control with Distractions

Posted on October 15, 2024 by jarxiv

Abstract: With recent advances in model-based reinforcement learning (MBRL), MBRL for visual …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

REPeat: A Real2Sim2Real Approach for Pre-acquisition of Soft Food Items in Robot-assisted Feeding

Posted on October 15, 2024 by jarxiv

Abstract: In this paper, enhancing bite acquisition in robot-assisted feeding of soft foods …

Categories: cs.CV, cs.GR, cs.RO

The Ingredients for Robotic Diffusion Transformers

Posted on October 15, 2024 by jarxiv

Abstract: In recent years, roboticists, with high-capacity Transformer network …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Innovative Deep Learning Techniques for Obstacle Recognition: A Comparative Study of Modern Detection Algorithms

Posted on October 15, 2024 by jarxiv

Abstract: In this study, advanced YOLO models, specifically YOLOv8, YOLOv7, …

Categories: cs.CV, cs.RO

Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation

Posted on October 15, 2024 by jarxiv

Abstract: Vision-and-language navigation (VLN) lets an agent, with natural-language …

Categories: cs.CV, cs.RO

Fusion-Driven Tree Reconstruction and Fruit Localization: Advancing Precision in Agriculture

Posted on October 15, 2024 by jarxiv

Abstract: Fruit distribution is pivotal in shaping the future of agriculture and agricultural robotics, and streamlined …

Categories: cs.CV, cs.RO

Twisting Lids Off with Two Hands

Posted on October 15, 2024 by jarxiv

Abstract: Manipulating objects with two multi-fingered hands, the contact-rich nature of many manipulation tasks …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

Posted on October 15, 2024 by jarxiv

Abstract: Language-guided robotic manipulation, in order to accomplish a variety of complex manipulation tasks, …

Categories: cs.AI, cs.CV, cs.LG, cs.RO
