"cs.CV" Category Archive

Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation

Posted on February 14, 2025 by jarxiv

Summary: Attention-based methods outperform conventional geometric deep learning (GDL) models, and spherical …

Categories: cs.AI, cs.CV

Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval

Posted on February 14, 2025 by jarxiv

Summary: Video moment retrieval, used to evaluate the performance of vision-language models, is a common …

Categories: cs.CV

When do they StOP?: A First Step Towards Automatically Identifying Team Communication in the Operating Room

Posted on February 14, 2025 by jarxiv

Summary: Purpose: Surgical performance depends on more than the surgeon's technical skills; present during surgery …

Categories: cs.CV

PulseCheck457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models

Posted on February 14, 2025 by jarxiv

Summary: In visual scene interpretation and reasoning, large multimodal models (LMMs) …

Categories: cs.CV

Robot Instance Segmentation with Few Annotations for Grasping

Posted on February 13, 2025 by jarxiv

Summary: A robot's ability to manipulate objects depends heavily on its aptitude for visual perception …

Categories: cs.AI, cs.CV, cs.RO

NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar

Posted on February 13, 2025 by jarxiv

Summary: Recently, visual grounding and multi-sensor setups, in terrestrial autonomous driving systems and unmanned surface …

Categories: cs.CV, cs.RO

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Posted on February 13, 2025 by jarxiv

Summary: In embodied tasks, for the agent to fully understand the 3D scene simultaneously with its exploration …

Categories: cs.CV, cs.RO

Observe Then Act: Asynchronous Active Vision-Action Model for Robotic Manipulation

Posted on February 13, 2025 by jarxiv

Summary: In real-world scenarios, many robotic manipulation tasks are hindered by occlusions and limited fields of view …

Categories: cs.CV, cs.RO

DriveGPT: Scaling Autoregressive Behavior Models for Driving

Posted on February 13, 2025 by jarxiv

Summary: We present DriveGPT, a scalable behavior model for autonomous driving. …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Robotic Grasping of Harvested Tomato Trusses Using Vision and Online Learning

Posted on February 13, 2025 by jarxiv

Summary: Currently, weighing and packaging truss tomatoes requires considerable manual work …

Categories: cs.AI, cs.CV, cs.RO
