Category archive: cs.CV

Stable Flow: Vital Layers for Training-Free Image Editing

Posted: November 22, 2024 | Author: jarxiv

Abstract: Diffusion models have revolutionized the field of content synthesis and editing. Recent …

Categories: cs.CV, cs.GR, cs.LG

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Posted: November 22, 2024 | Author: jarxiv

Abstract: Large language models (LLMs) demonstrate enhanced capabilities and reliability by reasoning more …

Categories: cs.CV

ViSTa Dataset: Do vision-language models understand sequential tasks?

Posted: November 22, 2024 | Author: jarxiv

Abstract: Using vision-language models (VLMs) as reward models in reinforcement learning …

Categories: cs.CV, cs.LG

Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning

Posted: November 22, 2024 | Author: jarxiv

Abstract: In the pruning of deep neural networks, the dense network's …

Categories: cs.CV

Geometric Algebra Planes: Convex Implicit Neural Volumes

Posted: November 22, 2024 | Author: jarxiv

Abstract: Volume parameterizations, from classical voxel grids to implicit neural …

Categories: cs.CV

Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting

Posted: November 21, 2024 | Author: jarxiv

Abstract: Using 3D Gaussian Splatting (3DGS) for robot manipu …

Categories: cs.CV, cs.RO

VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation

Posted: November 21, 2024 | Author: jarxiv

Abstract: Input aggregation is a technique that state-of-the-art LiDAR 3D object detectors use to improve detection …

Categories: cs.CV

MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes

Posted: November 21, 2024 | Author: jarxiv

Abstract: Controllable generative models for images and videos have achieved remarkable success, but 3D …

Categories: cs.AI, cs.CV

Intensity-Spatial Dual Masked Autoencoder for Multi-Scale Feature Learning in Chest CT Segmentation

Posted: November 21, 2024 | Author: jarxiv

Abstract: In the field of medical image segmentation, indistinct lesion features, ambiguous boundaries, multi …

Categories: cs.CV, eess.IV

An Integrated Approach to Robotic Object Grasping and Manipulation

Posted: November 21, 2024 | Author: jarxiv

Abstract: In response to the growing challenges of manual labor and efficiency in warehouse operations, Amazon …

Categories: cs.CV, cs.RO
