Category archive: cs.CV

Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks

Posted: December 23, 2024 | Author: jarxiv

Summary: State-space models (SSMs), with respect to the long-standing transformer architecture …

Categories: cs.CV

SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild

Posted: December 23, 2024 | Author: jarxiv

Summary: Seagrass meadows play an important role in marine ecosystems: carbon sequestration, water quality improvement, …

Categories: cs.CV

Frequency Is What You Need: Word-frequency Masking Benefits Vision-Language Model Pre-training

Posted: December 23, 2024 | Author: jarxiv

Summary: If the training set size can be reduced, vision-language models (VLMs) …

Categories: cs.CV

MotiF: Making Text Count in Image Animation with Motion Focal Loss

Posted: December 23, 2024 | Author: jarxiv

Summary: In Text-Image-to-Video (TI2V) generation, the text …

Categories: cs.AI, cs.CV

Can Generative Video Models Help Pose Estimation?

Posted: December 23, 2024 | Author: jarxiv

Summary: Pairwise pose estimation from images with little or no overlap …

Categories: cs.CV

Personalized Representation from Personalized Generation

Posted: December 23, 2024 | Author: jarxiv

Summary: Modern vision models excel at general-purpose downstream tasks. However, the granularity …

Categories: cs.CV, cs.LG

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Posted: December 23, 2024 | Author: jarxiv

Summary: With the rapid progress of large language models (LLMs), vision-language models (V …

Categories: cs.CV

GURecon: Learning Detailed 3D Geometric Uncertainties for Neural Surface Reconstruction

Posted: December 23, 2024 | Author: jarxiv

Summary: Neural surface representations have seen remarkable success in novel view synthesis and 3D reconstruction …

Categories: cs.CV

A Deep Learning-Based Fully Automated Pipeline for Regurgitant Mitral Valve Anatomy Analysis From 3D Echocardiography

Posted: December 23, 2024 | Author: jarxiv

Summary: Three-dimensional transesophageal echocardiography (3DTEE), surgical repair or transcatheter repair …

Categories: cs.CV, q-bio.QM

Temporally Consistent Object-Centric Learning by Contrasting Slots

Posted: December 20, 2024 | Author: jarxiv

Summary: Unsupervised object-centric learning from videos, large-scale unlabeled video …

Categories: cs.AI, cs.CV, cs.LG, cs.RO
