"cs.CV" Category Archive

A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model

Posted: November 8, 2024 | Author: jarxiv

Abstract: In this era of video, automatic video editing technology reduces workload and human edit …

Categories: cs.CV

GD doesn’t make the cut: Three ways that non-differentiability affects neural network training

Posted: November 8, 2024 | Author: jarxiv

Abstract: In this paper, gradient methods applied to non-differentiable functions (NGDM) and differentiable …

Categories: cs.CV, cs.LG

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Posted: November 8, 2024 | Author: jarxiv

Abstract: For answering questions from documents, document visual question answering (DocV …

Categories: cs.AI, cs.CL, cs.CV

CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM

Posted: November 8, 2024 | Author: jarxiv

Abstract: This paper, in the form of text descriptions, images, point clouds, or combinations of these, …

Categories: cs.CV

Uncovering Hidden Subspaces in Video Diffusion Models Using Re-Identification

Posted: November 8, 2024 | Author: jarxiv

Abstract: Latent video diffusion models, thanks to their generated image quality and temporal consistency, general observation …

Categories: cs.AI, cs.CV

VAIR: Visuo-Acoustic Implicit Representations for Low-Cost, Multi-Modal Transparent Surface Reconstruction in Indoor Scenes

Posted: November 8, 2024 | Author: jarxiv

Abstract: For mobile robots operating indoors, being able to navigate challenging scenes that include transparent surfaces …

Categories: cs.CV

A Comparative Analysis of U-Net-based models for Segmentation of Cardiac MRI

Posted: November 8, 2024 | Author: jarxiv

Abstract: Medical imaging, for the purpose of diagnosing, monitoring, and even treating medical conditions, the human body and its …

Categories: cs.CV, cs.LG, eess.IV

AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation

Posted: November 8, 2024 | Author: jarxiv

Abstract: In neural network architecture design, making many important decisions …

Categories: cs.CV, cs.LG

Planar Reflection-Aware Neural Radiance Fields

Posted: November 8, 2024 | Author: jarxiv

Abstract: Neural Radiance Fields (NeRF), complex sce …

Categories: cs.CV

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Posted: November 8, 2024 | Author: jarxiv

Abstract: Image-to-video generation methods have achieved impressive, photorealistic quality …

Categories: cs.CV, cs.LG

