Monthly Archives: May 2025

TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos

Posted on May 27, 2025 by jarxiv

Abstract: Videos, including camera, scene, action, attributes, and their dynamic rela … Continue reading →

Categories: cs.CV, cs.DB, cs.MM | Comments off

OB3D: A New Dataset for Benchmarking Omnidirectional 3D Reconstruction Using Blender

Posted on May 27, 2025 by jarxiv

Abstract: Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) … Continue reading →

Categories: cs.CV | Comments off

Agentic 3D Scene Generation with Spatially Contextualized VLMs

Posted on May 27, 2025 by jarxiv

Abstract: Enabled by vision-language models (VLMs), multimodal content gen … Continue reading →

Categories: cs.CV, cs.GR | Comments off

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

Posted on May 27, 2025 by jarxiv

Abstract: The rapid progress of large language models (LLMs) has, within a single framework, visual … Continue reading →

Categories: cs.CV | Comments off

Improvement Strategies for Few-Shot Learning in OCT Image Classification of Rare Retinal Diseases

Posted on May 27, 2025 by jarxiv

Abstract: This paper applies few-shot learning to OCT diagnostic images, major and rare … Continue reading →

Categories: cs.AI, cs.CV, eess.IV | Comments off

Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models

Posted on May 27, 2025 by jarxiv

Abstract: Benefiting from visual encoders contrastively trained on large-scale natural scene images, large … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments off

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters

Posted on May 27, 2025 by jarxiv

Abstract: Recent years have witnessed significant progress in audio-driven human animation … Continue reading →

Categories: cs.CV | Comments off

STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs

Posted on May 27, 2025 by jarxiv

Abstract: Across diverse tasks, multimodal large language models (MLLMs) have remarkable capabilities … Continue reading →

Categories: cs.CV | Comments off

Long-Context State-Space Video World Models

Posted on May 27, 2025 by jarxiv

Abstract: Video diffusion models have recently, through action-conditioned autoregressive frame prediction, … Continue reading →

Categories: cs.CV | Comments off

AW-GATCN: Adaptive Weighted Graph Attention Convolutional Network for Event Camera Data Joint Denoising and Object Recognition

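Posted on May 27, 2025 by jarxiv

Abstract: Event cameras, which capture brightness changes at high temporal resolution, by their nature, essential object … Continue reading →

Categories: cs.CV | Comments off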
