Monthly Archives: April 2025

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Posted on April 18, 2025 by jarxiv

Summary: Built on large language models (LLMs), large video models (LV …

Categories: cs.CV, cs.LG

Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training

Posted on April 18, 2025 by jarxiv

Summary: In recent years, the field of vision-language model pre-training, driven mainly by large language models …

Categories: cs.AI, cs.CV

Science-T2I: Addressing Scientific Illusions in Image Synthesis

Posted on April 18, 2025 by jarxiv

Summary: To integrate scientific knowledge into generative models and improve the realism and consistency of image synthesis …

Categories: cs.AI, cs.CV, cs.LG

NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

Posted on April 18, 2025 by jarxiv

Summary: In this paper, regarding short-form UGC video quality assessment and enhancement, the NTIRE 202 …

Categories: cs.AI, cs.CV, eess.IV

PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition

Posted on April 18, 2025 by jarxiv

Summary: Human action recognition (HAR) has achieved impressive results with deep learning models, but …

Categories: cs.CV

$\texttt{Complex-Edit}$: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark

Posted on April 18, 2025 by jarxiv

Summary: Systematically evaluating instruction-based image editing models across instructions of varying complexity …

Categories: cs.AI, cs.CV

Readable Twins of Unreadable Models

Posted on April 18, 2025 by jarxiv

Summary: In contemporary AI research and development, creating responsible artificial intelligence (AI) systems …

Categories: cs.AI, cs.CV

St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World

Posted on April 18, 2025 by jarxiv

Summary: Despite their deep connection, dynamic 3D reconstruction and point tracking in videos are typically …

Categories: cs.CV

Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs

Posted on April 18, 2025 by jarxiv

Summary: For flexible, language-driven scene understanding, bridging natural language and 3D geometry …

Categories: cs.CV

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis

Posted on April 18, 2025 by jarxiv

Summary: We explore the task of geometric reconstruction of images captured from a mixture of ground and aerial views …

Categories: cs.CV
