Monthly Archives: January 2025

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

Posted on January 10, 2025 by jarxiv

Abstract: The ability to organically reason over both text and images is a pillar of human intelligence, but such … Continue reading →

Categories: cs.CV

Consistent Flow Distillation for Text-to-3D Generation

Posted on January 10, 2025 by jarxiv

Abstract: Score Distillation Sampling (SDS), in distilling image generation models for 3D generation, … Continue reading →

Categories: cs.AI, cs.CV, cs.LG

Relative Pose Estimation through Affine Corrections of Monocular Depth Priors

Posted on January 10, 2025 by jarxiv

Abstract: Monocular depth estimation (MDE) models have made significant progress in recent years. Many … Continue reading →

Categories: cs.CV

Explainable AI-Enhanced Deep Learning for Pumpkin Leaf Disease Detection: A Comparative Analysis of CNN Architectures

Posted on January 10, 2025 by jarxiv

Abstract: Pumpkin leaf diseases are a significant threat to agricultural productivity, and for effective management … Continue reading →

Categories: cs.CV

Decentralized Diffusion Models

Posted on January 10, 2025 by jarxiv

Abstract: Training large-scale AI models involves splitting the work across thousands of GPUs, … Continue reading →

Categories: cs.CV, cs.DC, cs.LG

An Empirical Study of Autoregressive Pre-training from Videos

Posted on January 10, 2025 by jarxiv

Abstract: We empirically study autoregressive pre-training from videos. … Continue reading →

Categories: cs.AI, cs.CV

ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

Posted on January 10, 2025 by jarxiv

Abstract: To understand structured images, such as when interpreting tables and charts, the various … within the image … Continue reading →

Categories: cs.CL, cs.CV

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

Posted on January 10, 2025 by jarxiv

Abstract: Recent advances in omnimodal learning, albeit mainly within proprietary models, … images, … Continue reading →

Categories: cs.CL, cs.CV

Identity-Preserving Video Dubbing Using Motion Warping

Posted on January 10, 2025 by jarxiv

Abstract: Video dubbing, from a reference video and a driving audio signal, realistic lip-sync … Continue reading →

Categories: cs.CV

Human Delegation Behavior in Human-AI Collaboration: The Effect of Contextual Information

Posted on January 10, 2025 by jarxiv

Abstract: Integrating artificial intelligence (AI) into human decision-making processes in the workplace brings opportunities and … Continue reading →

Categories: cs.HC, cs.LG

