"cs.AI" Category Archive

Reinforcing Multimodal Understanding and Generation with Dual Self-rewards

Posted: June 10, 2025 | Author: jarxiv

Abstract: Building on large language models (LLMs), recent large multimodal …

Categories: cs.AI, cs.CL, cs.CV

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design

Posted: June 10, 2025 | Author: jarxiv

Abstract: Creating slides manually is labor-intensive and requires expert prior knowledge. Existing …

Categories: cs.AI, cs.CV

Audio-Sync Video Generation with Multi-Stream Temporal Control

Posted: June 10, 2025 | Author: jarxiv

Abstract: Because audio is inherently temporal and closely synchronized with the visual world, …

Categories: cs.AI, cs.CV

Dynamic View Synthesis as an Inverse Problem

Posted: June 10, 2025 | Author: jarxiv

Abstract: This work addresses, as an inverse problem in a training-free setting, dynamic … from monocular video …

Categories: cs.AI, cs.CV

Hidden in plain sight: VLMs overlook their visual representations

Posted: June 10, 2025 | Author: jarxiv

Abstract: For specifying and evaluating performance on visual tasks, language is a natural …

Categories: cs.AI, cs.CV, cs.LG

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Posted: June 10, 2025 | Author: jarxiv

Abstract: We introduce Self Forcing, a new training paradigm for autoregressive video diffusion models …

Categories: cs.AI, cs.CV, cs.LG

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Posted: June 10, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs), graphical user …

Categories: cs.AI, cs.CV

Vision Transformers Don’t Need Trained Registers

Posted: June 10, 2025 | Author: jarxiv

Abstract: We investigate the mechanism underlying a previously identified phenomenon in Vision Transformers. …

Categories: cs.AI, cs.CV

StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets

Posted: June 10, 2025 | Author: jarxiv

Abstract: Multi-task learning for dense prediction, owing to the need for extensive annotations for every task, …

Categories: cs.AI, cs.CV, cs.LG

Distillation Robustifies Unlearning

Posted: June 10, 2025 | Author: jarxiv

Abstract: Current LLM unlearning methods are not robust. With a few fine-tuning …

Categories: cs.AI, cs.LG
