Category archive: cs.MM

Video-Guided Foley Sound Generation with Multimodal Controls

Posted: March 18, 2025 | Author: jarxiv

Abstract: Generating sound effects for videos often requires real-life sources and …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Posted: March 18, 2025 | Author: jarxiv

Abstract: Element-level visual manipulation is essential to digital content creation, but current diffusion-based …

Categories: cs.AI, cs.CV, cs.MM

Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages

Posted: March 17, 2025 | Author: jarxiv

Abstract: An old-school recipe for training a c …

Categories: cs.CV, cs.LG, cs.MM

TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing

Posted: March 17, 2025 | Author: jarxiv

Abstract: We introduce TreeMeshGPT. For input point …

Categories: cs.CV, cs.GR, cs.MM

AudioX: Diffusion Transformer for Anything-to-Audio Generation

Posted: March 14, 2025 | Author: jarxiv

Abstract: Audio and music generation have emerged as important tasks in many applications …

Categories: cs.CV, cs.LG, cs.MM, cs.SD, eess.AS

Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model

Posted: March 13, 2025 | Author: jarxiv

Abstract: For training multimodal foundation models, audio and visual …

Categories: 68T, 68T10, 68T45, cs.CL, cs.IR, cs.MM, cs.SD, eess.AS

GenHPE: Generative Counterfactuals for 3D Human Pose Estimation with Radio Frequency Signals

Posted: March 13, 2025 | Author: jarxiv

Abstract: Human pose estimation (HPE) detects the positions of human body joints for a variety of applications. …

Categories: cs.AI, cs.CV, cs.MM, eess.SP

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Posted: March 12, 2025 | Author: jarxiv

Abstract: Based on the LLaMA2 architecture, a family of open foundation models …

Categories: cs.AI, cs.MM, cs.SD, eess.AS

Video-to-Audio Generation with Hidden Alignment

Posted: March 12, 2025 | Author: jarxiv

Abstract: Generating audio content that is semantically and temporally aligned with video input …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

Posted: March 12, 2025 | Author: jarxiv

Abstract: In video understanding, video large language models (VideoLLMs) have shown remarkable …

Categories: cs.CL, cs.CV, cs.MM
