"cs.MM" Category Archive

Interpretable Concept-based Deep Learning Framework for Multimodal Human Behavior Modeling

Posted on: February 17, 2025 | Author: jarxiv

Abstract: In the modern era of intelligent connectivity, systems recognizing human behavioral states …

Categories: cs.CV, cs.MM

Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions

Posted on: February 13, 2025 | Author: jarxiv

Abstract: Non-native speakers with a limited vocabulary, able to visualize them …

Categories: cs.AI, cs.CL, cs.CV, cs.IR, cs.MM

Human-Centric Foundation Models: Perception, Generation and Agentic Modeling

Posted on: February 13, 2025 | Author: jarxiv

Abstract: Human understanding and generation, for modeling digital humans and humanoid embodiments …

Categories: cs.AI, cs.CV, cs.LG, cs.MM

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Posted on: February 13, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs), with impressive performance in short video understanding …

Categories: cs.AI, cs.CV, cs.MM

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Posted on: February 13, 2025 | Author: jarxiv

Abstract: With recent advances in large language models, particularly following GPT-4o, more modalities …

Categories: cs.CL, cs.CV, cs.MM, cs.SD, eess.AS, eess.IV

Learning Musical Representations for Music Performance Question Answering

Posted on: February 11, 2025 | Author: jarxiv

Abstract: Music performance is a representative scenario for audio-visual modeling. Sparse …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Latent Swap Joint Diffusion for Long-Form Audio Generation

Posted on: February 10, 2025 | Author: jarxiv

Abstract: Regarding long-form audio generation using global-view diffusion or iterative generation …

Categories: cs.AI, cs.CV, cs.MM, cs.SD, eess.AS

Long-tailed Medical Diagnosis with Relation-aware Representation Learning and Iterative Classifier Calibration

Posted on: February 10, 2025 | Author: jarxiv

Abstract: Recently, computer-aided diagnosis has demonstrated promising performance, …

Categories: cs.AI, cs.CV, cs.LG, cs.MM

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Posted on: February 7, 2025 | Author: jarxiv

Abstract: Particularly with the GPT series and the o1 model, text-based large language models (LLMs …

Categories: cs.AI, cs.CL, cs.MM, cs.SD, eess.AS

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Posted on: February 7, 2025 | Author: jarxiv

Abstract: With recent advances in large language models, particularly following GPT-4o, more modalities …

Categories: cs.CL, cs.CV, cs.MM, cs.SD, eess.AS, eess.IV
