"cs.MM" Category Archive

Can’t See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs

Posted on June 4, 2025 by jarxiv

Abstract: Multimodal large language models (MLLMs), through both text and images …

Categories: cs.AI, cs.CL, cs.CV, cs.MM

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step

Posted on June 4, 2025 by jarxiv

Abstract: Stable Diffusion, DALL-E 3, and similar text-b …

Categories: cs.AI, cs.CL, cs.CR, cs.CV, cs.MM

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query

Posted on June 4, 2025 by jarxiv

Abstract: Semantic retrieval is crucial for modern applications, yet in current research …

Categories: cs.CL, cs.CV, cs.MM

IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation

Posted on June 4, 2025 by jarxiv

Abstract: Diffusion-based models, from text or image inputs, high-quality and high-resolution video …

Categories: cs.AI, cs.CV, cs.LG, cs.MM

Contrastive Alignment with Semantic Gap-Aware Corrections in Text-Video Retrieval

Posted on June 3, 2025 by jarxiv

Abstract: Recent advances in text-video retrieval have been largely driven by contrastive learning frameworks …

Categories: cs.CV, cs.IR, cs.MM

I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue

Posted on June 3, 2025 by jarxiv

Abstract: In face-to-face interaction, multiple modalities, including speech and gesture, are used to …

Categories: cs.CL, cs.CV, cs.MM

PixelThink: Towards Efficient Chain-of-Pixel Reasoning

Posted on May 30, 2025 by jarxiv

Abstract: For existing reasoning segmentation approaches, typically image-text pairs and corresponding …

Categories: cs.CV, cs.MM

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis

Posted on May 30, 2025 by jarxiv

Abstract: With the rapid progress of foundation models and large language models (LLMs), multimodal in …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Multi-MLLM Knowledge Distillation for Out-of-Context News Detection

Posted on May 29, 2025 by jarxiv

Abstract: In multimodal out-of-context news, an image is used outside its original context …

Categories: cs.CL, cs.MM

Spatial Knowledge Graph-Guided Multimodal Synthesis

Posted on May 29, 2025 by jarxiv

Abstract: Recent advances in multimodal large language models (MLLMs) have greatly improved capabilities …

Categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.MM
