Category archive: cs.MM

In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol

Posted on May 2, 2024 by jarxiv

Abstract: As deep generative models advance, deepfakes become "perfect," that is, …

Categories: cs.CV, cs.MM

EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model

Posted on May 2, 2024 by jarxiv

Abstract: Emotion AI is the ability of computers to understand human emotional states. Existing …

Categories: cs.CV, cs.MM

Towards Real-world Video Face Restoration: A New Benchmark

Posted on May 1, 2024 by jarxiv

Abstract: Blind face restoration (BFR) on images has advanced significantly in recent years, but real-world …

Categories: cs.AI, cs.CV, cs.MM, eess.IV

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Posted on May 1, 2024 by jarxiv

Abstract: Music composition represents the creative side of humanity, and in itself involves long dependencies and harmony …

Categories: cs.AI, cs.CL, cs.LG, cs.MM, cs.SD, eess.AS

SemiPL: A Semi-supervised Method for Event Sound Source Localization

Posted on May 1, 2024 by jarxiv

Abstract: In recent years, event sound source localization has been widely applied in various fields. Recent work …

Categories: cs.CV, cs.MM, cs.SD, eess.AS

Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models

Posted on April 30, 2024 by jarxiv

Abstract: In multimedia and computer vision, image retrieval is a critically important …

Categories: cs.CV, cs.MM

Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models

Posted on April 29, 2024 by jarxiv

Abstract: Large Vision-Language Models (LVLMs) …

Categories: cs.CV, cs.MM

CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions

Posted on April 26, 2024 by jarxiv

Abstract: For object detection, cross-modality images that integrate visible-infrared spectral cues …

Categories: cs.CV, cs.MM, cs.RO, eess.IV

Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs

Posted on April 25, 2024 by jarxiv

Abstract: Multimodal LLMs are the natural evolution of LLMs, and purely textual …

Categories: cs.AI, cs.CL, cs.CV, cs.MM

Seeing Text in the Dark: Algorithm and Benchmark

Posted on April 25, 2024 by jarxiv

Abstract: Localizing text in dark environments is difficult because of visual degradation. …

Categories: cs.CV, cs.MM

