Category Archive: cs.CV

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Posted on October 4, 2024 by jarxiv

Abstract: The rapid development of generative AI not only makes content creation easier, …

Categories: cs.AI, cs.CV

Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos

Posted on October 4, 2024 by jarxiv

Abstract: Recently, the latest large multimodal models (LMMs), in connection with short-video understanding, …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

Posted on October 4, 2024 by jarxiv

Abstract: Text-rich images, in which text serves as the central visual element guiding overall understanding, …

Categories: cs.CL, cs.CV

Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors

Posted on October 4, 2024 by jarxiv

Abstract: Generative AI (GenAI) offers countless possibilities for creative and productive tasks …

Categories: cs.CV, cs.LG

Releasing the Parameter Latency of Neural Representation for High-Efficiency Video Compression

Posted on October 4, 2024 by jarxiv

Abstract: For decades, video compression has been a prominent research area. Traditional hybrid video …

Categories: cs.CV, cs.MM, eess.IV

MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning

Posted on October 4, 2024 by jarxiv

Abstract: Extensive research has shown that, for deep neural networks (DNNs), slight adversarial …

Categories: cs.CV, cs.LG

LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models

Posted on October 4, 2024 by jarxiv

Abstract: Ophthalmology relies heavily on detailed image analysis for diagnosis and treatment planning. Large-scale …

Categories: cs.CV

OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity

Posted on October 3, 2024 by jarxiv

Abstract: 3D semantic occupancy prediction networks, regarding a 3D scene's geometric and …

Categories: cs.CV, cs.RO

Improving Zero-Shot ObjectNav with Generative Communication

Posted on October 3, 2024 by jarxiv

Abstract: With the aim of using potentially available environmental perception for navigation assistance, we …

Categories: cs.CV, cs.RO

CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Posted on October 3, 2024 by jarxiv

Abstract: Real-world robot navigation involves more than simply reaching a destination. …

Categories: cs.CV, cs.LG, cs.RO
