"cs.CV" Category Archive

An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation

Posted: January 14, 2025 | Author: jarxiv

Abstract: Line features are an effective complement to point features in man-made environments … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed

ActiveGAMER: Active GAussian Mapping through Efficient Rendering

Posted: January 14, 2025 | Author: jarxiv

Abstract: Utilizing 3D Gaussian Splatting (3DGS) for high-quality, real-time … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed

Multi-face emotion detection for effective Human-Robot Interaction

Posted: January 14, 2025 | Author: jarxiv

Abstract: The integration of dialogue interfaces into mobile devices has become widespread, and a wide range of … Continue reading →

Categories: cs.AI, cs.CV, cs.HC, cs.RO | Comments are closed

SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis

Posted: January 14, 2025 | Author: jarxiv

Abstract: Synthesizing realistic human-object interaction motions is … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO | Comments are closed

LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models

Posted: January 14, 2025 | Author: jarxiv

Abstract: For multimodal large language models (MLLMs), enhanced visual understanding is a … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed

Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

Posted: January 14, 2025 | Author: jarxiv

Abstract: In this paper, the harmlessness alignment of multimodal large language models (MLLMs) … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed

MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs

Posted: January 14, 2025 | Author: jarxiv

Abstract: For effective decision-making in everyday life, the ability to compare objects, scenes, or situations … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments are closed

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

Posted: January 14, 2025 | Author: jarxiv

Abstract: For diagnosis in histopathology, global whole-slide image (WSI) analysis … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments are closed

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Posted: January 14, 2025 | Author: jarxiv

Abstract: With rapid advances in the development of multimodal large language models (MLLMs), … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments are closed

BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature

Posted: January 14, 2025 | Author: jarxiv

Abstract: For the development of vision-language models (VLMs), large-scale and diverse multimodal … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed
