Archive of posts by "jarxiv"

The Devil is in the Details: Simple Remedies for Image-to-LiDAR Representation Learning

Posted on: January 17, 2025 | Author: jarxiv

Summary: LiDAR is a crucial sensor in autonomous driving and is commonly used alongside cameras …

Categories: cs.CV

Comparison of Various SLAM Systems for Mobile Robot in an Indoor Environment

Posted on: January 17, 2025 | Author: jarxiv

Summary: This article examines what is computed by various ROS-based SLAM systems …

Categories: cs.CV, cs.RO

Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models

Posted on: January 17, 2025 | Author: jarxiv

Summary: For large vision-language models (LVLMs), pre-trained vision …

Categories: cs.CV, cs.LG

Evaluating alignment between humans and neural network representations in image-based learning tasks

Posted on: January 17, 2025 | Author: jarxiv

Summary: Humans represent scenes and objects in rich feature spaces and, from only a few examples, …

Categories: cs.AI, cs.CV, cs.LG

VanGogh: A Unified Multimodal Diffusion-based Framework for Video Colorization

Posted on: January 17, 2025 | Author: jarxiv

Summary: While maintaining temporal consistency and structural integrity, video colorization …

Categories: cs.CV

Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis

Posted on: January 17, 2025 | Author: jarxiv

Summary: Accurately understanding emotions, in fields such as human-computer interaction, …

Categories: cs.CV

AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation

Posted on: January 17, 2025 | Author: jarxiv

Summary: Recently, large-scale generative models have demonstrated outstanding text-to-image generation capabilities …

Categories: cs.CV

HydraMix: Multi-Image Feature Mixing for Small Data Image Classification

Posted on: January 17, 2025 | Author: jarxiv

Summary: To train deep neural networks, a large number of annotated …

Categories: cs.CV

A Multi-Modal Approach for Face Anti-Spoofing in Non-Calibrated Systems using Disparity Maps

Posted on: January 17, 2025 | Author: jarxiv

Summary: Face recognition technology is increasingly used in a variety of applications, but face …

Categories: cs.AI, cs.CV

AdaFV: Accelerating VLMs with Self-Adaptive Cross-Modality Attention Mixture

Posted on: January 17, 2025 | Author: jarxiv

Summary: In many cases, the success of VLMs, by adaptively expanding the input image into multiple crops, …

Categories: cs.CV
