Author archives: jarxiv

IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments

Posted on June 12, 2025 by jarxiv

Summary: IntPhys 2 is designed to evaluate the intuitive physics understanding of deep learning models …

Categories: cs.CV

ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models

Posted on June 12, 2025 by jarxiv

Summary: For multimodal large language models (MLLMs), reasoning over sequences of images …

Categories: cs.CL, cs.CV, cs.LG

ContentV: Efficient Training of Video Generation Models with Limited Compute

Posted on June 12, 2025 by jarxiv

Summary: Recent advances in video generation increasingly … to mitigate escalating computational costs …

Categories: cs.CV

Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation

Posted on June 12, 2025 by jarxiv

Summary: Open-vocabulary … in semantic segmentation (DGSS) …

Categories: cs.CV

SpikeSMOKE: Spiking Neural Networks for Monocular 3D Object Detection with Cross-Scale Gated Coding

Posted on June 12, 2025 by jarxiv

Summary: Low energy consumption in 3D object detection is … for broad … in fields such as autonomous driving …

Categories: cs.CV

3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation

Posted on June 12, 2025 by jarxiv

Summary: Vision-language models (VLMs) have shown remarkable performance across diverse visual and linguistic tasks …

Categories: cs.AI, cs.CV

Traveling Waves Integrate Spatial Information Through Time

Posted on June 12, 2025 by jarxiv

Summary: Traveling waves of neural activity are widely observed in the brain, but their precise computational function remains unclear …

Categories: cs.CV

The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge

Posted on June 12, 2025 by jarxiv

Summary: We consider the problem of generalizable novel view synthesis (NVS), which …

Categories: cs.CV

EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks

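Posted on June 12, 2025 by jarxiv

Summary: Learning self-supervised representations that are invariant and equivariant to transformations is … beyond traditional visual classification tasks …

Categories: cs.CV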

CEM-FBGTinyDet: Context-Enhanced Foreground Balance with Gradient Tuning for tiny Objects

Posted on June 12, 2025 by jarxiv

Summary: Tiny object detection (TOD) … feature pyramid networks …

Categories: cs.CV
