"cs.CV" Category Archive

ChromaFormer: A Scalable and Accurate Transformer Architecture for Land Cover Classification

Posted: March 12, 2025 | Author: jarxiv

Abstract: Remote sensing imagery from systems such as Sentinel, at approximately 10-meter …

Categories: cs.CV, cs.LG

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Posted: March 12, 2025 | Author: jarxiv

Abstract: With DeepSeek-R1-Zero, purely through reinforcement learning (RL), LLMs …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

VAGUE: Visual Contexts Clarify Ambiguous Expressions

Posted: March 12, 2025 | Author: jarxiv

Abstract: To resolve ambiguity, human communication often … visual …

Categories: cs.CL, cs.CV

Silent Hazards of Token Reduction in Vision-Language Models: The Hidden Impact on Consistency

Posted: March 12, 2025 | Author: jarxiv

Abstract: Vision-language models (VLMs) excel at visual reasoning, but in many cases high …

Categories: cs.CL, cs.CV

KinMo: Kinematic-aware Human Motion Understanding and Generation

Posted: March 12, 2025 | Author: jarxiv

Abstract: Current human motion synthesis frameworks … global action descriptions …

Categories: cs.AI, cs.CV, cs.GR

Q-PETR: Quant-aware Position Embedding Transformation for Multi-View 3D Object Detection

Posted: March 12, 2025 | Author: jarxiv

Abstract: Owing to its low cost and broad applicability, camera-based multi-view 3D detection … autonomous …

Categories: cs.AI, cs.CV

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Posted: March 12, 2025 | Author: jarxiv

Abstract: Reinforcement learning with verifiable outcome rewards (RLVR) … large language models (LLM …

Categories: cs.AI, cs.CV

Prediction of Frozen Region Growth in Kidney Cryoablation Intervention Using a 3D Flow-Matching Model

Posted: March 12, 2025 | Author: jarxiv

Abstract: This study … predicting the progression of the frozen region (iceball) during kidney cryoablation …

Categories: cs.AI, cs.CV, eess.IV

Forgotten Polygons: Multimodal Large Language Models are Shape-Blind

Posted: March 12, 2025 | Author: jarxiv

Abstract: Despite strong performance on vision-language tasks, multimodal …

Categories: cs.AI, cs.CL, cs.CV

GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis

Posted: March 12, 2025 | Author: jarxiv

Abstract: Text-to-image (T2I) generation has made significant progress with diffusion models, and …

Categories: cs.CV
