Category Archives: cs.CV

ARFlow: Human Action-Reaction Flow Matching with Physical Guidance

Posted on June 3, 2025 by jarxiv

Summary: Human action-reaction synthesis, a fundamental challenge in modeling causal human interactions, …

Categories: cs.AI, cs.CV

PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?

Posted on June 3, 2025 by jarxiv

Summary: Pushing the boundaries of multimodal large language models (MLLMs) toward pixel-level understanding …

Categories: cs.CV

Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training

Posted on June 3, 2025 by jarxiv

Summary: Recent Vision Mamba (Vim) models, nearly linear in sequence length, …

Categories: cs.CV

Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

Posted on June 3, 2025 by jarxiv

Summary: Rule-based reinforcement learning (RL) for multimodal large language models (MLLMs) …

Categories: cs.AI, cs.CL, cs.CV

Keypoint-Integrated Instruction-Following Data Generation for Enhanced Human Pose and Action Understanding in Multimodal Models

Posted on June 3, 2025 by jarxiv

Summary: Current vision-language multimodal models are well suited to general visual understanding tasks …

Categories: cs.CV

Parameter Efficient Fine-Tuning of Segment Anything Model for Biomedical Imaging

Posted on June 3, 2025 by jarxiv

Summary: Segmentation is an important analysis task in biomedical imaging, … individual organelles …

Categories: cs.CV

OmniCaptioner: One Captioner to Rule Them All

Posted on June 3, 2025 by jarxiv

Summary: We propose OmniCaptioner, which spans a variety of visual domains …

Categories: cs.CL, cs.CV

DIS-CO: Discovering Copyrighted Content in VLMs Training Data

Posted on June 3, 2025 by jarxiv

Summary: Without direct access to the training data, copyrighted content …

Categories: cs.AI, cs.CV, cs.LG, I.2

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image

Posted on June 3, 2025 by jarxiv

Summary: This paper concerns category-level … of articulated objects in robotic manipulation tasks …

Categories: cs.CV, cs.RO

Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback

Posted on June 3, 2025 by jarxiv

Summary: Existing medical large vision-language models (Med-LVLMs) … extensive medical knowledge …

Categories: cs.AI, cs.CL, cs.CV, cs.LG
