"cs.CV" Category Archive

Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?

Posted: June 3, 2025 | Author: jarxiv

Summary: Vision-language models (VLMs) are impressive at visual question answering and image captioning …

Categories: cs.CL, cs.CV, cs.LG

MaxSup: Overcoming Representation Collapse in Label Smoothing

Posted: June 3, 2025 | Author: jarxiv

Summary: Label smoothing (LS) reduces the overconfidence of neural network predictions …

Categories: cs.AI, cs.CV, cs.LG

Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward

Posted: June 3, 2025 | Author: jarxiv

Summary: On a variety of vision-language tasks, large vision-language models (LVLMs) …

Categories: cs.AI, cs.CV

VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL

Posted: June 3, 2025 | Author: jarxiv

Summary: Diffusion models have emerged as powerful generative tools across a variety of domains …

Categories: cs.CV, cs.LG

MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty

Posted: June 3, 2025 | Author: jarxiv

Summary: In this paper, skeletal data and textual descriptions of actions are integrated and synchronized …

Category: cs.CV

Accurate Differential Operators for Hybrid Neural Fields

Posted: June 3, 2025 | Author: jarxiv

Summary: Neural fields, from shape representation to neural rendering, and partial …

Categories: cs.AI, cs.CV, cs.GR, cs.LG

Fact-Checking of AI-Generated Reports

Posted: June 3, 2025 | Author: jarxiv

Summary: With advances in generative artificial intelligence (AI), for preliminary reads of radiology images, realistically …

Categories: cs.AI, cs.CR, cs.CV, cs.LG, eess.IV

Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory

Posted: June 3, 2025 | Author: jarxiv

Summary: To improve the generalization of autonomous driving (AD) perception models, continuously collected data …

Categories: cs.CV, cs.LG, cs.RO

SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation

Posted: June 3, 2025 | Author: jarxiv

Summary: Accurate instrument pose estimation is an important step toward the future of robotic surgery, …

Categories: cs.CV, cs.RO

View-Invariant Policy Learning via Zero-Shot Novel View Synthesis

Posted: June 3, 2025 | Author: jarxiv

Summary: For the development of generalizable manipulation systems, large-scale visuomotor policy learning is a promising …

Categories: cs.AI, cs.CV, cs.LG, cs.RO
