"cs.CV" Category Archive

SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision

Posted: May 19, 2025 | Author: jarxiv

Summary: Accurate pose estimation of surgical tools in robot-assisted minimally invasive surgery (RMIS) …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Posted: May 19, 2025 | Author: jarxiv

Summary: Synthetic video generation has attracted considerable attention for its realism and broad applications …

Categories: cs.CV, cs.LG

Disentangling CLIP for Multi-Object Perception

Posted: May 19, 2025 | Author: jarxiv

Summary: Vision-language models such as CLIP excel on a single salient object in a scene …

Categories: cs.CV

HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation

Posted: May 19, 2025 | Author: jarxiv

Summary: On many vision-language benchmarks, large multimodal models (LMMs) now …

Categories: cs.AI, cs.CV

Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers

Posted: May 19, 2025 | Author: jarxiv

Summary: Transformer-based models generate hidden states that are difficult to interpret. In this work, …

Categories: cs.CL, cs.CV, cs.LG

Exploiting Radiance Fields for Grasp Generation on Novel Synthetic Views

Posted: May 19, 2025 | Author: jarxiv

Summary: In vision-based robotic manipulation, cameras are used and the objects to be manipulated are included in …

Categories: cs.CV, cs.RO

PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment

Posted: May 19, 2025 | Author: jarxiv

Summary: Diffusion models have made remarkable progress in generating high-quality images from text descriptions …

Categories: cs.CV

INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation

Posted: May 19, 2025 | Author: jarxiv

Summary: For autonomous driving systems, adversarial pedestrian movements, dangerous vehicle maneuvers, sudden environmental changes …

Categories: cs.AI, cs.CV

CoMP: Continual Multimodal Pre-training for Vision Foundation Models

Posted: May 19, 2025 | Author: jarxiv

Summary: Pre-trained vision foundation models (VFMs), for a wide …

Categories: cs.CV

Unsupervised Detection of Distribution Shift in Inverse Problems using Diffusion Models

Posted: May 19, 2025 | Author: jarxiv

Summary: Diffusion models are widely used as priors in imaging inverse problems. However, …

Categories: cs.CV
