"cs.CV" Category Archive

StyleAdapter: A Unified Stylized Image Generation Model

Posted: October 31, 2024 | Author: jarxiv

Summary: In this work, reference images of a specific style and the content of a provided text description … Continue reading →

Categories: cs.CV | Comments off

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Posted: October 31, 2024 | Author: jarxiv

Summary: Artificial intelligence, particularly Medical Large Vision Languag … Continue reading →

Categories: cs.CL, cs.CV, cs.CY, cs.LG | Comments off

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Posted: October 31, 2024 | Author: jarxiv

Summary: As for existing efforts to build GUI agents, GPT-4o and Gemi … Continue reading →

Categories: cs.CL, cs.CV, cs.HC | Comments off

DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET

Posted: October 31, 2024 | Author: jarxiv

Summary: Dementia, particularly Alzheimer's disease (AD) and frontotemporal dementia (FTD) … Continue reading →

Categories: cs.AI, cs.CV | Comments off

Aligning Audio-Visual Joint Representations with an Agentic Workflow

Posted: October 31, 2024 | Author: jarxiv

Summary: As for visual content and its accompanying audio signals, audio-visual ( … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.MM, cs.SD, eess.AS | Comments off

LGU-SLAM: Learnable Gaussian Uncertainty Matching with Deformable Correlation Sampling for Deep Visual SLAM

Posted: October 31, 2024 | Author: jarxiv

Summary: Deep visual simultaneous localization and mapping (SLAM) techniques such as DROID … Continue reading →

Categories: cs.CV | Comments off

Super-resolution in disordered media using neural networks

Posted: October 31, 2024 | Author: jarxiv

Summary: Leveraging large and diverse datasets, in strongly scattering media, the ambient … Continue reading →

Categories: cs.CV, cs.LG, eess.IV | Comments off

PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching

Posted: October 31, 2024 | Author: jarxiv

Summary: From posed monocular RGB video, our novel online point-bas … Continue reading →

Categories: cs.CV | Comments off

bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction

Posted: October 31, 2024 | Author: jarxiv

Summary: Quanta image sensors, such as SPAD arrays, with a few nanoseconds … Continue reading →

Categories: 68T45, cs.CV, cs.LG, eess.IV, I.2.10 | Comments off

Is Your LiDAR Placement Optimized for 3D Scene Understanding?

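Posted: October 31, 2024 | Author: jarxiv

Summary: The reliability of driving perception systems under unprecedented conditions is crucial for practical deployment … Continue reading →

Categories: cs.CV, cs.RO | Comments off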
