Category Archive: cs.CV

Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation

Posted on June 12, 2025 by jarxiv

Summary: Built on trajectory autoregressive modeling, a novel visuo-motor po …

Categories: cs.CV, cs.LG, cs.RO

Text-Aware Image Restoration with Diffusion Models

Posted on June 12, 2025 by jarxiv

Summary: Image restoration aims to recover degraded images. However, existing …

Categories: cs.AI, cs.CV, cs.LG

PlayerOne: Egocentric World Simulator

Posted on June 12, 2025 by jarxiv

Summary: Facilitating immersive and unrestricted exploration within vividly dynamic environments, the first egocentric …

Category: cs.CV

DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos

Posted on June 12, 2025 by jarxiv

Summary: We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM) …

Categories: cs.AI, cs.CV, cs.GR, cs.LG

Fine-Grained Spatially Varying Material Selection in Images

Posted on June 12, 2025 by jarxiv

Summary: Selection is the first step in many image editing processes, sharing a common modality …

Categories: cs.CV, cs.GR

MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis

Posted on June 12, 2025 by jarxiv

Summary: Artificial intelligence (AI) … clinicians … ophthalmic images such as optical coherence tomography (OCT) …

Category: cs.CV

Do Multiple Instance Learning Models Transfer?

Posted on June 12, 2025 by jarxiv

Summary: Multiple instance learning (MIL) … from gigapixel tissue images … clinically meaningful …

Category: cs.CV

Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought

Posted on June 12, 2025 by jarxiv

Summary: From video analysis to interactive systems, understanding video content …

Category: cs.CV

Adapting Vision-Language Foundation Model for Next Generation Medical Ultrasound Image Analysis

Posted on June 12, 2025 by jarxiv

Summary: Medical ultrasonography, for examining superficial organs and tissues such as lymph nodes, breast, and thyroid …

Category: cs.CV

SkipVAR: Accelerating Visual Autoregressive Modeling via Adaptive Frequency-Aware Skipping

Posted on June 12, 2025 by jarxiv

Summary: In recent studies on visual autoregressive (VAR) models, the high-frequency … of the generation process …

Category: cs.CV
