Monthly archive: June 2024

Single-image camera calibration with model-free distortion correction

Posted: June 25, 2024 | Author: jarxiv

Abstract: For computer vision that requires accurate quantitative measurement, camera calibration … Continue reading →

Categories: cs.CV

Unsupervised Domain Adaptation for Pediatric Brain Tumor Segmentation

Posted: June 25, 2024 | Author: jarxiv

Abstract: Significant progress toward building accurate automatic segmentation models for adult gliomas … Continue reading →

Categories: cs.CV, eess.IV

From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking

Posted: June 25, 2024 | Author: jarxiv

Abstract: For embodied agents to operate in unstructured environments, robust navigation … Continue reading →

Categories: cs.CV, cs.RO

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

Posted: June 25, 2024 | Author: jarxiv

Abstract: Evaluating long-context extractive reasoning in vision language models (VLMs) … Continue reading →

Categories: cs.AI, cs.CL, cs.CV

Long Context Transfer from Language to Vision

Posted: June 25, 2024 | Author: jarxiv

Abstract: Video sequences provide valuable temporal information, but existing large multimodal … Continue reading →

Categories: cs.CV

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Posted: June 25, 2024 | Author: jarxiv

Abstract: Personalized image generation, which creatively generates personalized content, … Continue reading →

Categories: cs.CV

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Posted: June 25, 2024 | Author: jarxiv

Abstract: A family of multimodal LLMs (MLLMs) designed with a vision-centric approach … Continue reading →

Categories: cs.CV

Dreamitate: Real-World Visuomotor Policy Learning via Video Generation

Posted: June 25, 2024 | Author: jarxiv

Abstract: A key challenge in manipulation is learning policies that generalize robustly to diverse visual environments … Continue reading →

Categories: cs.CV, cs.RO

FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models

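Posted: June 25, 2024 | Author: jarxiv

Abstract: Diffusion models have demonstrated remarkable capability in video generation, and in the generation process, trajectory … Continue reading →

Categories: cs.CV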

Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

Posted: June 25, 2024 | Author: jarxiv

Abstract: In referring expression comprehension (REC), based on a textual description, the target instance … Continue reading →

Categories: cs.CV
