"cs.CV" Category Archive

ZeroVO: Visual Odometry with Minimal Assumptions

Posted: June 10, 2025 | Author: jarxiv

Abstract: Achieving zero-shot generalization across diverse cameras and environments, a new visual odometry (VO) …

Categories: cs.CV

Dreamland: Controllable World Creation with Simulator and Generative Models

Posted: June 10, 2025 | Author: jarxiv

Abstract: For dynamic world creation, large-scale video generation models … diverse and realistic visual …

Categories: cs.CV

Hidden in plain sight: VLMs overlook their visual representations

Posted: June 10, 2025 | Author: jarxiv

Abstract: For specifying and evaluating performance on visual tasks, language is a natural …

Categories: cs.AI, cs.CV, cs.LG

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Posted: June 10, 2025 | Author: jarxiv

Abstract: Introducing Self Forcing, a new training paradigm for autoregressive video diffusion models …

Categories: cs.AI, cs.CV, cs.LG

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Posted: June 10, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs) … graphical user interface …

Categories: cs.AI, cs.CV

Play to Generalize: Learning to Reason Through Game Play

Posted: June 10, 2025 | Author: jarxiv

Abstract: Developing generalizable reasoning capabilities in multimodal large language models (MLLMs) …

Categories: cs.CL, cs.CV

Vision Transformers Don’t Need Trained Registers

Posted: June 10, 2025 | Author: jarxiv

Abstract: We investigate the mechanism underlying a previously identified phenomenon in Vision Transformers. …

Categories: cs.AI, cs.CV

4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos

Posted: June 10, 2025 | Author: jarxiv

Abstract: A 4D Gaussian-based Transformer model for dynamic scene reconstruction, 4…

Categories: cs.CV

StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets

Posted: June 10, 2025 | Author: jarxiv

Abstract: Multi-task learning for dense prediction, owing to the need for extensive annotations for every task, …

Categories: cs.AI, cs.CV, cs.LG

Fine-grained Hierarchical Crop Type Classification from Integrated Hyperspectral EnMAP Data and Multispectral Sentinel-2 Time Series: A Large-scale Dataset and Dual-stream Transformer Method

Posted: June 10, 2025 | Author: jarxiv

Abstract: Fine-grained crop type classification serves as the fundamental basis for large-scale crop mapping …

Categories: cs.CV, cs.LG
