Author Archives: jarxiv

Aligning Text, Images, and 3D Structure Token-by-Token

Posted on June 10, 2025 by jarxiv

Summary: Creating machines capable of understanding the world in 3D is … navigating and interacting within three-dimensional space …

Categories: cs.CV

Audio-Sync Video Generation with Multi-Stream Temporal Control

Posted on June 10, 2025 by jarxiv

Summary: Audio is inherently temporal and closely synchronized with the visual world, so …

Categories: cs.AI, cs.CV

Dynamic View Synthesis as an Inverse Problem

Posted on June 10, 2025 by jarxiv

Summary: In this work, we address dynamic … from monocular videos as an inverse problem in a training-free setting …

Categories: cs.AI, cs.CV

ZeroVO: Visual Odometry with Minimal Assumptions

Posted on June 10, 2025 by jarxiv

Summary: A new visual odometry (VO) algorithm that achieves zero-shot generalization across diverse cameras and environments …

Categories: cs.CV

Dreamland: Controllable World Creation with Simulator and Generative Models

Posted on June 10, 2025 by jarxiv

Summary: Large-scale video generation models … diverse and realistic visual … for dynamic world creation …

Categories: cs.CV

Hidden in plain sight: VLMs overlook their visual representations

Posted on June 10, 2025 by jarxiv

Summary: Language is a natural interface for specifying and evaluating performance on visual tasks …

Categories: cs.AI, cs.CV, cs.LG

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Posted on June 10, 2025 by jarxiv

Summary: We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models …

Categories: cs.AI, cs.CV, cs.LG

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Posted on June 10, 2025 by jarxiv

Summary: Multimodal large language models (MLLMs) … graphical user interface …

Categories: cs.AI, cs.CV

Play to Generalize: Learning to Reason Through Game Play

Posted on June 10, 2025 by jarxiv

Summary: Developing generalizable reasoning capabilities in multimodal large language models (MLLMs) …

Categories: cs.CL, cs.CV

Vision Transformers Don’t Need Trained Registers

Posted on June 10, 2025 by jarxiv

Summary: We investigate the mechanism underlying a previously identified phenomenon in Vision Transformers. …

Categories: cs.AI, cs.CV
