Category Archives: cs.CV

TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft

Posted on December 9, 2024 by jarxiv

Abstract: Collaboration is a cornerstone of society. In the real world, human teammates …

Categories: cs.AI, cs.CL, cs.CV, cs.MA

Extrapolated Urban View Synthesis Benchmark

Posted on December 9, 2024 by jarxiv

Abstract: Photorealistic simulators … vision-centric autonomous vehicles (AV …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Mind the Time: Temporally-Controlled Multi-Event Video Generation

Posted on December 9, 2024 by jarxiv

Abstract: Real-world videos consist of sequences of events. Such sequences …

Categories: cs.CV

DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo

Posted on December 9, 2024 by jarxiv

Abstract: Dense 3D correspondence allows spatial, functional … from one object to its unseen counterparts …

Categories: cs.CV, cs.RO

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Posted on December 9, 2024 by jarxiv

Abstract: InternVL 2.5 builds on InternVL 2.0 …

Categories: cs.CV

SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images

Posted on December 9, 2024 by jarxiv

Abstract: The 3D contrastive learning paradigm, through pretraining on point cloud data, … downstream tasks …

Categories: cs.CV

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

Posted on December 9, 2024 by jarxiv

Abstract: Text-to-Video models … generate diverse and compelling video content …

Categories: cs.AI, cs.CV

Text to Blind Motion

Posted on December 9, 2024 by jarxiv

Abstract: Blind people perceive the world differently from sighted people, so their motion characteristics can be distinct …

Categories: cs.CV

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Posted on December 9, 2024 by jarxiv

Abstract: Adapting foundation models to specific purposes … machine learning for downstream applications …

Categories: cs.CV, cs.LG

Birth and Death of a Rose

Posted on December 9, 2024 by jarxiv

Abstract: We … temporal object … from pretrained 2D foundation models …

Categories: cs.CV, cs.GR, I.2.10
