Category Archives: cs.CV

Diffusion-based Visual Anagram as Multi-task Learning

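Posted on: December 4, 2024 | Author: jarxiv

Summary: A visual anagram is an image whose appearance changes under transformations such as flipping or rotation … Continue reading →

Categories: cs.CV | Comments closed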

Motion Prompting: Controlling Video Generation with Motion Trajectories

Posted on: December 4, 2024 | Author: jarxiv

Summary: For generating expressive and compelling video content, motion control is extremely … Continue reading →

Categories: cs.CV | Comments closed

FaVoR: Features via Voxel Rendering for Camera Relocalization

Posted on: December 3, 2024 | Author: jarxiv

Summary: Camera relocalization methods, from dense image alignment to direct, query-image-based … Continue reading →

Categories: cs.CV, cs.RO | Comments closed

Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification

Posted on: December 3, 2024 | Author: jarxiv

Summary: This paper introduces a new method for self-supervised fine-tuning of pose estimation. … Continue reading →

Categories: cs.CV, cs.RO | Comments closed

PACA: Perspective-Aware Cross-Attention Representation for Zero-Shot Scene Rearrangement

Posted on: December 3, 2024 | Author: jarxiv

Summary: As with tidying a table, in scene rearrangement the placement of various objects … Continue reading →

Categories: cs.CV, cs.RO | Comments closed

Right Place, Right Time! Generalizing ObjectNav to Dynamic Environments with Portable Targets

Posted on: December 3, 2024 | Author: jarxiv

Summary: In ObjectNav, an agent, for a target in an unseen environment, … Continue reading →

Categories: cs.CV, cs.RO | Comments closed

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning

Posted on: December 3, 2024 | Author: jarxiv

Summary: In multimodal reasoning tasks, vision-language models (VLMs) have remarkable … Continue reading →

Categories: cs.CL, cs.CV | Comments closed

NoisyNN: Exploring the Impact of Information Entropy Change in Learning Systems

Posted on: December 3, 2024 | Author: jarxiv

Summary: By injecting noise at various levels, such as the embedding space and the image, we … deep … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Posted on: December 3, 2024 | Author: jarxiv

Summary: Data visualization in chart form plays a pivotal role in data analysis, … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments closed

Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning

Posted on: December 3, 2024 | Author: jarxiv

Summary: Parameter-efficient fine-tuning of multimodal large language models (MLLMs) … Continue reading →

Categories: cs.AI, cs.CV | Comments closed
