Category Archives: cs.CV

Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis

Posted on May 16, 2025 by jarxiv

Summary: Thanks to recent advances, for understanding multi-image information, multimodal large language … Continue reading →

Categories: cs.CV | Comments are closed

UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

Posted on May 16, 2025 by jarxiv

Summary: Imitation is a fundamental human learning mechanism, in which individuals observe and imitate experts … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed

Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data

Posted on May 16, 2025 by jarxiv

Summary: With the development of photorealistic diffusion models, training partially or fully on synthetic data … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed

Style Customization of Text-to-Vector Generation with Image Diffusion Priors

Posted on May 16, 2025 by jarxiv

Summary: Scalable Vector Graphics (SVG), with their resolution independence and well-organized … Continue reading →

Categories: cs.CV, cs.GR | Comments are closed

MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Posted on May 16, 2025 by jarxiv

Summary: Widely used for training large multimodal models, natural language image … Continue reading →

Categories: cs.AI, cs.CL, cs.CV | Comments are closed

End-to-End Vision Tokenizer Tuning

Posted on May 16, 2025 by jarxiv

Summary: In existing vision tokenization, visual tokens for various tasks, such as image generation and visual … Continue reading →

Categories: cs.CV | Comments are closed

Depth Anything with Any Prior

Posted on May 16, 2025 by jarxiv

Summary: This work presents Prior Depth Anything, which … incomplete but precise metric … Continue reading →

Categories: cs.CV | Comments are closed

3D-Fixup: Advancing Photo Editing with 3D Priors

Posted on May 16, 2025 by jarxiv

Summary: Despite significant advances in modeling image priors via diffusion models, … Continue reading →

Categories: cs.CV | Comments are closed

Behind Maya: Building a Multilingual Vision Language Model

Posted on May 16, 2025 by jarxiv

Summary: Recently, large vision-language models (VLMs) have developed rapidly. … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed

RT-cache: Efficient Robot Trajectory Retrieval System

Posted on May 15, 2025 by jarxiv

Summary: In this paper, by leveraging big-data retrieval to learn from experience, … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO | Comments are closed
