Monthly Archives: May 2025

Seeing through Satellite Images at Street Views

Posted on May 23, 2025 by jarxiv

Summary: In this paper, … given a satellite image and a specified camera position or trajectory …

Category: cs.CV

PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association

Posted on May 23, 2025 by jarxiv

Summary: We … learning between faces and voices, which has recently attracted interest in the multimodal community …

Categories: cs.AI, cs.CV

CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

Posted on May 23, 2025 by jarxiv

Summary: Learning latent motion from Internet videos is … for generalist robot …

Categories: cs.CV, cs.RO

Deep mineralogical segmentation of thin section images based on QEMSCAN maps

Posted on May 23, 2025 by jarxiv

Summary: Interpreting the mineralogical aspects of rock thin sections is … for the evaluation of oil and gas reservoirs …

Categories: cs.CV, eess.IV

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space

Posted on May 23, 2025 by jarxiv

Summary: Flexibly allocating tokens to different frames based on video content …

Category: cs.CV

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

Posted on May 23, 2025 by jarxiv

Summary: Multimodal large language models (MLLMs) have shown impressive … on problem-solving tasks …

Categories: cs.AI, cs.CV

When Are Concepts Erased From Diffusion Models?

Posted on May 23, 2025 by jarxiv

Summary: Concept erasure, the ability to selectively prevent a model from generating specific concepts, … interest …

Categories: cs.CV, cs.LG

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Posted on May 23, 2025 by jarxiv

Summary: Multimodal large language models (MLLMs) have advanced rapidly on visual tasks …

Categories: cs.CL, cs.CV

Interactive Post-Training for Vision-Language-Action Models

Posted on May 23, 2025 by jarxiv

Summary: We introduce RIPT-VLA, which, using only sparse binary success rewards, …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

Posted on May 23, 2025 by jarxiv

Summary: Recent advances have enhanced the chain-of-thought (CoT) reasoning capabilities of large language models (LLMs) …

Categories: cs.AI, cs.CL, cs.CV, cs.LG
