Monthly Archives: April 2025

Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing

Posted on April 15, 2025 by jarxiv

Abstract: Text-to-image generation has seen groundbreaking progress with diffusion models, with high-fidelity …

Categories: cs.CV

MESA: Text-Driven Terrain Generation Using Latent Diffusion and Global Copernicus Data

Posted on April 15, 2025 by jarxiv

Abstract: Terrain modeling has traditionally relied on procedural techniques, which often …

Categories: cs.CV, cs.GR, cs.LG

Multimodal Long Video Modeling Based on Temporal Dynamic Context

Posted on April 15, 2025 by jarxiv

Abstract: Recent advances in large language models (LLMs) have led to major breakthroughs in video understanding …

Categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.MM

Learning Free Token Reduction for Multi-Modal Large Language Models

Posted on April 15, 2025 by jarxiv

Abstract: Vision-language models (VLMs) have achieved remarkable success across a variety of multimodal tasks …

Categories: cs.AI, cs.CL, cs.CV

RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users

Posted on April 15, 2025 by jarxiv

Abstract: To achieve successful assistance with long-horizon web-based tasks, AI agents …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Integrating Vision and Location with Transformers: A Multimodal Deep Learning Framework for Medical Wound Analysis

Posted on April 15, 2025 by jarxiv

Abstract: Effective recognition of acute and difficult-to-heal wounds is a necessary step in wound diagnosis. …

Categories: cs.CV

GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents

Posted on April 15, 2025 by jarxiv

Abstract: In building graphical user interface (GUI) agents, existing …

Categories: cs.CL, cs.CV, cs.HC

MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration

Posted on April 15, 2025 by jarxiv

Abstract: Recently, Transformer networks, with their global receptive field and adaptability to input, …

Categories: cs.CV

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Posted on April 15, 2025 by jarxiv

Abstract: In this paper, raw pixel encoding and language decoding within a single architecture …

Categories: cs.CV

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

Posted on April 15, 2025 by jarxiv

Abstract: For fine-grained pixel-level understanding, multimodal large language models (MLLMs) …

Categories: cs.CV
