"cs.CV" Category Archive

Be Decisive: Noise-Induced Layouts for Multi-Subject Generation

Posted on: May 28, 2025 | Author: jarxiv

Abstract: Generating multiple distinct subjects is a challenge for existing text-to-image diffusion models …

Categories: cs.AI, cs.CV, cs.GR, cs.LG

Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

Posted on: May 28, 2025 | Author: jarxiv

Abstract: Controllability, temporal coherence, and detail synthesis are the most critical challenges in video generation …

Categories: cs.CV

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Posted on: May 28, 2025 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs) are vulnerable to transferable adversarial examples …

Categories: cs.CV

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Posted on: May 28, 2025 | Author: jarxiv

Abstract: In this paper, a self-improving framework that addresses two key challenges in GUI agents …

Categories: cs.CL, cs.CV, cs.LG

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Posted on: May 28, 2025 | Author: jarxiv

Abstract: Academic poster generation is important in scientific communication, yet …

Categories: cs.AI, cs.CL, cs.CV, cs.MA

ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

Posted on: May 28, 2025 | Author: jarxiv

Abstract: Vision-language models (VLMs), in understanding and reasoning about visual content, …

Categories: cs.AI, cs.CL, cs.CV

Vision Transformers with Self-Distilled Registers

Posted on: May 28, 2025 | Author: jarxiv

Abstract: Vision Transformers (ViTs), as the dominant architecture for visual processing tasks, …

Categories: cs.CV

Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis

Posted on: May 28, 2025 | Author: jarxiv

Abstract: For high-fidelity human novel view synthesis under diverse lighting conditions, we … a general …

Categories: cs.CV

EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition

Posted on: May 28, 2025 | Author: jarxiv

Abstract: Effective interaction with humans depends on AI's ability to accurately recognize and interpret human emotions …

Categories: cs.AI, cs.CV

DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving

Posted on: May 28, 2025 | Author: jarxiv

Abstract: Research interest in end-to-end autonomous driving, modular tasks, i.e., …

Categories: cs.AI, cs.CV, cs.RO
