Category Archives: cs.CV

RealCraft: Attention Control as A Tool for Zero-Shot Consistent Video Editing

Posted on February 3, 2025 by jarxiv

Summary: In synthesizing high-quality images, large-scale text-to-image generative models have shown promising …

Categories: cs.CV

Integrating Semi-Supervised and Active Learning for Semantic Segmentation

Posted on February 3, 2025 by jarxiv

Summary: In this paper, to reduce the cost of manual annotation and improve model performance, …

Categories: cs.AI, cs.CV

MTGA: Multi-View Temporal Granularity Aligned Aggregation for Event-Based Lip-Reading

Posted on February 3, 2025 by jarxiv

Summary: Lip-reading uses visual information about a speaker's lip movements to … words and sentences …

Categories: cs.CV

LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

Posted on February 3, 2025 by jarxiv

Summary: In this paper, … trustworthy guardrails for the era of large-scale data and models …

Categories: cs.AI, cs.CV, cs.LG

Accelerating Diffusion Transformer via Error-Optimized Cache

Posted on February 3, 2025 by jarxiv

Summary: Diffusion Transformers (DiT) are an important method for content generation. However, sampling …

Categories: cs.CV

Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search

Posted on February 3, 2025 by jarxiv

Summary: With remarkable advances in text-to-video diffusion models, photorealistic generation …

Categories: cs.CV

ContextFormer: Redefining Efficiency in Semantic Segmentation

Posted on February 3, 2025 by jarxiv

Summary: Semantic segmentation is important in computer vision and …

Categories: cs.CV

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

Posted on February 3, 2025 by jarxiv

Summary: With recent advances in multimodal models, visual perception, reasoning ability, and vision-language understanding …

Categories: cs.CV

Classifying Deepfakes Using Swin Transformers

Posted on February 3, 2025 by jarxiv

Summary: The proliferation of deepfake technology poses major … to the authenticity and trustworthiness of digital media …

Categories: cs.CV

Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge

Posted on February 3, 2025 by jarxiv

Summary: The integration of intuitive human interaction into autonomous systems remains limited. Conventional natural …

Categories: cs.CV, cs.LG, cs.NE, cs.RO, cs.SY, eess.SY
