Archive of posts by "jarxiv"

Edge Attention Module for Object Classification

Posted: February 6, 2025 | Author: jarxiv

Summary: In this study, for object classification tasks, a new "edge attention-based convolu …

Categories: cs.CV, cs.LG

Tell2Reg: Establishing spatial correspondence between images by the same language prompts

Posted: February 6, 2025 | Author: jarxiv

Summary: Spatial correspondence can be represented by pairs of segmented regions, and image registration net …

Categories: 00B25, cs.AI, cs.CV, eess.IV, I.2.7

3D Face Reconstruction From Radar Images

Posted: February 6, 2025 | Author: jarxiv

Summary: 3D reconstruction of faces has received wide attention in computer vision, for example in animati …

Categories: cs.CV, cs.LG

Assessing Open-world Forgetting in Generative Image Model Customization

Posted: February 6, 2025 | Author: jarxiv

Summary: Recent advances in diffusion models have significantly improved image generation capabilities. However, …

Categories: cs.CV, cs.GR, cs.LG

ImgTrojan: Jailbreaking Vision-Language Models with ONE Image

Posted: February 6, 2025 | Author: jarxiv

Summary: There is growing interest in aligning large language models (LLMs) with human values. …

Categories: cs.AI, cs.CV

Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via In-the-wild Cascading Flow Optimization

Posted: February 6, 2025 | Author: jarxiv

Summary: Adversarial attacks are widely used to evaluate model robustness, and black …

Categories: cs.CV

MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding

Posted: February 6, 2025 | Author: jarxiv

Summary: Modern video large language models (VLLMs), for video understanding, uniform fra …

Categories: cs.CV, cs.LG

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Posted: February 6, 2025 | Author: jarxiv

Summary: Advanced diffusion models such as RPG, Stable Diffusion 3, and FLUX, compositional text-to …

Categories: cs.CV

LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence

Posted: February 6, 2025 | Author: jarxiv

Summary: Recent embodied agents, mainly reinforcement learning (RL) or large language model …

Categories: cs.CV

MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent

Posted: February 6, 2025 | Author: jarxiv

Summary: We propose MotionAgent; for text-guided image-to-video generation, …

Categories: cs.CV, cs.GR
