"cs.CV" Category Archive

Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models

Posted: August 22, 2024 | Author: jarxiv

Abstract: Diffusion models have emerged as powerful generative models for high-quality image synthesis, and subsequently …

Categories: cs.CV

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Posted: August 22, 2024 | Author: jarxiv

Abstract: In embodied tasks, the agent must fully understand the 3D scene while exploring …

Categories: cs.CV, cs.RO

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs

Posted: August 22, 2024 | Author: jarxiv

Abstract: Multimodal large language models (MLLMs) have recently demonstrated remarkable perceptual and …

Categories: cs.CV

SynPlay: Importing Real-world Diversity for a Synthetic Human Dataset

Posted: August 22, 2024 | Author: jarxiv

Abstract: Aiming to bring out the diversity of human appearance in the real world, a new synthetic human …

Categories: cs.CV

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

Posted: August 22, 2024 | Author: jarxiv

Abstract: Across many visual tasks, large multimodal models (LMMs) …

Categories: cs.CV

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Posted: August 22, 2024 | Author: jarxiv

Abstract: For multimodal foundation models, especially for understanding long videos, long-context capability …

Categories: cs.CL, cs.CV

Generative AI in Industrial Machine Vision — A Review

Posted: August 22, 2024 | Author: jarxiv

Abstract: Machine vision, by enabling machines to interpret visual data and act on it, …

Categories: cs.CV, cs.LG, eess.IV

V-RoAst: A New Dataset for Visual Road Assessment

Posted: August 22, 2024 | Author: jarxiv

Abstract: Road traffic crashes cause millions of deaths every year, particularly in low- and middle-income countries (LMIC …

Categories: cs.AI, cs.CV, cs.ET

D$^3$FlowSLAM: Self-Supervised Dynamic SLAM with Flow Motion Decomposition and DINO Guidance

Posted: August 22, 2024 | Author: jarxiv

Abstract: In this paper, operating robustly in dynamic scenes while accurately identifying dynamic components …

Categories: cs.CV, cs.RO

CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network

Posted: August 22, 2024 | Author: jarxiv

Abstract: In recent years, Wi-Fi sensing, with its privacy protection, low cost, penetration ability, and other …

Categories: cs.AI, cs.CV, cs.LG, eess.SP

