Monthly Archives: March 2025

OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer

Posted: March 14, 2025 | Author: jarxiv

Summary: Open-vocabulary multiple object tracking, during training, … Continue reading →

Categories: cs.CV

DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

Posted: March 14, 2025 | Author: jarxiv

Summary: In this work, Diffusion Transformers (DiTs) for text-to-image generation … Continue reading →

Categories: cs.CV

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Posted: March 14, 2025 | Author: jarxiv

Summary: Large multimodal models (LMMs), across a variety of visual question answering (VQA … Continue reading →

Categories: cs.CV, cs.RO

Transformers without Normalization

Posted: March 14, 2025 | Author: jarxiv

Summary: Normalization layers are ubiquitous in modern neural networks and have long been essential … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.LG

ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness

Posted: March 14, 2025 | Author: jarxiv

Summary: Fitting a body to 3D point clouds of clothed humans is common, … Continue reading →

Categories: cs.AI, cs.CV, cs.GR

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

Posted: March 14, 2025 | Author: jarxiv

Summary: Animatable 3D human reconstruction from a single image, geometry, appearance, … Continue reading →

Categories: cs.AI, cs.CV

NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models

Posted: March 14, 2025 | Author: jarxiv

Summary: Across diverse and unconventional morphologies, such as humanoid robots, quadrupeds, and animals, physically … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO

SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems

Posted: March 14, 2025 | Author: jarxiv

Summary: With the rapid advancement of large multimodal models (LMMs), scientific problem solving … Continue reading →

Categories: cs.AI, cs.CL, cs.CV

Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology

Posted: March 14, 2025 | Author: jarxiv

Summary: Adversarial attacks, for vision models in critical fields such as healthcare where reliability is essential, … Continue reading →

Categories: cs.CV

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Posted: March 14, 2025 | Author: jarxiv

Summary: In this paper, for universal zero-shot goal-oriented navigation, a general … Continue reading →

Categories: cs.CV, cs.RO
