Monthly Archives: March 2024

Learning Topological Representations for Deep Image Understanding

Posted on March 25, 2024 by jarxiv

Abstract: In many scenarios, especially biomedical applications, neurons, tissues, vessels, and other complex, fine …

Categories: cs.CV, cs.LG

Augmented Reality based Simulated Data (ARSim) with multi-view consistency for AV perception networks

Posted on March 25, 2024 by jarxiv

Abstract: For autonomous driving systems to be effective, detecting diverse objects under various driving scenarios …

Categories: cs.CV, cs.LG, cs.RO

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Posted on March 25, 2024 by jarxiv

Abstract: In action recognition, video-text tasks, and video-centric dialogue, state-of-the-art …

Categories: cs.CV

Long-CLIP: Unlocking the Long-Text Capability of CLIP

Posted on March 25, 2024 by jarxiv

Abstract: Contrastive Language-Image Pre-traini …

Categories: cs.CV

DragAPart: Learning a Part-Level Motion Prior for Articulated Objects

Posted on March 25, 2024 by jarxiv

Abstract: We introduce a method called DragAPart. This method, given an image and a set …

Categories: cs.CV

ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

Posted on March 25, 2024 by jarxiv

Abstract: In real-world applications, 3D … that share a consistent theme are often …

Categories: cs.CV

Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

Posted on March 25, 2024 by jarxiv

Abstract: Dense simultaneous localization and mapping using 3D Gaussians as the scene representation …

Categories: cs.CV, cs.RO

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

Posted on March 25, 2024 by jarxiv

Abstract: Recent text-to-3D generation approaches produce impressive 3D results …

Categories: 68T45, cs.AI, cs.CV, cs.GR, cs.LG, I.2.6

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Posted on March 25, 2024 by jarxiv

Abstract: In Large Multimodal Models (LMMs), a visual encoder and a large …

Categories: cs.AI, cs.CL, cs.CV

DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data

Posted on March 25, 2024 by jarxiv

Abstract: Recently, learning multiple dense scene understanding tasks from partially annotated data …

Categories: cs.CV, cs.LG
