"cs.CV" Category Archive

LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization

Posted: December 6, 2024 | Author: jarxiv

Summary: Temporal action localization (TAL) involves … Continue reading →

Categories: cs.CV | Comments are closed

DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting

Posted: December 6, 2024 | Author: jarxiv

Summary: Accurately and efficiently modeling dynamic scenes and motion is … Continue reading →

Categories: cs.CV, cs.GR | Comments are closed

ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation

Posted: December 6, 2024 | Author: jarxiv

Summary: Temporal action segmentation and long-term action anticipation … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed

GeoPos: A Minimal Positional Encoding for Enhanced Fine-Grained Details in Image Synthesis Using Convolutional Neural Networks

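Posted: December 6, 2024 | Author: jarxiv

Summary: The inability of image generation models to reproduce complex geometric features, such as those found in human hands and fingers, … Continue reading →

Categories: 51, cs.AI, cs.CV, cs.LG, I.2.10 | Comments are closed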

A Hitchhiker’s Guide to Understanding Performances of Two-Class Classifiers

Posted: December 6, 2024 | Author: jarxiv

Summary: Properly understanding the performance of classifiers is, in a variety of scenarios, … Continue reading →

Categories: cs.CV, cs.LG, cs.PF | Comments are closed

SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation

Posted: December 6, 2024 | Author: jarxiv

Summary: Large multimodal models (LMMs), across many tasks and domains, … Continue reading →

Categories: cs.CV | Comments are closed

Discriminative Fine-tuning of LVLMs

Posted: December 6, 2024 | Author: jarxiv

Summary: Contrastively trained vision-language models (VLMs) such as CLIP … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed

MUSE-VL: Modeling Unified VLM through Semantic Discrete Encoding

Posted: December 6, 2024 | Author: jarxiv

Summary: Through semantic discrete encoding for multimodal understanding and generation, … Continue reading →

Categories: cs.CV | Comments are closed

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding

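Posted: December 6, 2024 | Author: jarxiv

Summary: 3D occupancy prediction provides a comprehensive description of the surrounding scene and, for 3D perception, … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed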

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Posted: December 6, 2024 | Author: jarxiv

Summary: 3D visual grounding (3DVG), based on text descriptions, … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed
