"cs.CV" Category Archive

Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution

Posted: November 6, 2024 | Author: jarxiv

Abstract: Because of the limitations of consumer-grade depth cameras and bandwidth constraints during data transmission, compressed sources …

Categories: cs.CV

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

Posted: November 6, 2024 | Author: jarxiv

Abstract: For pseudo-labeling in semi-supervised monocular 3D object detection (SSM3OD), we …

Categories: cs.CV

ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal

Posted: November 6, 2024 | Author: jarxiv

Abstract: Image shadow removal is a typical low-level vision problem, in which the presence of shadows causes certain regions …

Categories: cs.CV

DiT4Edit: Diffusion Transformer for Image Editing

Posted: November 6, 2024 | Author: jarxiv

Abstract: UNet-based image editing has advanced recently, but in high-resolution images, shape-aware …

Categories: cs.CV

Cognitive Planning for Object Goal Navigation using Generative AI Models

Posted: November 6, 2024 | Author: jarxiv

Abstract: Generative AI, in particular large language models (LLMs) and large vision-language models (L …

Categories: cs.CV, cs.RO

DAAL: Density-Aware Adaptive Line Margin Loss for Multi-Modal Deep Metric Learning

Posted: November 6, 2024 | Author: jarxiv

Abstract: Multi-modal deep metric learning, in face verification, fine-grained object …

Categories: cs.CV, cs.LG

Inference Optimal VLMs Need Only One Visual Token but Larger Models

Posted: November 6, 2024 | Author: jarxiv

Abstract: Vision-language models (VLMs), across a variety of visual understanding and reasoning tasks, …

Categories: cs.AI, cs.CV, cs.LG

Classification Done Right for Vision-Language Pre-Training

Posted: November 6, 2024 | Author: jarxiv

Abstract: A very simple … for vision-language pre-training on image-text data …

Categories: cs.CV

MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning

Posted: November 6, 2024 | Author: jarxiv

Abstract: In recent years, through general-domain multimodal benchmarks, general tasks …

Categories: cs.CL, cs.CV

Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models

Posted: November 6, 2024 | Author: jarxiv

Abstract: The accuracy of face recognition systems, with the large amounts of data collected and neural network …

Categories: cs.CV
