Category Archive: cs.CV

Stereo Hand-Object Reconstruction for Human-to-Robot Handover

Posted: December 11, 2024 | Author: jarxiv

Summary: By jointly estimating hand and object shape, in human-to-robot handovers …

Categories: cs.CV, cs.RO

CMRNext: Camera to LiDAR Matching in the Wild for Localization and Extrinsic Calibration

Posted: December 11, 2024 | Author: jarxiv

Summary: LiDAR is widely used for mapping and localization in dynamic environments. …

Categories: cs.CV, cs.RO

DeCLIP: Decoding CLIP representations for deepfake localization

Posted: December 11, 2024 | Author: jarxiv

Summary: Generative models can create entirely new images, but, undetectable to the human eye, …

Categories: cs.CV, cs.LG

Unsupervised Learning of Unbiased Visual Representations

Posted: December 11, 2024 | Author: jarxiv

Summary: For deep neural networks in the presence of dataset bias, learning robust representations …

Categories: 68T07, cs.CV, cs.LG

Enhancing Vision-Language Model Pre-training with Image-text Pair Pruning Based on Word Frequency

Posted: December 11, 2024 | Author: jarxiv

Summary: A new data pruning method that improves the efficiency of VLMs, word …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types

Posted: December 11, 2024 | Author: jarxiv

Summary: Visual question answering (VQA), particularly the generalization of vision-language models (VLMs) …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

Posted: December 11, 2024 | Author: jarxiv

Summary: Diffusion models have had unprecedented success in the task of text-to-image generation …

Categories: cs.AI, cs.CV

Mobile Video Diffusion

Posted: December 11, 2024 | Author: jarxiv

Summary: Video diffusion models achieve impressive realism and controllability, but high computational …

Categories: cs.AI, cs.CV

Multimodal Contextualized Support for Enhancing Video Retrieval System

Posted: December 11, 2024 | Author: jarxiv

Summary: Current video retrieval systems, particularly those used in competitions, … entire clips or …

Categories: cs.AI, cs.CV

Faster and Better 3D Splatting via Group Training

Posted: December 11, 2024 | Author: jarxiv

Summary: 3D Gaussian Splatting (3DGS), for novel view synthesis, …

Categories: cs.CV
