"cs.CV" Category Archive

Enhancing Target-unspecific Tasks through a Features Matrix

Posted: May 8, 2025 | Author: jarxiv

Abstract: With recent developments in prompt learning for large vision-language models, target-specific …

Categories: cs.CL, cs.CV

Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision

Posted: May 8, 2025 | Author: jarxiv

Abstract: Video quality assessment (VQA), from camera capture systems to over-the-top …

Categories: cs.CV

PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model

Posted: May 8, 2025 | Author: jarxiv

Abstract: Audio-driven human animation technology …

Categories: cs.CV, cs.MM

Visual Imitation Enables Contextual Humanoid Control

Posted: May 8, 2025 | Author: jarxiv

Abstract: For humanoids to climb stairs and sit on chairs using the context of their surrounding environment …

Categories: cs.CV, cs.RO

Uncertainty-Aware Prototype Semantic Decoupling for Text-Based Person Search in Full Images

Posted: May 8, 2025 | Author: jarxiv

Abstract: Text-based pedestrian search (TBPS) in full images uses natural language descriptions …

Categories: cs.CV

Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets

Posted: May 7, 2025 | Author: jarxiv

Abstract: Instruction-Action (IA) data pairs, for robotic systems and especially autonomous vehicles (AV …

Categories: cs.CV, cs.LG, cs.RO

OccCylindrical: Multi-Modal Fusion with Cylindrical Representation for 3D Semantic Occupancy Prediction

Posted: May 7, 2025 | Author: jarxiv

Abstract: The safe operation of autonomous vehicles (AVs) depends heavily on understanding their surroundings. …

Categories: cs.CV, cs.RO

Robotic Visual Instruction

Posted: May 7, 2025 | Author: jarxiv

Abstract: Recently, natural language has been the primary medium for human-robot interaction. However, …

Categories: cs.AI, cs.CV, cs.RO

LiftFeat: 3D Geometry-Aware Local Feature Matching

Posted: May 7, 2025 | Author: jarxiv

Abstract: Robust and efficient local feature matching, in SLAM and robotics …

Categories: cs.CV, cs.RO

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation

Posted: May 7, 2025 | Author: jarxiv

Abstract: Text-to-video (T2V) generation models trained on large-scale datasets …

Categories: cs.CL, cs.CV
