Category Archives: cs.CV

Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise

Posted on May 8, 2025 by jarxiv

Summary: Convolutional neural networks pre-trained on ImageNet for downstream tasks …

Categories: cs.AI, cs.CV, cs.LG

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

Posted on May 8, 2025 by jarxiv

Summary: Audio-Visual Speech Recognition …

Categories: cs.CV, cs.SD, eess.AS

Geometry-Aware Texture Generation for 3D Head Modeling with Artist-driven Control

Posted on May 8, 2025 by jarxiv

Summary: Realistic 3D head assets for virtual characters that match a precise artistic vision …

Categories: cs.CV, cs.GR

Predicting Road Surface Anomalies by Visual Tracking of a Preceding Vehicle

Posted on May 8, 2025 by jarxiv

Summary: A new approach is proposed for detecting road surface anomalies through visual tracking of a preceding vehicle …

Categories: cs.CV

SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer

Posted on May 8, 2025 by jarxiv

Summary: This paper introduces an efficient visual speech encoder for lip reading …

Categories: cs.CV, eess.AS

Deep residual learning with product units

Posted on May 8, 2025 by jarxiv

Summary: By integrating product units into residual blocks, the expressive power and pa… of deep convolutional networks …

Categories: cs.AI, cs.CV, cs.LG

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Posted on May 8, 2025 by jarxiv

Summary: In recent years, remarkable progress has been made in both multimodal understanding models and image generation models …

Categories: cs.CV

MFSeg: Efficient Multi-frame 3D Semantic Segmentation

Posted on May 8, 2025 by jarxiv

Summary: An efficient multi-frame 3D semantic segmentation framework …

Categories: cs.CV

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

Posted on May 8, 2025 by jarxiv

Summary: Dense visual prediction tasks are constrained by their reliance on predefined categories …

Categories: cs.CV

RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation

Posted on May 8, 2025 by jarxiv

Summary: Arbitrary style transfer applies the style of a given artistic image to another content image …

Categories: cs.CV
