Monthly Archives: May 2025

WATCH: Weighted Adaptive Testing for Changepoint Hypotheses via Weighted-Conformal Martingales

Posted: May 8, 2025 | Author: jarxiv

Summary: For artificial intelligence (AI) / machine learning (ML) systems in high-stakes settings, responsibly …

Categories: cs.AI, cs.LG, stat.ML

Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond

Posted: May 8, 2025 | Author: jarxiv

Summary: We introduce Audio-SDS. Audio-SDS, text conditioning …

Categories: 68T07, cs.AI, cs.LG, cs.MM, cs.SD, eess.AS, H.5.1

mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition

Posted: May 8, 2025 | Author: jarxiv

Summary: Audio-Visual Speech Recognition …

Categories: cs.CV, cs.SD, eess.AS

Geometry-Aware Texture Generation for 3D Head Modeling with Artist-driven Control

Posted: May 8, 2025 | Author: jarxiv

Summary: For virtual characters that match a precise artistic vision, realistic 3D head …

Categories: cs.CV, cs.GR

Predicting Road Surface Anomalies by Visual Tracking of a Preceding Vehicle

Posted: May 8, 2025 | Author: jarxiv

Summary: A novel approach to detecting road surface anomalies by visually tracking a preceding vehicle …

Categories: cs.CV

SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer

Posted: May 8, 2025 | Author: jarxiv

Summary: This paper introduces an efficient visual speech encoder for lip reading …

Categories: cs.CV, eess.AS

Deep residual learning with product units

Posted: May 8, 2025 | Author: jarxiv

Summary: By integrating product units into residual blocks, the expressiveness of deep convolutional networks and …

Categories: cs.AI, cs.CV, cs.LG

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Posted: May 8, 2025 | Author: jarxiv

Summary: In recent years, remarkable progress has been seen in both multimodal understanding models and image generation models …

Categories: cs.CV

MFSeg: Efficient Multi-frame 3D Semantic Segmentation

Posted: May 8, 2025 | Author: jarxiv

Summary: An efficient multi-frame 3D semantic segmentation framework …

Categories: cs.CV

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

Posted: May 8, 2025 | Author: jarxiv

Summary: Dense visual prediction tasks are constrained by their reliance on predefined categories …

Categories: cs.CV
