Archive of posts by "jarxiv"

Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction

Posted: May 23, 2025 | Author: jarxiv

Abstract: Video virtual try-on, for a subject in a video with a specific garment, seamlessly … Continue reading →

Categories: cs.CV, cs.MM | Comments are closed

Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation

Posted: May 23, 2025 | Author: jarxiv

Abstract: Out-of-distribution (OOD) detection and segmentation, in areas such as autonomous driving and robot-assisted surgery … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO | Comments are closed

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

Posted: May 23, 2025 | Author: jarxiv

Abstract: In this work, the first discrete diffusion multimodal large language model (DMLLM) … Continue reading →

Categories: cs.CV | Comments are closed

An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Posted: May 23, 2025 | Author: jarxiv

Abstract: Recent advances in deep learning, while disregarding computational and memory constraints, promising … Continue reading →

Categories: cs.CV | Comments are closed

Native Segmentation Vision Transformers

Posted: May 23, 2025 | Author: jarxiv

Abstract: Uniform downsampling, for reducing the spatial resolution of vision backbones, is the de facto … Continue reading →

Categories: cs.CV, cs.LG | Comments are closed

Seeing through Satellite Images at Street Views

Posted: May 23, 2025 | Author: jarxiv

Abstract: In this paper, given a satellite image and a specified camera position or trajectory, … Continue reading →

Categories: cs.CV | Comments are closed

PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association

Posted: May 23, 2025 | Author: jarxiv

Abstract: Regarding learning between face and voice, which has recently attracted interest in the multimodal community, we … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed

CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

Posted: May 23, 2025 | Author: jarxiv

Abstract: Learning latent motion from Internet videos, for generalist robot … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed

Deep mineralogical segmentation of thin section images based on QEMSCAN maps

Posted: May 23, 2025 | Author: jarxiv

Abstract: For the evaluation of oil and gas reservoirs, interpreting the mineralogical aspects of rock thin sections is … Continue reading →

Categories: cs.CV, eess.IV | Comments are closed

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space

Posted: May 23, 2025 | Author: jarxiv

Abstract: Flexibly allocating tokens to different frames based on video content … Continue reading →

Categories: cs.CV | Comments are closed
