Monthly Archive: January 2025

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

Posted: January 9, 2025 | Author: jarxiv

Abstract: Recent advances in omnimodal learning, albeit mainly within proprietary models, images, …

Categories: cs.CL, cs.CV

Learnable Scaled Gradient Descent for Guaranteed Robust Tensor PCA

Posted: January 9, 2025 | Author: jarxiv

Abstract: Robust tensor principal component analysis (RTPCA), from multidimensional data, low-rank …

Categories: cs.CV

Supervision-free Vision-Language Alignment

Posted: January 9, 2025 | Author: jarxiv

Abstract: Vision-language models (VLMs), in integrating visual and linguistic information, remarkable …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud

Posted: January 9, 2025 | Author: jarxiv

Abstract: Reconstructing textured meshes from colored point clouds is an important but challenging task …

Categories: cs.CV

Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision

Posted: January 9, 2025 | Author: jarxiv

Abstract: Decoded bitstreams typically serve only human or machine needs …

Categories: cs.CV, cs.MM

Towards Revisiting Visual Place Recognition for Joining Submaps in Multimap SLAM

Posted: January 9, 2025 | Author: jarxiv

Abstract: Visual SLAM is a key technology for many autonomous systems …

Categories: cs.CV, cs.RO

Boosting Salient Object Detection with Knowledge Distillated from Large Foundation Models

Posted: January 9, 2025 | Author: jarxiv

Abstract: Salient Object Detection (SOD), within a scene …

Categories: cs.CV

Identity-Preserving Video Dubbing Using Motion Warping

Posted: January 9, 2025 | Author: jarxiv

Abstract: Video dubbing, from a reference video and a driving audio signal, realistic lip-sync …

Categories: cs.CV

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

Posted: January 9, 2025 | Author: jarxiv

Abstract: Vision Transformers (ViTs), self-attention …

Categories: cs.CV

FrontierNet: Learning Visual Cues to Explore

Posted: January 9, 2025 | Author: jarxiv

Abstract: Exploring unknown environments is crucial for autonomous robots. This enables map …

Categories: cs.CV, cs.RO
