"cs.CV" Category Archive

Discriminating image representations with principal distortions

Posted: May 19, 2025 | Author: jarxiv

Abstract: For image representations (artificial or biological), it is often the global geometric structure …

Categories: cs.CV, cs.LG, q-bio.NC, stat.ML

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing

Posted: May 19, 2025 | Author: jarxiv

Abstract: Editing images using natural language instructions modifies visual content in a natural and …

Categories: cs.CV

QVGen: Pushing the Limit of Quantized Video Generative Models

Posted: May 19, 2025 | Author: jarxiv

Abstract: Video diffusion models (DMs) have enabled high-quality video synthesis. …

Categories: cs.CV

UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

Posted: May 19, 2025 | Author: jarxiv

Abstract: Mimicry is a fundamental learning mechanism in humans, in which individuals observe and imitate experts …

Categories: cs.CV, cs.RO

Large-Scale Gaussian Splatting SLAM

Posted: May 16, 2025 | Author: jarxiv

Abstract: The recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting …

Categories: cs.CV, cs.RO

FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation

Posted: May 16, 2025 | Author: jarxiv

Abstract: In this paper, a better visual world model for robot manipulation, that is, … past …

Categories: cs.CV, cs.RO

Latent Action Pretraining from Videos

Posted: May 16, 2025 | Author: jarxiv

Abstract: Latent action … for general action models (LAPA) …

Categories: cs.CL, cs.CV, cs.LG, cs.RO

On the Interplay of Human-AI Alignment, Fairness, and Performance Trade-offs in Medical Imaging

Posted: May 16, 2025 | Author: jarxiv

Abstract: Deep neural networks excel in medical imaging, but their tendency toward bias …

Categories: cs.AI, cs.CL, cs.CV

Inferring Driving Maps by Deep Learning-based Trail Map Extraction

Posted: May 16, 2025 | Author: jarxiv

Abstract: High-definition (HD) maps provide extensive and accurate environmental information about driving scenes …

Categories: cs.CV, cs.RO

HandReader: Advanced Techniques for Efficient Fingerspelling Recognition

Posted: May 16, 2025 | Author: jarxiv

Abstract: Fingerspelling is an important component of sign language (SL), characterized by fast hand movements during signing …

Categories: cs.CV, cs.LG
