"cs.AI" Category Archive

Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning

Posted: January 10, 2025 | Author: jarxiv

Abstract: Infants rapidly develop complex visual understanding before acquiring linguistic input. …

Categories: cs.AI, cs.CV

Towards Balanced Continual Multi-Modal Learning in Human Pose Estimation

Posted: January 10, 2025 | Author: jarxiv

Abstract: 3D human pose estimation (3D HPE), particularly in the area of RGB-based methods, …

Categories: cs.AI, cs.CV

Geometry Restoration and Dewarping of Camera-Captured Document Images

Posted: January 10, 2025 | Author: jarxiv

Abstract: In this study, algorithms for detection, segmentation, geometry restoration, and dewarping …

Categories: cs.AI, cs.CV, cs.LG

Less is More: The Influence of Pruning on the Explainability of CNNs

Posted: January 10, 2025 | Author: jarxiv

Abstract: Modern convolutional neural networks in computer vision ( …

Categories: cs.AI, cs.CV

A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics

Posted: January 10, 2025 | Author: jarxiv

Abstract: With recent advances in digital pathology, across a range of applications …

Categories: cs.AI, cs.CV, cs.LG

AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning

Posted: January 10, 2025 | Author: jarxiv

Abstract: Leveraging the vast repositories of image-text data available online, …

Categories: cs.AI, cs.CV

Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces

Posted: January 10, 2025 | Author: jarxiv

Abstract: Video tokenizers are essential for latent video diffusion models, with raw video …

Categories: cs.AI, cs.CV

Consistent Flow Distillation for Text-to-3D Generation

Posted: January 10, 2025 | Author: jarxiv

Abstract: Score distillation sampling (SDS), in distilling image generative models for 3D generation, …

Categories: cs.AI, cs.CV, cs.LG

An Empirical Study of Autoregressive Pre-training from Videos

Posted: January 10, 2025 | Author: jarxiv

Abstract: We empirically study autoregressive pre-training from videos. …

Categories: cs.AI, cs.CV

MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation

Posted: January 10, 2025 | Author: jarxiv

Abstract: Artificial intelligence is revolutionizing healthcare, improving diagnostic accuracy and care delivery. …

Categories: cs.AI, cs.LG
