Category archive: "cs.CV"

VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?

Posted: November 20, 2024 | Author: jarxiv

Summary: With advances in multimodal large language models (MLLMs), multimodal …

Categories: cs.AI, cs.CV

AdaCM$^2$: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction

Posted: November 20, 2024 | Author: jarxiv

Summary: With advances in large language models (LLMs), LLMs into visual models …

Categories: cs.AI, cs.CV

Barttender: An approachable & interpretable way to compare medical imaging and non-imaging data

Posted: November 20, 2024 | Author: jarxiv

Summary: Image-based deep learning has transformed medical research, but imaging models and conventional …

Categories: cs.CV, q-bio.QM

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Posted: November 20, 2024 | Author: jarxiv

Summary: Large Vision-Language Model (LVLM) sys …

Categories: cs.AI, cs.CV

Heuristic-Free Multi-Teacher Learning

Posted: November 20, 2024 | Author: jarxiv

Summary: For multi-teacher learning that eliminates the need for manual aggregation heuristics …

Categories: cs.AI, cs.CV, cs.LG

Generative World Explorer

Posted: November 20, 2024 | Author: jarxiv

Summary: Planning with partial observation is a central challenge in embodied AI. Until now …

Categories: cs.CV

Leveraging Computational Pathology AI for Noninvasive Optical Imaging Analysis Without Retraining

Posted: November 20, 2024 | Author: jarxiv

Summary: Noninvasive optical imaging modalities probe patient tissue in 3D, and …

Categories: cs.CV

Cascaded Diffusion Models for 2D and 3D Microscopy Image Synthesis to Enhance Cell Segmentation

Posted: November 20, 2024 | Author: jarxiv

Summary: Automated cell segmentation in microscopy images is essential for biomedical research, but …

Categories: cs.CV, cs.LG

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

Posted: November 20, 2024 | Author: jarxiv

Summary: Recent advances in large language models (LLMs) and pre-trained vision models …

Categories: cs.CL, cs.CV

EROAM: Event-based Camera Rotational Odometry and Mapping in Real-time

Posted: November 19, 2024 | Author: jarxiv

Summary: In this paper, for accurate real-time camera rotation estimation, a novel event-based …

Categories: cs.CV, cs.RO
