Monthly archive: March 2025

M2N2V2: Multi-Modal Unsupervised and Training-free Interactive Segmentation

Posted on March 21, 2025 by jarxiv

Summary: We present Markov Map Nearest Neighbor (M2N2V2) …

Categories: cs.CV

Benchmarking Large Language Models for Handwritten Text Recognition

Posted on March 21, 2025 by jarxiv

Summary: Traditional machine learning models for handwritten text recognition (HTR) rely on supervised …

Categories: cs.CV

Vision-Language Models Generate More Homogeneous Stories for Phenotypically Black Individuals

Posted on March 21, 2025 by jarxiv

Summary: Vision-Language Models (VLMs) integrate image processing …

Categories: cs.CV

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

Posted on March 21, 2025 by jarxiv

Summary: Video large language models (VideoLLMs) can process longer video inputs …

Categories: cs.CV

Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data

Posted on March 21, 2025 by jarxiv

Summary: For multimodal large language models (MLLMs), visual reasoning over complex charts …

Categories: cs.CV

Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

Posted on March 21, 2025 by jarxiv

Summary: In skeleton-based action recognition, a key challenge is that skeletal representations lack image-level …

Categories: cs.CV

From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction

Posted on March 21, 2025 by jarxiv

Summary: Surgical automation requires precise guidance and scene understanding. In the current literature, …

Categories: cs.CV, cs.RO

Do image and video quality metrics model low-level human vision?

Posted on March 21, 2025 by jarxiv

Summary: Image and video quality metrics such as SSIM, LPIPS, and VMAF are …

Categories: cs.CV, cs.MM, eess.IV

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Posted on March 21, 2025 by jarxiv

Summary: Generalized few-shot 3D point cloud segmentation (GFS- …

Categories: cs.CV

PSA-MIL: A Probabilistic Spatial Attention-Based Multiple Instance Learning for Whole Slide Image Classification

Posted on March 21, 2025 by jarxiv

Summary: Whole slide images (WSIs) are high-resolution digital …, widely used in medical diagnosis …

Categories: cs.CV