Monthly Archives: May 2025

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Posted on May 28, 2025 by jarxiv

Summary: Logical reasoning is a fundamental aspect of human intelligence, and multimodal large language models …

Categories: cs.AI, cs.CV

One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

Posted on May 28, 2025 by jarxiv

Summary: Diffusion models for super-resolution (SR) produce high-quality visual results, but high …

Categories: cs.CV

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

Posted on May 28, 2025 by jarxiv

Summary: Multimodal large language models (MLLMs), from static images, optical character recognition (OC …

Categories: cs.CV

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Posted on May 28, 2025 by jarxiv

Summary: Video large language models (video LLMs) excel at video understanding, but redundant …

Categories: cs.CV

Structure from Collision

Posted on May 28, 2025 by jarxiv

Summary: Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3D …

Categories: cs.AI, cs.CV, cs.GR, cs.LG, cs.RO

Regularized Personalization of Text-to-Image Diffusion Models without Distributional Drift

Posted on May 28, 2025 by jarxiv

Summary: For personalization using text-to-image diffusion models, a few example images …

Categories: cs.CV

Beyond Accuracy: Uncovering the Role of Similarity Perception and its Alignment with Semantics in Supervised Learning

Posted on May 28, 2025 by jarxiv

Summary: Similarity manifests in various forms, including the particularly important semantic similarity, for example …

Categories: cs.CV

Cognitive Disentanglement for Referring Multi-Object Tracking

Posted on May 28, 2025 by jarxiv

Summary: As an important appli … of multi-source information fusion in intelligent transportation perception systems …

Categories: cs.CV

Prostate Cancer Screening with Artificial Intelligence-Enhanced Micro-Ultrasound: A Comparative Study with Traditional Methods

Posted on May 28, 2025 by jarxiv

Summary: Background and objective: Micro-Ultrasound (Micro-US), clinically …

Categories: cs.AI, cs.CV, eess.IV

AgriFM: A Multi-source Temporal Remote Sensing Foundation Model for Crop Mapping

Posted on May 28, 2025 by jarxiv

Summary: Accurate crop mapping, from individual field textures to landscape-level conte …

Categories: cs.CV, cs.LG
