Monthly Archives: March 2025

L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling

Posted on: March 7, 2025 | Author: jarxiv

Summary: We … a bipartite mutual … scaling law in natural language that governs long-range dependencies … Continue reading →

Categories: cs.AI, cs.CL, cs.IT, cs.LG, math.IT, physics.data-an | Comments closed

A lightweight model FDM-YOLO for small target improvement based on YOLOv8

Posted on: March 7, 2025 | Author: jarxiv

Summary: Small targets, with their low pixel counts, complex backgrounds, and varying shooting angles … Continue reading →

Categories: cs.CV | Comments closed

TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction

Posted on: March 7, 2025 | Author: jarxiv

Summary: Vision-language models (VLMs) … large language models (LLMs) across diverse tasks … Continue reading →

Categories: cs.AI, cs.CV | Comments closed

Question-Aware Gaussian Experts for Audio-Visual Question Answering

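Posted on: March 7, 2025 | Author: jarxiv

Summary: Audio-visual question answering (AVQA) involves not only question-based multimodal reasoning but also … Continue reading →

Categories: cs.CV | Comments closed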

Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information

Posted on: March 7, 2025 | Author: jarxiv

Summary: In this paper, by integrating skeleton pose data alongside RGB frames … Continue reading →

Categories: cs.CV | Comments closed

ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images

Posted on: March 7, 2025 | Author: jarxiv

Summary: Place recognition, in maintaining the global consistency of large-scale localization systems, … Continue reading →

Categories: cs.CV, cs.RO | Comments closed

MobileViM: A Light-weight and Dimension-independent Vision Mamba for 3D Medical Image Analysis

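Posted on: March 7, 2025 | Author: jarxiv

Summary: Efficient evaluation of three-dimensional (3D) medical images … diagnostic practices in healthcare and … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.NI | Comments closed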

Semantic Alignment of Unimodal Medical Text and Vision Representations

Posted on: March 7, 2025 | Author: jarxiv

Summary: General-purpose AI models, particularly those designed for text and vision, … a wide … Continue reading →

Categories: cs.CV | Comments closed

Mocap-2-to-3: Lifting 2D Diffusion-Based Pretrained Models for 3D Motion Capture

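Posted on: March 7, 2025 | Author: jarxiv

Summary: Recovering absolute poses in the world coordinate system from monocular views presents significant challenges … Continue reading →

Categories: cs.CV | Comments closed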

UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving

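Posted on: March 7, 2025 | Author: jarxiv

Summary: To enhance the perception and planning capabilities of autonomous driving systems, diverse and realistic driving scenarios … Continue reading →

Categories: cs.CV | Comments closed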
