Archive of posts by "jarxiv"

New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration

Posted: June 16, 2025 | Author: jarxiv

Abstract: In referring expression comprehension (REC), language understanding, image understanding, and language-to-image grounding …

Categories: cs.CV

Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

Posted: June 16, 2025 | Author: jarxiv

Abstract: Via a warping-and-inpainting methodology, aligned novel view image and geometry generation …

Categories: cs.CV

Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference

Posted: June 16, 2025 | Author: jarxiv

Abstract: Research on lane change prediction has gained considerable momentum in the last few years. However …

Categories: cs.AI, cs.AR, cs.CV, cs.LG

Evaluating Sensitivity Parameters in Smartphone-Based Gaze Estimation: A Comparative Study of Appearance-Based and Infrared Eye Trackers

Posted: June 16, 2025 | Author: jarxiv

Abstract: In this study, performance relative to a commercial infrared-based eye tracker, Tob …

Categories: cs.CV, cs.HC

SG2VID: Scene Graphs Enable Fine-Grained Control for Video Synthesis

Posted: June 16, 2025 | Author: jarxiv

Abstract: For surgical simulation, the training of novice surgeons, acceleration of the learning curve, intraoperative errors …

Categories: cs.CV

Visual Pre-Training on Unlabeled Images using Reinforcement Learning

Posted: June 16, 2025 | Author: jarxiv

Abstract: In reinforcement learning (RL), value-based algorithms associate each observation with the states …

Categories: cs.CV, cs.LG

YOLO advances to its genesis: a decadal and comprehensive review of the You Only Look Once (YOLO) series

Posted: June 16, 2025 | Author: jarxiv

Abstract: In this review, from YOLOv1 to the recently released YOLOv12, the You Only Look Once …

Categories: cs.CV

How Visual Representations Map to Language Feature Space in Multimodal LLMs

Posted: June 16, 2025 | Author: jarxiv

Abstract: Effective multimodal reasoning depends on the alignment of visual and language representations, yet visual …

Categories: cs.CV, cs.LG

Simple Radiology VLLM Test-time Scaling with Thought Graph Traversal

Posted: June 16, 2025 | Author: jarxiv

Abstract: With test-time scaling, without additional training, vision-language large models …

Categories: cs.CV

VGR: Visual Grounded Reasoning

Posted: June 16, 2025 | Author: jarxiv

Abstract: In the field of multimodal chain-of-thought (CoT) reasoning, existing approaches primarily …

Categories: cs.AI, cs.CL, cs.CV
