Author archives: jarxiv

Vision Language Models as Values Detectors

Posted on January 8, 2025 by jarxiv

Abstract: Thanks to large language models that integrate textual and visual inputs, complex data …

Categories: cs.CV, cs.HC

Temporal Feature Weaving for Neonatal Echocardiographic Viewpoint Video Classification

Posted on January 8, 2025 by jarxiv

Abstract: Automated viewpoint classification in echocardiography, in settings where specialist technicians are unavailable, resource-limited …

Categories: cs.CV

VLM-driven Behavior Tree for Context-aware Task Planning

Posted on January 8, 2025 by jarxiv

Abstract: Large language models (LLMs) for generating behavior trees (BTs) …

Categories: cs.AI, cs.CV, cs.HC, cs.RO

ImageFlowNet: Forecasting Multiscale Image-Level Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images

Posted on January 8, 2025 by jarxiv

Abstract: Advances in medical imaging technology make it possible to scan the same patient repeatedly over long periods of time and dis …

Categories: cs.CV, cs.LG, eess.IV

NeuralSVG: An Implicit Representation for Text-to-Vector Generation

Posted on January 8, 2025 by jarxiv

Abstract: Vector graphics are essential in design, being resolution-independent and highly …

Categories: cs.CV

RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance

Posted on January 8, 2025 by jarxiv

Abstract: By using external knowledge to guide response generation, retrieval-augmented generation (RAG) …

Categories: cs.CV, cs.IR, cs.IT, cs.LG, math.IT

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Posted on January 8, 2025 by jarxiv

Abstract: In this work, for dense grounded understanding of both images and videos, the fir …

Categories: cs.CV

Extraction Of Cumulative Blobs From Dynamic Gestures

Posted on January 8, 2025 by jarxiv

Abstract: Gesture recognition enables computers to interpret human movements as commands …

Categories: 68T45, 68U10, cs.CV, H.5.2

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Posted on January 8, 2025 by jarxiv

Abstract: Owing to recent advances in vision-language models (VLMs), their use in autonomous driving, particularly natural …

Categories: cs.CV, cs.RO

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

Posted on January 8, 2025 by jarxiv

Abstract: Pretraining on LiDAR data with large-scale, readily available datase …

Categories: cs.CV, cs.LG, cs.RO
