Monthly Archives: July 2024

3D Human Mesh Estimation from Virtual Markers

Posted on: July 2, 2024 | Author: jarxiv

Abstract: Inspired by the success of volumetric 3D pose estimation, some recent human …

Categories: cs.CV

RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

Posted on: July 2, 2024 | Author: jarxiv

Abstract: Recent advances in deep convolutional neural networks, in road scene parsing, …

Categories: cs.CV, cs.RO

VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Posted on: July 2, 2024 | Author: jarxiv

Abstract: Video Temporal Grounding (VTG), for a language query, …

Categories: cs.CV

A Simple Framework for Open-Vocabulary Zero-Shot Segmentation

Posted on: July 2, 2024 | Author: jarxiv

Abstract: Zero-shot classification capabilities, trained within a vision-language contrastive framework, …

Categories: cs.CV

YOLOv10 to Its Genesis: A Decadal and Comprehensive Review of The You Only Look Once Series

Posted on: July 2, 2024 | Author: jarxiv

Abstract: In this review, from YOLOv1 to the recently released YOLOv10, …

Categories: cs.CV

Training morphological neural networks with gradient descent: some theoretical insights

Posted on: July 2, 2024 | Author: jarxiv

Abstract: Morphological neural networks (layers), in representing complete lattice operators, …

Categories: cs.CV, cs.LG, stat.ML

VIPriors 4: Visual Inductive Priors for Data-Efficient Deep Learning Challenges

Posted on: July 2, 2024 | Author: jarxiv

Abstract: "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning …

Categories: cs.CV

E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion

Posted on: July 2, 2024 | Author: jarxiv

Abstract: Online GUI navigation on mobile devices, in many real-world …

Categories: cs.CV, cs.HC

DynamicGlue: Epipolar and Time-Informed Data Association in Dynamic Environments using Graph Neural Networks

Posted on: July 2, 2024 | Author: jarxiv

Abstract: The assumption of a static environment, in SLAM and many other geometric computer vision …

Categories: cs.CV, cs.RO

PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Posted on: July 2, 2024 | Author: jarxiv

Abstract: Layout generation is the keystone of achieving automated graphic design, …

Categories: cs.CV
