Monthly Archives: January 2023

Learning Multimodal Data Augmentation in Feature Space

Posted: January 2, 2023 | Author: jarxiv

Abstract: Learning jointly from multiple modalities such as text, audio, and visual data …

Categories: cs.CL, cs.CV, cs.LG

Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats

Posted: January 2, 2023 | Author: jarxiv

Abstract: Deep learning-based 3D human pose estimation, on large amounts of labeled data, …

Categories: cs.CV, I.2.10

Improving Visual Representation Learning through Perceptual Understanding

Posted: January 2, 2023 | Author: jarxiv

Abstract: By explicitly encouraging the learning of higher scene-level features, the model …

Categories: cs.CV

IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection

Posted: January 2, 2023 | Author: jarxiv

Abstract: In change detection (CD), the changed regions within an image pair captured at different times …

Categories: cs.CV

Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction

Posted: January 2, 2023 | Author: jarxiv

Abstract: We present a multiscale predictive model for video prediction. Its design …

Categories: cs.AI, cs.CV

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

Posted: January 2, 2023 | Author: jarxiv

Abstract: Remote sensing imagery provides a comprehensive view of the Earth, and various sensors …

Categories: cs.CV

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

Posted: January 2, 2023 | Author: jarxiv

Abstract: Through video-language pre-training, for various downstream video-language tasks …

Categories: cs.CL, cs.CV, cs.MM

An Experience-based Direct Generation approach to Automatic Image Cropping

Posted: January 2, 2023 | Author: jarxiv

Abstract: Automatic image cropping, in many practical downstream applications, …

Categories: cs.CV, cs.LG

A Fine-Grained Vehicle Detection (FGVD) Dataset for Unconstrained Roads

Posted: January 2, 2023 | Author: jarxiv

Abstract: Previous fine-grained datasets focus mainly on classification, and in many cases …

Categories: cs.CV

NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling

Posted: January 2, 2023 | Author: jarxiv

Abstract: Implicit Neural Representations (INR) …

Categories: cs.CV
