Monthly Archives: May 2023

CodeVIO: Visual-Inertial Odometry with Learned Optimizable Dense Depth

Posted on May 22, 2023 by jarxiv

Summary: In this work, a lightweight, tightly coupled deep network and visual-inertial odometry (V …

Categories: cs.CV

Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

Posted on May 22, 2023 by jarxiv

Summary: Diffusion models have made impressive progress in text-to-image synthesis. …

Categories: cs.CL, cs.CV

Object-centric and memory-guided normality reconstruction for video anomaly detection

Posted on May 22, 2023 by jarxiv

Summary: This paper addresses the problem of video anomaly detection in video surveillance …

Categories: cs.CV

Is GPT-3 all you need for Visual Question Answering in Cultural Heritage?

Posted on May 22, 2023 by jarxiv

Summary: The use of deep learning and computer vision in the cultural heritage domain …

Categories: cs.CL, cs.CV

Brain Captioning: Decoding human brain activity into images and text

Posted on May 22, 2023 by jarxiv

Summary: Every day, the human brain processes a vast amount of visual information, relying on complex neural mechanisms …

Categories: cs.AI, cs.CV

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Posted on May 22, 2023 by jarxiv

Summary: Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic …

Categories: cs.AI, cs.CL, cs.CV, cs.SD, eess.AS

StereoVAE: A lightweight stereo matching system through embedded GPUs

Posted on May 22, 2023 by jarxiv

Summary: We present a lightweight system for stereo matching using embedded GPUs …

Categories: cs.AI, cs.CV, cs.MM, cs.RO

A Unified Prompt-Guided In-Context Inpainting Framework for Reference-based Image Manipulations

Posted on May 22, 2023 by jarxiv

Summary: Thanks to recent advances in Text-to-Image (T2I) generative models, consistent …

Categories: cs.CV

What You Hear Is What You See: Audio Quality Metrics From Image Quality Metrics

Posted on May 22, 2023 by jarxiv

Summary: In this work, by representing audio signals as spectrograms, …

Categories: cs.CV, cs.LG, cs.SD, eess.AS

Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields

Posted on May 22, 2023 by jarxiv

Summary: Text-driven 3D scene generation, for video games with a large demand for 3D scenes …

Categories: cs.CV, cs.GR
