"cs.MM" Category Archive

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

Posted: March 15, 2023 | Author: jarxiv

Abstract: Given an untrimmed video, temporal sentence grounding …

Categories: cs.AI, cs.CV, cs.MM

SuS-X: Training-Free Name-Only Transfer of Vision-Language Models

Posted: March 15, 2023 | Author: jarxiv

Abstract: Contrastive Language-Image Pre-traini …

Categories: cs.CL, cs.CV, cs.MM

Temporal Sentence Grounding in Videos: A Survey and Future Directions

Posted: March 14, 2023 | Author: jarxiv

Abstract: Temporal sentence grounding in videos (TSGV), also known as natural language video …

Categories: cs.AI, cs.CL, cs.CV, cs.MM

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

Posted: March 14, 2023 | Author: jarxiv

Abstract: Foundation models trained on large-scale datasets …

Categories: cs.CL, cs.CV, cs.LG, cs.MM

TriDet: Temporal Action Detection with Relative Boundary Modeling

Posted: March 14, 2023 | Author: jarxiv

Abstract: In this paper, a one-stage framework for temporal action detection, T …

Categories: cs.AI, cs.CV, cs.MM

MuLTI: Efficient Video-and-Language Understanding with MultiWay-Sampler and Multiple Choice Modeling

Posted: March 13, 2023 | Author: jarxiv

Abstract: Video-and-language understanding covers video question answering, text-video retrieval, …

Categories: cs.CL, cs.CV, cs.MM

QVRF: A Quantization-error-aware Variable Rate Framework for Learned Image Compression

Posted: March 13, 2023 | Author: jarxiv

Abstract: Learned image compression has shown promising compression performance, but wide-range variable bit …

Categories: cs.AI, cs.MM, eess.IV

BIRD-PCC: Bi-directional Range Image-based Deep LiDAR Point Cloud Compression

Posted: March 10, 2023 | Author: jarxiv

Abstract: The large amount of data collected by LiDAR sensors … LiDAR point …

Categories: cs.MM, cs.RO

Video Question Answering Using CLIP-Guided Visual-Text Attention

Posted: March 9, 2023 | Author: jarxiv

Abstract: Cross-modal learning of video and text … video question answering (VideoQA) …

Categories: cs.AI, cs.CV, cs.MM, I.2.10

CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming

Posted: March 9, 2023 | Author: jarxiv

Abstract: In recent years, to match the streamer's uplink bandwidth, video bitstreams …

Categories: cs.CV, cs.LG, cs.MM, eess.IV
