"cs.CV" Category Archive

Fashion-VDM: Video Diffusion Model for Virtual Try-On

Posted on November 5, 2024 by jarxiv

Abstract: A video diffusion model (VDM) for generating virtual try-on videos, which we …

Categories: cs.CV

Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs

Posted on November 5, 2024 by jarxiv

Abstract: Auditory, visual, and audiovisual speech recognition (respectively ASR, VSR, AVS …

Categories: cs.CV

xMIL: Insightful Explanations for Multiple Instance Learning in Histopathology

Posted on November 5, 2024 by jarxiv

Abstract: Multiple instance learning …

Categories: cs.CV, cs.LG

Conformal-in-the-Loop for Learning with Imbalanced Noisy Data

Posted on November 5, 2024 by jarxiv

Abstract: Class imbalance and label noise are pervasive in large-scale datasets, yet machine learning …

Categories: cs.CV, cs.LG

Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation

Posted on November 5, 2024 by jarxiv

Abstract: 3D generative models have greatly improved artists' workflows, but for 3D generation …

Categories: cs.AI, cs.CV

Grouped Discrete Representation for Object-Centric Learning

Posted on November 5, 2024 by jarxiv

Abstract: Object-centric learning (OCL) …

Categories: cs.CV, cs.LG

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

Posted on November 5, 2024 by jarxiv

Abstract: Across a variety of multimodal applications, CLIP embeddings …

Categories: cs.CV, cs.LG

Taxonomy-Aware Continual Semantic Segmentation in Hyperbolic Spaces for Open-World Perception

Posted on November 5, 2024 by jarxiv

Abstract: Because semantic segmentation models are typically trained on a fixed set of classes, …

Categories: cs.AI, cs.CV, cs.RO

GenXD: Generating Any 3D and 4D Scenes

Posted on November 5, 2024 by jarxiv

Abstract: Recent progress in 2D visual generation has been remarkable. However, for 3D and 4D generation, large-scale …

Categories: cs.AI, cs.CV

PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

Posted on November 5, 2024 by jarxiv

Abstract: Over the past year, video-based large language models have advanced considerably. However, short …

Categories: cs.CV
