Category archive: cs.CV

Synthesizing Environment-Specific People in Photographs

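Posted: September 27, 2024 | Author: jarxiv

Summary: We … photoreali … of people dressed in clothing semantically appropriate to the scene depicted in the input photograph … Continue reading →

Categories: cs.CV | Comments are closed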

Transferring disentangled representations: bridging the gap between synthetic and real images

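Posted: September 27, 2024 | Author: jarxiv

Summary: Developing meaningful and efficient representations that disentangle the underlying structure of the data generation mechanism … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed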

ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning

Posted: September 27, 2024 | Author: jarxiv

Summary: Vision-centric semantic occupancy prediction plays a crucial role in autonomous driving, … Continue reading →

Categories: cs.CV, cs.RO | Comments are closed

Exploring Event-based Human Pose Estimation with 3D Event Representations

Posted: September 27, 2024 | Author: jarxiv

Summary: Human pose estimation is a fundamental and appealing task in computer vision … Continue reading →

Categories: cs.CV, cs.MM, cs.RO, eess.IV | Comments are closed

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Posted: September 27, 2024 | Author: jarxiv

Summary: GPT-4o, an omni-modal … that enables spoken conversations with diverse emotions and tones … Continue reading →

Categories: cs.CL, cs.CV | Comments are closed

IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning

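Posted: September 27, 2024 | Author: jarxiv

Summary: With recent advances in image captioning, overcoming the limitations of paired image-text data … Continue reading →

Categories: cs.AI, cs.CL, cs.CV, cs.LG | Comments are closed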

Revisit Anything: Visual Place Recognition via Image Segment Retrieval

Posted: September 27, 2024 | Author: jarxiv

Summary: Accurately recognizing a revisited place is … for embodied agents to localize … Continue reading →

Categories: cs.AI, cs.CV, cs.IR, cs.LG, cs.RO | Comments are closed

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers

Posted: September 27, 2024 | Author: jarxiv

Summary: Recent advances in 3D Large Language Models (LLMs) … Continue reading →

Categories: cs.CV | Comments are closed

Visual Data Diagnosis and Debiasing with Concept Graphs

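Posted: September 27, 2024 | Author: jarxiv

Summary: The widespread success of today's deep learning models … extensive data that vary greatly in size and complexity … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed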

LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field

Posted: September 27, 2024 | Author: jarxiv

Summary: In recent work, neural radiance fields (NeRF) on parametric models … Continue reading →

Categories: cs.CV | Comments are closed
