Monthly Archives: September 2024

InterNet: Unsupervised Cross-modal Homography Estimation Based on Interleaved Modality Transfer and Self-supervised Homography Prediction

Posted on: September 27, 2024 | Author: jarxiv

Abstract: Based on interleaved modality transfer and self-supervised homography prediction, we …

Categories: cs.CV

PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging

Posted on: September 27, 2024 | Author: jarxiv

Abstract: Compared with conventional lens-based systems, lensless cameras, in terms of size, weight, …

Categories: cs.CV, cs.LG, eess.IV

Disentangled Clothed Avatar Generation from Text Descriptions

Posted on: September 27, 2024 | Author: jarxiv

Abstract: In this paper, the human body and clothing are generated separately, and high-quality animation on the generated avatars …

Categories: cs.CV

Valeo4Cast: A Modular Approach to End-to-End Forecasting

Posted on: September 27, 2024 | Author: jarxiv

Abstract: Motion forecasting predicts the future trajectories of surrounding agents such as pedestrians, vehicles, and traffic lights …

Categories: cs.CV, cs.RO

Synthesizing Environment-Specific People in Photographs

Posted on: September 27, 2024 | Author: jarxiv

Abstract: We … people wearing clothing that is semantically appropriate for the scene depicted in the input photograph, photorealistic …

Categories: cs.CV

Transferring disentangled representations: bridging the gap between synthetic and real images

Posted on: September 27, 2024 | Author: jarxiv

Abstract: Developing meaningful and efficient representations that separate the fundamental structure of the data generation mechanism …

Categories: cs.AI, cs.CV

ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning

Posted on: September 27, 2024 | Author: jarxiv

Abstract: Vision-centric semantic occupancy prediction plays a crucial role in autonomous driving, and …

Categories: cs.CV, cs.RO

Exploring Event-based Human Pose Estimation with 3D Event Representations

Posted on: September 27, 2024 | Author: jarxiv

Abstract: Human pose estimation is a fundamental and appealing task in computer vision …

Categories: cs.CV, cs.MM, cs.RO, eess.IV

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Posted on: September 27, 2024 | Author: jarxiv

Abstract: GPT-4o, an omni-modal … that enables spoken conversations with diverse emotions and tones …

Categories: cs.CL, cs.CV

IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning

Posted on: September 27, 2024 | Author: jarxiv

Abstract: Recent advances in image captioning, overcoming the limitations of paired image-text data …

Categories: cs.AI, cs.CL, cs.CV, cs.LG
