Monthly Archives: July 2024

TARGO: Benchmarking Target-driven Object Grasping under Occlusions

Posted on: July 9, 2024 | Author: jarxiv

Abstract: Thanks to recent advances in predicting 6D grasp poses from a single depth image, robots …

Categories: cs.CV, cs.RO

Potential Based Diffusion Motion Planning

Posted on: July 9, 2024 | Author: jarxiv

Abstract: Effective motion planning in high-dimensional spaces is a long-standing unsolved …

Categories: cs.CV, cs.LG, cs.RO

The Tug-of-War Between Deepfake Generation and Detection

Posted on: July 9, 2024 | Author: jarxiv

Abstract: Multimodal generative models are evolving rapidly, and realistic video and audio …

Categories: cs.CV

Vision-Language Models under Cultural and Inclusive Considerations

Posted on: July 9, 2024 | Author: jarxiv

Abstract: Large vision-language models (VLMs), given images from the daily lives of people with visual impairments, …

Categories: cs.AI, cs.CL, cs.CV, cs.CY

Transfer Learning with Self-Supervised Vision Transformers for Snake Identification

Posted on: July 9, 2024 | Author: jarxiv

Abstract: The SnakeCLEF 2024 competition, which involves predicting snake species from images, …

Categories: cs.CV, cs.IR, cs.LG

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

Posted on: July 9, 2024 | Author: jarxiv

Abstract: With personalized text-to-image generation models, users …

Categories: cs.CV, cs.GR

CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Posted on: July 9, 2024 | Author: jarxiv

Abstract: Crowd motion generation, not only in entertainment industries such as animation and games …

Categories: cs.CV

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Posted on: July 9, 2024 | Author: jarxiv

Abstract: Large Vision Language Model (LVLM) …

Categories: cs.AI, cs.CV

4D Contrastive Superflows are Dense 3D Representation Learners

Posted on: July 9, 2024 | Author: jarxiv

Abstract: In autonomous driving, accurate 3D perception is foundational. However, such …

Categories: cs.CV, cs.LG, cs.RO

Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

Posted on: July 9, 2024 | Author: jarxiv

Abstract: Thanks to recent advances in 3D AIGC, 3D objects from text and images …

Categories: cs.CV
