Category archive: cs.CV

VidTwin: Video VAE with Decoupled Structure and Dynamics

Posted on December 24, 2024 by jarxiv

Abstract: With recent advances in video autoencoders (video AEs), video generation …

Categories: cs.AI, cs.CV, cs.LG

Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking

Posted on December 24, 2024 by jarxiv

Abstract: Humanoid robots that interact with 3D scenes by mimicking human data …

Categories: cs.CV, cs.RO

DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions

Posted on December 24, 2024 by jarxiv

Abstract: Generating natural hand-object interactions in 3D …

Categories: cs.AI, cs.CV, cs.GR, cs.LG

Reasoning to Attend: Try to Understand How Token Works

Posted on December 24, 2024 by jarxiv

Abstract: Current visual grounding empowered by large multimodal models (LMMs) …

Categories: cs.CV

Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy

Posted on December 24, 2024 by jarxiv

Abstract: Multimodal learning, a rapidly evolving field of artificial intelligence, … text, images …

Categories: cs.AI, cs.CV, cs.LG

ActiveGS: Active Scene Reconstruction using Gaussian Splatting

Posted on December 24, 2024 by jarxiv

Abstract: To enable downstream tasks, robotics applications often …

Categories: cs.CV, cs.RO

Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective

Posted on December 24, 2024 by jarxiv

Abstract: Recent Large Vision-Language Models (LVLM …

Categories: cs.CL, cs.CV

Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection

Posted on December 24, 2024 by jarxiv

Abstract: Enabling models to recognize vast open-world categories …

Categories: cs.CV

GauSim: Registering Elastic Objects into Digital World by Gaussian Simulator

Posted on December 24, 2024 by jarxiv

Abstract: In this work, real-world elastic objects represented through Gaussian kernels …

Categories: cs.CV, cs.GR

Large Motion Video Autoencoding with Cross-modal Video VAE

Posted on December 24, 2024 by jarxiv

Abstract: To reduce video redundancy and facilitate efficient video generation, a robust video variational …

Categories: cs.CV

