Category Archives: cs.AI

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Posted: March 25, 2024 | Author: jarxiv

Abstract: Large multimodal models (LMMs), with a visual encoder and a large …

Categories: cs.AI, cs.CL, cs.CV

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

Posted: March 25, 2024 | Author: jarxiv

Abstract: For localized semantic editing, a training-free video editing …

Categories: cs.AI, cs.CV, cs.LG

Unimodal Multi-Task Fusion for Emotional Mimicry Prediction

Posted: March 25, 2024 | Author: jarxiv

Abstract: In this study, the 6th Workshop and … on Affective Behavior Analysis in-the-wild

Categories: cs.AI, cs.SD, eess.AS

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

Posted: March 25, 2024 | Author: jarxiv

Abstract: In video-to-video editing, a source video and additional control (text …

Categories: cs.AI, cs.CV, cs.MM

Knowledge-Enhanced Recommendation with User-Centric Subgraph Network

Posted: March 24, 2024 | Author: jarxiv

Abstract: Recommendation systems are now widely implemented across various platforms …

Categories: cs.AI, cs.IR, cs.LG

CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds

Posted: March 24, 2024 | Author: jarxiv

Abstract: In this paper, a labeled collection of infant cry sounds, the Ubenwa …

Categories: cs.AI, cs.CL, cs.SD, eess.AS

TD-MPC2: Scalable, Robust World Models for Continuous Control

Posted: March 22, 2024 | Author: jarxiv

Abstract: TD-MPC, with a learned implicit (decoder-free) world model …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections

Posted: March 22, 2024 | Author: jarxiv

Abstract: Today's robot policies, when faced with the challenge of generalizing to new environments, …

Categories: cs.AI, cs.LG, cs.RO

Leveraging Large Language Model-based Room-Object Relationships Knowledge for Enhancing Multimodal-Input Object Goal Navigation

Posted: March 22, 2024 | Author: jarxiv

Abstract: Object-goal navigation, in the embodied navigation community …

Categories: cs.AI, cs.CV, cs.RO

SLIM: Skill Learning with Multiple Critics

Posted: March 22, 2024 | Author: jarxiv

Abstract: Self-supervised skill learning, acquiring useful behaviors that leverage the underlying dynamics of the environment …

Categories: cs.AI, cs.LG, cs.RO
