Category archive: "cs.AI"

Neptune: The Long Orbit to Benchmarking Long Video Understanding

Posted: December 13, 2024 | Author: jarxiv

Summary: In this paper, sets of challenging questions, answers, and decoys for understanding long videos …

Categories: cs.AI, cs.CV, cs.LG

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Posted: December 13, 2024 | Author: jarxiv

Summary: Creating AI systems that, like human cognition, can interact with their environment over long periods …

Categories: cs.AI, cs.CL, cs.CV

Owl-1: Omni World Model for Consistent Long Video Generation

Posted: December 13, 2024 | Author: jarxiv

Summary: Video generation models (VGMs) have recently attracted considerable attention, and general-purpose large vision …

Categories: cs.AI, cs.CV, cs.LG

TimeRefine: Temporal Grounding with Time Refining Video LLM

Posted: December 13, 2024 | Author: jarxiv

Summary: Video temporal grounding, given a text prompt, within a video …

Categories: cs.AI, cs.CL, cs.CV

Hidden Biases of End-to-End Driving Datasets

Posted: December 13, 2024 | Author: jarxiv

Summary: End-to-end driving systems have made rapid progress, but so far …

Categories: cs.AI, cs.CV, cs.LG, cs.RO

Olympus: A Universal Task Router for Computer Vision Tasks

Posted: December 13, 2024 | Author: jarxiv

Summary: Multimodal large language models (MLLMs), for a variety of computer …

Categories: cs.AI, cs.CL, cs.CV

Doe-1: Closed-Loop Autonomous Driving with Large World Model

Posted: December 13, 2024 | Author: jarxiv

Summary: End-to-end autonomous driving, because of its potential to learn from large amounts of data, …

Categories: cs.AI, cs.CV, cs.LG

Exact Algorithms for Multiagent Path Finding with Communication Constraints on Tree-Like Structures

Posted: December 13, 2024 | Author: jarxiv

Summary: Multiple agents move through a network in an optimal way, with each agent …

Categories: cs.AI, cs.CC

EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations

Posted: December 13, 2024 | Author: jarxiv

Summary: With advances in text-to-speech (TTS) technology, the quality of generated speech …

Categories: cs.AI, cs.SD, eess.AS

Annotation-guided Protein Design with Multi-Level Domain Alignment

Posted: December 13, 2024 | Author: jarxiv

Summary: A central challenge in de novo protein design is, according to specific conditions, specific …

Categories: cs.AI, cs.LG, q-bio.QM
