"cs.AI" Category Archive

Improving Autoregressive Training with Dynamic Oracles

Posted on June 14, 2024 by jarxiv

Abstract: Many tasks within NLP, ranging from sequence tagging to text generation, … Continue reading →

Categories: cs.AI, cs.CL, cs.LG | Comments are closed

Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers

Posted on June 14, 2024 by jarxiv

Abstract: In this paper, Transformers as Bayesian-net-executed dense expectation … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed

RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

Posted on June 14, 2024 by jarxiv

Abstract: 3D occupancy-based perception pipelines capture detailed scene descriptions, … Continue reading →

Categories: cs.AI, cs.CV, cs.LG, cs.RO | Comments are closed

PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

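Posted on June 14, 2024 by jarxiv

Abstract: In recent years, artificial intelligence techniques for education have drawn increasing attention, but effective musical instrument instruction … Continue reading →

Categories: cs.AI, cs.CV, cs.MM, cs.SD, eess.AS | Comments are closed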

Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

Posted on June 14, 2024 by jarxiv

Abstract: For vision and language models (VLMs) such as CLIP, remarkable zero-shot … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

Posted on June 14, 2024 by jarxiv

Abstract: Modern vision models are trained on very large, noisy datasets … Continue reading →

Categories: cs.AI, cs.CV | Comments are closed

Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

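Posted on June 14, 2024 by jarxiv

Abstract: This paper achieves 4D awareness and spatiotemporal consistency for 2D diffusion models, and high-quality … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed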

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

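Posted on June 14, 2024 by jarxiv

Abstract: With the emergence of LLMs and their integration with other data modalities, connectivity with the physical world … Continue reading →

Categories: cs.AI, cs.CV, cs.RO | Comments are closed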

ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing

Posted on June 14, 2024 by jarxiv

Abstract: This paper enhances 2D diffusion models with 3D awareness and 3D consistency … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Posted on June 14, 2024 by jarxiv

Abstract: Like 4M and UnifiedIO, current multimodal and multitask … Continue reading →

Categories: cs.AI, cs.CV, cs.LG | Comments are closed
