Monthly Archives: September 2024

Open-vocabulary Temporal Action Localization using VLMs

Posted on: September 2, 2024 | Author: jarxiv

Summary: In video action localization, from long videos, a specific action's … Continue reading →

Categories: cs.AI, cs.CV, cs.RO

CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion

Posted on: September 2, 2024 | Author: jarxiv

Summary: With advances in video generation AI models (such as SORA), creators … Continue reading →

Categories: cs.CV, cs.HC

DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model

Posted on: September 2, 2024 | Author: jarxiv

Summary: In robot-assisted surgery (RAS), for 3D reconstruction and visualization, accurate depth … Continue reading →

Categories: cs.CV

Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

Posted on: September 2, 2024 | Author: jarxiv

Summary: We … capable of generating semantic-compositional 3D scenes in a single pass … Continue reading →

Categories: cs.AI, cs.CV, cs.GR

Bridging Episodes and Semantics: A Novel Framework for Long-Form Video Understanding

Posted on: September 2, 2024 | Author: jarxiv

Summary: Existing research often treats long-form videos as extended short videos … Continue reading →

Categories: cs.AI, cs.CL, cs.CV

Addressing the challenges of loop detection in agricultural environments

Posted on: September 2, 2024 | Author: jarxiv

Summary: Visual SLAM systems are well studied, and in indoor and urban environments … Continue reading →

Categories: cs.CV, cs.RO

OpticalRS-4M: Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset

Posted on: September 2, 2024 | Author: jarxiv

Summary: Masked image modeling (MIM), remote sensing (RS … Continue reading →

Categories: cs.CV

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Posted on: September 2, 2024 | Author: jarxiv

Summary: Language models have recently made significant progress. GPT-4o, a new … Continue reading →

Categories: cs.AI, cs.CL, cs.HC, cs.LG, cs.SD, eess.AS

Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks

Posted on: September 2, 2024 | Author: jarxiv

Summary: Detecting test-time distribution shift is, for safely deploying machine learning models, an important … Continue reading →

Categories: cs.AI, cs.CV
