Category Archives: cs.LG

LaVIDE: A Language-Vision Discriminator for Detecting Changes in Satellite Image with Map References

Posted on December 2, 2024 by jarxiv

Summary: When only a single image is available, change detection, which typically relies on comparing bi-temporal images, …

Categories: cs.AI, cs.CV, cs.LG

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Posted on December 2, 2024 by jarxiv

Summary: Despite remarkable progress in video understanding, most efforts …

Categories: cs.CL, cs.CV, cs.LG, cs.MM

PerLA: Perceptive 3D Language Assistant

Posted on December 2, 2024 by jarxiv

Summary: Enabling large language models (LLMs) to understand the 3D physical world …

Categories: cs.CL, cs.CV, cs.LG

MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks

Posted on December 2, 2024 by jarxiv

Summary: Recently, human motion analysis … exciting … such as denoising diffusion models and large language models …

Categories: cs.CL, cs.CV, cs.LG

A Survey on Multimodal Large Language Models

Posted on December 2, 2024 by jarxiv

Summary: Recently, the Multimodal Large Language Model (MLLM), represented by GPT-4V, …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Feedback-driven object detection and iterative model improvement

Posted on December 2, 2024 by jarxiv

Summary: Automated object detection is becoming increasingly valuable across a wide range of applications …

Categories: cs.CV, cs.LG

A Visual-inertial Localization Algorithm using Opportunistic Visual Beacons and Dead-Reckoning for GNSS-Denied Large-scale Applications

Posted on December 2, 2024 by jarxiv

Summary: With the development of smart cities, continuous pedestrian navigation in large-scale urban environments …

Categories: cs.CV, cs.LG, eess.SP

Towards Class-wise Robustness Analysis

Posted on December 2, 2024 by jarxiv

Summary: While highly successful in solving many downstream tasks, deep neural networks …

Categories: cs.CV, cs.LG

SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection

Posted on December 2, 2024 by jarxiv

Summary: In this work, … multi-view image semantics with radar and camera point …

Categories: cs.CV, cs.LG

FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation

Posted on December 2, 2024 by jarxiv

Summary: Anomaly segmentation is … safety-critical … that must recognize unexpected events …

Categories: cs.CV, cs.LG
