Monthly Archives: August 2024

Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model

Posted on: August 23, 2024 | Author: jarxiv

Abstract: Visual question answering (VQA) is a task in which an image is given and, about that image, a series of …

Categories: cs.CL, cs.CV

SiNGR: Brain Tumor Segmentation via Signed Normalized Geodesic Transform Regression

Posted on: August 23, 2024 | Author: jarxiv

Abstract: One of the main challenges in brain tumor segmentation concerns, near the tumor boundary, …

Categories: cs.CV

Comparing YOLOv5 Variants for Vehicle Detection: A Performance Analysis

Posted on: August 23, 2024 | Author: jarxiv

Abstract: Vehicle detection is an important task in the management of traffic and autonomous vehicles. This …

Categories: cs.CV

Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers

Posted on: August 23, 2024 | Author: jarxiv

Abstract: To solve more complex problems, deep neural networks … billions …

Categories: cs.AI, cs.CV, cs.LG

Sapiens: Foundation for Human Vision Models

Posted on: August 23, 2024 | Author: jarxiv

Abstract: We … four fundamental human-centric vision tasks (2D pose estimation, body part …

Categories: cs.CV

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

Posted on: August 23, 2024 | Author: jarxiv

Abstract: To understand people's social interactions in complex real-world scenarios, many …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers

Posted on: August 23, 2024 | Author: jarxiv

Abstract: Current parking area perception algorithms mainly detect vacant slots within a limited range …

Categories: cs.AI, cs.CV

Real-Time Video Generation with Pyramid Attention Broadcast

Posted on: August 23, 2024 | Author: jarxiv

Abstract: We … for DiT-based video generation, a real-time, high-quality, …

Categories: cs.CV, cs.DC

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Posted on: August 23, 2024 | Author: jarxiv

Abstract: Text-to-video (T2V) …, capable of generating realistic scenes from text descriptions, …

Categories: cs.AI, cs.CV

Automating Deformable Gasket Assembly

Posted on: August 23, 2024 | Author: jarxiv

Abstract: In gasket assembly, a deformable gasket is aligned with a narrow channel and …

Categories: cs.CV, cs.RO
