Monthly Archive: June 2024

Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations

Posted on June 21, 2024 by jarxiv

Abstract: We analyze the grouping information contained within neural network activations, …

Categories: cs.CV

Video Generation with Learned Action Prior

Posted on June 21, 2024 by jarxiv

Abstract: Stochastic video generation, when the camera is mounted on a moving platform, …

Categories: cs.CV, cs.RO

MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction

Posted on June 21, 2024 by jarxiv

Abstract: Graph deep learning (GDL), both imaging and non-imaging data …

Categories: cs.CV

Capturing Temporal Components for Time Series Classification

Posted on June 21, 2024 by jarxiv

Abstract: Because data is abundant, especially data collected from the Internet of Things paradigm, …

Categories: cs.CV, cs.LG

RankCLIP: Ranking-Consistent Language-Image Pretraining

Posted on June 21, 2024 by jarxiv

Abstract: Self-supervised contrastive learning models such as CLIP, in many downstream tasks, vision …

Categories: cs.AI, cs.CV, cs.LG

Self-supervised Multi-actor Social Activity Understanding in Streaming Videos

Posted on June 21, 2024 by jarxiv

Abstract: In this work, in real-world tasks such as surveillance and assistive robotics, an important …

Categories: cs.CV

SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset

Posted on June 21, 2024 by jarxiv

Abstract: To mitigate the risk of harmful outputs from large vision models (LVMs) …

Categories: cs.AI, cs.CV, cs.DB

On Layer-wise Representation Similarity: Application for Multi-Exit Models with a Single Classifier

Posted on June 21, 2024 by jarxiv

Abstract: Analyzing the similarity of internal representations within and across different models, …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

Posted on June 21, 2024 by jarxiv

Abstract: Small object detection (SOD) has been a long-standing challenge for decades …

Categories: cs.CV

Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?

Posted on June 21, 2024 by jarxiv

Abstract: Large vision-language models (LVLMs), recently, image captioning …

Categories: cs.CL, cs.CV
