Monthly Archives: July 2024

STARS: Self-supervised Tuning for 3D Action Recognition in Skeleton Sequences

Posted on: July 16, 2024 | Author: jarxiv

Abstract: Self-supervised pretraining methods that use masked prediction, in skeleton-ba …

Categories: cs.CV

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

Posted on: July 16, 2024 | Author: jarxiv

Abstract: Human-centric video generation has advanced significantly, but the problem of joint video-depth generation still …

Categories: cs.CV

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Posted on: July 16, 2024 | Author: jarxiv

Abstract: Owing to recent advances in vision-language models, a wide range of tasks, through visual instruction tuning, …

Categories: cs.AI, cs.CL, cs.CV, cs.LG

GRUtopia: Dream General Robots in a City at Scale

Posted on: July 16, 2024 | Author: jarxiv

Abstract: Recent work has investigated scaling laws in the field of embodied AI …

Categories: cs.CV, cs.RO

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

Posted on: July 16, 2024 | Author: jarxiv

Abstract: The audio-visual segmentation (AVS) task, with audio …

Categories: cs.CV

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

Posted on: July 16, 2024 | Author: jarxiv

Abstract: Traditional referring segmentation tasks have focused mainly on silent visual scenes …

Categories: cs.AI, cs.CV

InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models

Posted on: July 16, 2024 | Author: jarxiv

Abstract: Using off-the-shelf text-to-image latent diffusion models, objects in videos …

Categories: cs.CV

Multi-Attention Integrated Deep Learning Frameworks for Enhanced Breast Cancer Segmentation and Identification

Posted on: July 16, 2024 | Author: jarxiv

Abstract: Breast cancer poses a serious threat to life worldwide, claiming many lives every year …

Categories: cs.CV, cs.LG, eess.IV, F.2.2

No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations

Posted on: July 16, 2024 | Author: jarxiv

Abstract: This paper leverages self-supervised gradients for the features of vision encoders …

Categories: cs.CV, cs.LG

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

Posted on: July 16, 2024 | Author: jarxiv

Abstract: In the realm of vision models, the primary mode of representation uses pixels to rasterize the visual world …

Categories: cs.AI, cs.CV, cs.LG
