Monthly Archives: June 2024

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Posted on June 14, 2024 by jarxiv

Abstract: 4M, UnifiedIO, and other current multimodal and multitask …

Categories: cs.AI, cs.CV, cs.LG

Data Attribution for Text-to-Image Models by Unlearning Synthesized Images

Posted on June 14, 2024 by jarxiv

Abstract: The goal of data attribution for text-to-image models is, for the generation of a new image, the most influential …

Categories: cs.CV, cs.LG

Towards Evaluating the Robustness of Visual State Space Models

Posted on June 14, 2024 by jarxiv

Abstract: Vision State Space Models (VSSMs), recurrent neu …

Categories: cs.CV

CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras

Posted on June 14, 2024 by jarxiv

Abstract: In point-spread-function (PSF) engineering, phase masks and other optical elements …

Categories: cs.CV, eess.IV

Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

Posted on June 14, 2024 by jarxiv

Abstract: Scene graph generation (SGG) in satellite imagery (SAI), from perception …

Categories: cs.AI, cs.CV

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Posted on June 14, 2024 by jarxiv

Abstract: Focusing on the robust multi-image understanding capabilities of multimodal LLMs, a comprehensive be …

Categories: cs.AI, cs.CL, cs.CV

Explore the Limits of Omni-modal Pretraining at Scale

Posted on June 14, 2024 by jarxiv

Abstract: We … an omni-modal … that can understand any modality and learn universal representations …

Categories: cs.AI, cs.CV, cs.LG, cs.MM

Depth Anything V2

Posted on June 14, 2024 by jarxiv

Abstract: This work presents Depth Anything V2. We … fancy …

Categories: cs.CV

Interpreting the Weight Space of Customized Diffusion Models

Posted on June 14, 2024 by jarxiv

Abstract: Investigating the space of weights spanned by a large collection of customized diffusion models …

Categories: cs.CV, cs.GR, cs.LG

Rethinking Score Distillation as a Bridge Between Image Distributions

Posted on June 14, 2024 by jarxiv

Abstract: Score distillation sampling (SDS) has proven to be an important tool …

Categories: cs.CV, cs.GR, cs.LG
