Monthly Archive: June 2024

SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues

Posted on June 28, 2024 by jarxiv

Abstract: In weakly supervised medical image segmentation, segmentation perfor …

Categories: cs.CV

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

Posted on June 28, 2024 by jarxiv

Abstract: Transformer-based segmentation methods, when handling high-resolution images, …

Categories: cs.CV

Taming Data and Transformers for Audio Generation

Posted on June 28, 2024 by jarxiv

Abstract: The generation of ambient sounds and effects, with data scarcity and often insufficient caption quality …

Categories: cs.CL, cs.CV, cs.MM, cs.SD, eess.AS

OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

Posted on June 28, 2024 by jarxiv

Abstract: Current universal segmentation methods, pixel-level image and vide …

Categories: cs.CV

SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas

Posted on June 28, 2024 by jarxiv

Abstract: With SALVe, a novel pairwise learned alignment verifier, we …

Categories: cs.CV

Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads

Posted on June 28, 2024 by jarxiv

Abstract: Visual perception tasks, predominantly Vision Transformer (ViT) …

Categories: cs.CV

ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

Posted on June 28, 2024 by jarxiv

Abstract: Rigorously testing the ability of AI models to perform temporal reasoning within video events …

Categories: cs.CV

Looking 3D: Anomaly Detection with 2D-3D Alignment

Posted on June 28, 2024 by jarxiv

Abstract: Automatic anomaly detection based on visual cues, in manufacturing, product quality assessment, and various other …

Categories: cs.CV

HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

Posted on June 28, 2024 by jarxiv

Abstract: Most WSOD methods, in order to generate candidate regions, traditional object propo …

Categories: cs.CV

Dataset Size Recovery from LoRA Weights

Posted on June 28, 2024 by jarxiv

Abstract: Model inversion attacks and membership inference attacks, the data the model was trained on …

Categories: cs.CV
