Monthly Archives: June 2024

Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video

Posted on: June 19, 2024 | Author: jarxiv

Summary: Infers the hidden properties of a fluid from a single 3D video and, in new scenes, the observed …

Categories: cs.AI, cs.CV

AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation

Posted on: June 19, 2024 | Author: jarxiv

Summary: Text-to-image generation yields high-quality results, but the generated …

Categories: cs.CV

Graph Neural Networks in Histopathology: Emerging Trends and Future Directions

Posted on: June 19, 2024 | Author: jarxiv

Summary: In the histopathological analysis of whole slide images (WSIs), deep learning methods, particularly …

Categories: cs.AI, cs.CV, cs.LG, I.2.10, q-bio.TO

Adversarial Attacks on Multimodal Agents

Posted on: June 19, 2024 | Author: jarxiv

Summary: Currently, vision-enabled language models (VLMs) perform actions in real environments …

Categories: cs.CL, cs.CR, cs.CV, cs.LG

Neural Approximate Mirror Maps for Constrained Diffusion Models

Posted on: June 19, 2024 | Author: jarxiv

Summary: Diffusion models excel at creating visually convincing images, but …

Categories: cs.CV, cs.LG, eess.IV

Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

Posted on: June 19, 2024 | Author: jarxiv

Summary: Serializing 3D voxels before feeding them into Transformers and …

Categories: cs.CV, cs.RO

VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

Posted on: June 19, 2024 | Author: jarxiv

Summary: Video editing, from entertainment and education to professional …

Categories: cs.AI, cs.CV, cs.MM

GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

Posted on: June 19, 2024 | Author: jarxiv

Summary: Referring Video Object Segmentation (RVOS), over the entire video, …

Categories: cs.CV

LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

Posted on: June 19, 2024 | Author: jarxiv

Summary: Recent studies have shown that reducing the number of layers in convolutional neural networks …

Categories: cs.CV, cs.LG

DrVideo: Document Retrieval Based Long Video Understanding

Posted on: June 19, 2024 | Author: jarxiv

Summary: Existing methods for understanding long videos mainly focus on videos that last only tens of seconds …

Categories: cs.CV
