Monthly Archives: May 2025

Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning

Posted on May 7, 2025 by jarxiv

Abstract: Vision-language models (VLMs) allow text and images to be embedded in a shared representation space …

Categories: cs.CV, cs.LG

Adversarial Robustness of Deep Learning Models for Inland Water Body Segmentation from SAR Images

Posted on May 7, 2025 by jarxiv

Abstract: Segmentation of inland water bodies from synthetic aperture radar (SAR) images, flood …

Categories: cs.AI, cs.CV, cs.LG, eess.IV

DISARM++: Beyond scanner-free harmonization

Posted on May 7, 2025 by jarxiv

Abstract: Harmonization of T1-weighted MR images across different scanners, the consistency of neuroimaging studies …

Categories: cs.CV

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Posted on May 7, 2025 by jarxiv

Abstract: By navigating graphical user interfaces (GUIs), documents …

Categories: cs.AI, cs.CL, cs.CV

Visual Imitation Enables Contextual Humanoid Control

Posted on May 7, 2025 by jarxiv

Abstract: Having humanoids climb stairs and sit on chairs using the context of their surrounding environment …

Categories: cs.CV, cs.RO

FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios

Posted on May 7, 2025 by jarxiv

Abstract: Action customization involves the subject and actions determined by input control signals …

Categories: cs.AI, cs.CV, cs.MM

Multi-Agent System for Comprehensive Soccer Understanding

Posted on May 7, 2025 by jarxiv

Abstract: Recent advances in AI-driven soccer understanding have shown rapid progress, but existing …

Categories: cs.CV

A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction

Posted on May 7, 2025 by jarxiv

Abstract: In this paper, nonlinear acoustic computing and reinforcement learning are integrated in a new …

Categories: 68T01, cs.AI, cs.RO, I.2.8, physics.app-ph

MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation

Posted on May 7, 2025 by jarxiv

Abstract: Diffusion models have shown excellent performance in text-to-image generation …

Categories: cs.CV

A Note on Statistically Accurate Tabular Data Generation Using Large Language Models

Posted on May 7, 2025 by jarxiv

Abstract: Large language models (LLMs) have shown promise in synthetic tabular data generation, but …

Categories: cs.AI, cs.LG
