jarxiv | Japanese arxiv | Page 1187

Federated Causal Inference: Multi-Study ATE Estimation beyond Meta-Analysis

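Posted: March 26, 2025 | Author: jarxiv

Summary

We study federated causal inference, an approach to estimating treatment effects from decentralized data across centers.
We compare three classes of average treatment effect (ATE) estimators derived from the plug-in G-formula, ranging from simple meta-analysis to one-shot and multi-shot federated learning.
Focusing on randomized controlled trials (RCTs), we derive the asymptotic variance of these estimators for linear models.
Our results provide practical guidance on selecting the appropriate estimator for various scenarios, including heterogeneity in sample sizes, covariate distributions, treatment assignment schemes, and center effects.
We validate these findings with a simulation study.

Summary (original)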

We study Federated Causal Inference, an approach to estimate treatment effects from decentralized data across centers. We compare three classes of Average Treatment Effect (ATE) estimators derived from the Plug-in G-Formula, ranging from simple meta-analysis to one-shot and multi-shot federated learning, the latter leveraging the full data to learn the outcome model (albeit requiring more communication). Focusing on Randomized Controlled Trials (RCTs), we derive the asymptotic variance of these estimators for linear models. Our results provide practical guidance on selecting the appropriate estimator for various scenarios, including heterogeneity in sample sizes, covariate distributions, treatment assignment schemes, and center effects. We validate these findings with a simulation study.
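
The contrast between meta-analysis and a full-data fit can be illustrated with a minimal sketch. It assumes a single covariate, a linear outcome model, a simulated RCT with treatment probability 0.5, and no center effects; the center sizes, covariate means, and true effect `tau=2.0` are hypothetical values chosen for illustration, and the pooled fit only stands in for the target of a multi-shot federated procedure (no communication protocol is implemented).

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def gformula_ate(xs, ws, ys):
    """Plug-in G-formula: one outcome model per arm, then average the predicted difference."""
    treated = [(x, y) for x, w, y in zip(xs, ws, ys) if w == 1]
    control = [(x, y) for x, w, y in zip(xs, ws, ys) if w == 0]
    a1, b1 = fit_line(*zip(*treated))
    a0, b0 = fit_line(*zip(*control))
    return sum((a1 + b1 * x) - (a0 + b0 * x) for x in xs) / len(xs)

def simulate_center(rng, n, x_mean, tau=2.0):
    """Hypothetical RCT data for one center: Y = 1 + 0.5*X + tau*W + noise."""
    xs = [rng.gauss(x_mean, 1.0) for _ in range(n)]
    ws = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    ys = [1.0 + 0.5 * x + tau * w + rng.gauss(0.0, 1.0) for x, w in zip(xs, ws)]
    return xs, ws, ys

rng = random.Random(0)
centers = [simulate_center(rng, n, m) for n, m in [(400, -1.0), (800, 0.0), (1200, 1.0)]]

# Meta-analysis: each center estimates its own ATE; only the estimates are combined,
# here weighted by sample size.
local_ates = [gformula_ate(*c) for c in centers]
sizes = [len(c[0]) for c in centers]
meta_ate = sum(n * a for n, a in zip(sizes, local_ates)) / sum(sizes)

# Full-data reference: the outcome models are fit on all centers' data at once.
pooled = [sum((c[i] for c in centers), []) for i in range(3)]
pooled_ate = gformula_ate(*pooled)

print(local_ates, meta_ate, pooled_ate)
```

In this toy setting both aggregates should land near the simulated effect; the abstract's question is how their variances compare once sample sizes, covariate distributions, assignment schemes, or center effects differ.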

arXiv information

Authors	Rémi Khellaf, Aurélien Bellet, Julie Josse
Published	2025-03-25 14:18:33+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

Categories: cs.LG, math.ST, stat.ML, stat.TH

Data-efficient rapid prediction of urban airflow and temperature fields for complex building geometries

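Posted: March 26, 2025 | Author: jarxiv

Summary

Accurately predicting urban microclimate, including wind speed and temperature, from building geometry alone requires capturing the complex interactions between buildings and airflow, particularly long-range wake effects influenced by directional geometry.
Traditional methods that rely on computational fluid dynamics (CFD) are prohibitively expensive for large-scale simulations, while data-driven approaches struggle with limited training data and the need to model both local and far-field dependencies.
In response, we propose a novel framework that leverages a multi-directional distance feature (MDDF) combined with localized training to achieve effective wind field prediction with minimal CFD data.
By reducing the dimensionality of the problem, localized training effectively increases the number of training samples, while MDDF encodes the surrounding geometric information to accurately model wake dynamics and flow redirection.
Trained on only 24 CFD simulations, our localized Fourier neural operator (Local-FNO) model generates full 3D wind velocity and temperature predictions in under one minute, a 500-fold speedup over conventional CFD methods.
With mean absolute errors of 0.3 m/s for wind speed and 0.3 $^{\circ}$C for temperature on unseen urban configurations, the method demonstrates strong generalization and significant potential for practical urban applications.

Summary (original)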

Accurately predicting urban microclimate, including wind speed and temperature, based solely on building geometry requires capturing complex interactions between buildings and airflow, particularly long-range wake effects influenced by directional geometry. Traditional methods relying on computational fluid dynamics (CFD) are prohibitively expensive for large-scale simulations, while data-driven approaches struggle with limited training data and the need to model both local and far-field dependencies. In response, we propose a novel framework that leverages a multi-directional distance feature (MDDF) combined with localized training to achieve effective wind field predictions with minimal CFD data. By reducing the problem’s dimensionality, localized training effectively increases the number of training samples, while MDDF encodes the surrounding geometric information to accurately model wake dynamics and flow redirection. Trained on only 24 CFD simulations, our localized Fourier neural operator (Local-FNO) model generates full 3D wind velocity and temperature predictions in under one minute, yielding a 500-fold speedup over conventional CFD methods. With mean absolute errors of 0.3 m/s for wind speed and 0.3 $^{\circ}$C for temperature on unseen urban configurations, our method demonstrates strong generalization capabilities and significant potential for practical urban applications.
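
The abstract does not define the multi-directional distance feature, so the following is only one plausible reading, sketched under stated assumptions: a 2D occupancy grid (1 = building), four hypothetical axis-aligned directions, and a distance measured in grid steps to the first building hit along each direction. The function name, the direction set, and the `max_steps` cap are all illustrative choices, not the paper's definition.

```python
def directional_distances(grid, row, col, directions, max_steps):
    """Steps from (row, col) to the first occupied cell along each direction.

    Returns max_steps for a direction when the grid edge or the step cap is
    reached before any building is hit.
    """
    out = []
    for dr, dc in directions:
        r, c, steps = row + dr, col + dc, 1
        dist = max_steps
        while 0 <= r < len(grid) and 0 <= c < len(grid[0]) and steps <= max_steps:
            if grid[r][c] == 1:
                dist = steps
                break
            r, c, steps = r + dr, c + dc, steps + 1
        out.append(dist)
    return out

# Hypothetical direction set: north, south, west, east.
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

grid = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
features = directional_distances(grid, 1, 1, DIRECTIONS, max_steps=10)
print(features)  # [10, 2, 10, 3]
```

A per-cell vector like this gives a local model some information about geometry that lies outside its receptive field, which is the role the abstract attributes to MDDF.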

arXiv information

Authors	Shaoxiang Qin, Dongxue Zhan, Ahmed Marey, Dingyang Geng, Theodore Potsis, Liangzhu Leon Wang
Published	2025-03-25 14:36:01+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.LG, physics.flu-dyn

Pfungst and Clever Hans: Identifying the unintended cues in a widely used Alzheimer’s disease MRI dataset using explainable deep learning

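Posted: March 26, 2025 | Author: jarxiv

Summary

Background.
Deep neural networks have demonstrated high accuracy in classifying Alzheimer's disease (AD).
This study aims to shed light on their underlying black-box nature and to reveal the individual contributions of T1-weighted (T1w) gray-white matter texture, volumetric information, and preprocessing to classification performance.
Methods.
We used T1w MRI data from the Alzheimer's Disease Neuroimaging Initiative to distinguish matched AD patients (990 MRIs) from healthy controls (990 MRIs).
Preprocessing included skull stripping and binarization at varying thresholds to systematically eliminate texture information.
A deep neural network was trained on these configurations, and model performance was compared using McNemar tests with discrete Bonferroni-Holm correction.
Layer-wise Relevance Propagation (LRP) and structural similarity metrics between heatmaps were applied to analyze the learned features.
Results.
Classification performance metrics (accuracy, sensitivity, and specificity) were comparable across all configurations, indicating a negligible influence of T1w gray- and white-matter signal texture.
Models trained on binarized images showed similar performance and relevance distributions, with volumetric features such as atrophy and skull-stripping features emerging as the primary contributors.
Conclusions.
We revealed a previously undiscovered Clever Hans effect in a widely used AD MRI dataset.
Deep neural network classification relies predominantly on volumetric features, and eliminating gray-white matter T1w texture did not decrease performance.
This study clearly demonstrates that the importance of gray-white matter contrast is overestimated, at least for widely used structural T1w images, and highlights the potential for misinterpreting performance metrics.

Summary (original)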

Backgrounds. Deep neural networks have demonstrated high accuracy in classifying Alzheimer’s disease (AD). This study aims to enlighten the underlying black-box nature and reveal individual contributions of T1-weighted (T1w) gray-white matter texture, volumetric information and preprocessing on classification performance. Methods. We utilized T1w MRI data from the Alzheimer’s Disease Neuroimaging Initiative to distinguish matched AD patients (990 MRIs) from healthy controls (990 MRIs). Preprocessing included skull stripping and binarization at varying thresholds to systematically eliminate texture information. A deep neural network was trained on these configurations, and the model performance was compared using McNemar tests with discrete Bonferroni-Holm correction. Layer-wise Relevance Propagation (LRP) and structural similarity metrics between heatmaps were applied to analyze learned features. Results. Classification performance metrics (accuracy, sensitivity, and specificity) were comparable across all configurations, indicating a negligible influence of T1w gray- and white signal texture. Models trained on binarized images demonstrated similar feature performance and relevance distributions, with volumetric features such as atrophy and skull-stripping features emerging as primary contributors. Conclusions. We revealed a previously undiscovered Clever Hans effect in a widely used AD MRI dataset. Deep neural networks classification predominantly rely on volumetric features, while eliminating gray-white matter T1w texture did not decrease the performance. This study clearly demonstrates an overestimation of the importance of gray-white matter contrasts, at least for widely used structural T1w images, and highlights potential misinterpretation of performance metrics.
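
The model comparison described above rests on paired tests over the same test images. A minimal sketch of that kind of comparison follows, assuming an exact two-sided McNemar test on the discordant counts and the standard Holm step-down adjustment; the abstract names a discrete Bonferroni-Holm variant, which this sketch does not reproduce, and the p-values in the example are hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases model A classified correctly and model B incorrectly
    c: cases model B classified correctly and model A incorrectly
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def holm_adjust(pvalues):
    """Standard Holm step-down adjustment, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

p_single = mcnemar_exact(1, 9)                 # 22/1024
p_adjusted = holm_adjust([0.01, 0.04, 0.03])   # hypothetical raw p-values
print(p_single, p_adjusted)
```

Only the discordant pairs enter the test, so two configurations with equal accuracy can still differ significantly if they err on different images.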

arXiv information

Authors	Christian Tinauer, Maximilian Sackl, Rudolf Stollberger, Stefan Ropele, Christian Langkammer
Published	2025-03-25 14:41:10+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.LG, eess.IV

DeltaZip: Efficient Serving of Multiple Full-Model-Tuned LLMs

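Posted: March 26, 2025 | Author: jarxiv

Summary

Fine-tuning large language models (LLMs) greatly improves model quality for downstream tasks.
However, serving many fine-tuned LLMs concurrently is challenging because of the sporadic, bursty, and varying request patterns of different LLMs.
To bridge this gap, we present DeltaZip, an LLM serving system that efficiently serves multiple full-parameter fine-tuned models concurrently by aggressively compressing model deltas by up to 10x while maintaining high model quality.
The key insight behind this design is that fine-tuning results in small-magnitude changes to the pre-trained model.
By co-designing the serving system with the compression algorithm, DeltaZip achieves a 2x to 12x improvement in throughput compared with state-of-the-art systems.

Summary (original)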

Fine-tuning large language models (LLMs) greatly improves model quality for downstream tasks. However, serving many fine-tuned LLMs concurrently is challenging due to the sporadic, bursty, and varying request patterns of different LLMs. To bridge this gap, we present DeltaZip, an LLM serving system that efficiently serves multiple full-parameter fine-tuned models concurrently by aggressively compressing model deltas by up to 10x while maintaining high model quality. The key insight behind this design is that fine-tuning results in small-magnitude changes to the pre-trained model. By co-designing the serving system with the compression algorithm, DeltaZip achieves 2x to 12x improvement in throughput compared to the state-of-the-art systems.
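
The key insight, that fine-tuning changes the weights only slightly, can be illustrated with a small sketch. The abstract does not describe DeltaZip's compression algorithm, so this uses plain uniform quantization of the delta as a stand-in; the function names, the 4-bit width, and the weight values are hypothetical.

```python
def compress_delta(base, tuned, bits=4):
    """Quantize (tuned - base) to signed integer codes with one shared scale."""
    delta = [t - b for b, t in zip(base, tuned)]
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(d) for d in delta)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(d / scale))) for d in delta]
    return codes, scale

def reconstruct(base, codes, scale):
    """Rebuild approximate fine-tuned weights from the shared base and the codes."""
    return [b + q * scale for b, q in zip(base, codes)]

base = [0.50, -1.20, 0.03, 2.00]     # hypothetical pre-trained weights
tuned = [0.52, -1.21, 0.03, 1.97]    # hypothetical fine-tuned weights

codes, scale = compress_delta(base, tuned)
restored = reconstruct(base, codes, scale)
max_error = max(abs(r - t) for r, t in zip(restored, tuned))
print(codes, scale, max_error)
```

Because the quantization step is set by the largest delta rather than the largest weight, a few bits per parameter already give a small reconstruction error, and many fine-tuned variants can share one copy of the base weights.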

arXiv information

Authors	Xiaozhe Yao, Qinghao Hu, Ana Klimovic
Published	2025-03-25 14:48:01+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.DC, cs.LG

How to RETIRE Tabular Data in Favor of Discrete Digital Signal Representation

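Posted: March 26, 2025 | Author: jarxiv

Summary

The successes achieved by deep neural networks in computer vision tasks have led in recent years to the emergence of a new research area called Multi-Dimensional Encoding (MDE).
Methods in this family aim to transform tabular data into a homogeneous form of discrete digital signals (images) so that convolutional networks can be applied to problems for which they were initially unsuitable.
Despite a steady stream of new work, the pool of multi-dimensional encoding methods is still small, and the scope of research on existing modality encoding techniques is quite limited.
To contribute to this area of research, we propose the Radar-based Encoding from Tabular to Image REpresentation (RETIRE), which represents tabular data as radar graphs, capturing the feature characteristics of each problem instance.
RETIRE was compared with a pool of state-of-the-art MDE algorithms, as well as with XGBoost, in terms of classification accuracy and computational complexity.
In addition, an analysis of transferability and explainability was carried out to provide more insight into both RETIRE and existing MDE techniques.
The results obtained, supported by statistical analysis, confirm the superiority of RETIRE over other established MDE methods.

Summary (original)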

The successes achieved by deep neural networks in computer vision tasks have led in recent years to the emergence of a new research area dubbed Multi-Dimensional Encoding (MDE). Methods belonging to this family aim to transform tabular data into a homogeneous form of discrete digital signals (images) to apply convolutional networks to initially unsuitable problems. Despite the successive emerging works, the pool of multi-dimensional encoding methods is still low, and the scope of research on existing modality encoding techniques is quite limited. To contribute to this area of research, we propose the Radar-based Encoding from Tabular to Image REpresentation (RETIRE), which allows tabular data to be represented as radar graphs, capturing the feature characteristics of each problem instance. RETIRE was compared with a pool of state-of-the-art MDE algorithms as well as with XGBoost in terms of classification accuracy and computational complexity. In addition, an analysis was carried out regarding transferability and explainability to provide more insight into both RETIRE and existing MDE techniques. The results obtained, supported by statistical analysis, confirm the superiority of RETIRE over other established MDE methods.
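
To make the radar-graph idea concrete, the sketch below maps one tabular row to the vertices of a radar polygon. It covers only the geometry and assumes min-max scaling per feature and evenly spaced axes; how RETIRE rasterizes the polygon into an image is not stated in the abstract, and the function names and sample values are hypothetical.

```python
import math

def radar_vertices(features, lows, highs):
    """Place each feature on its own axis; the radius is the min-max scaled value."""
    n = len(features)
    points = []
    for i, (value, lo, hi) in enumerate(zip(features, lows, highs)):
        radius = (value - lo) / (hi - lo) if hi > lo else 0.0
        angle = 2.0 * math.pi * i / n
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

def polygon_area(points):
    """Shoelace formula for the area enclosed by the radar polygon."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# A hypothetical row whose four features all sit at their maximum.
vertices = radar_vertices([10.0, 10.0, 10.0, 10.0], [0.0] * 4, [10.0] * 4)
print(vertices, polygon_area(vertices))
```

Each instance thus becomes a shape whose outline depends jointly on all of its feature values, which is what a convolutional network would then be trained on.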

arXiv information

Authors	Paweł Zyblewski, Szymon Wojciechowski
Published	2025-03-25 15:00:54+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.LG

Internet of Things-Based Smart Precision Farming in Soilless Agriculture: Opportunities and Challenges for Global Food Security

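Posted: March 26, 2025 | Author: jarxiv

Summary

The rapid growth of the global population and the continuous decline in cultivable land pose significant threats to food security.
This challenge worsens as climate change further reduces the availability of farmland.
Soilless agriculture, such as hydroponics, aeroponics, and aquaponics, offers a sustainable solution by enabling efficient crop cultivation in controlled environments.
Integrating the Internet of Things (IoT) with smart precision farming improves resource efficiency, automates environmental control, and ensures stable, high-yield crop production.
IoT-enabled smart farming systems use real-time monitoring, data-driven decision-making, and automation to optimize water and nutrient usage while minimizing human intervention.
This paper explores the opportunities and challenges of IoT-based soilless farming, highlighting its role in sustainable agriculture, urban farming, and global food security.
These advanced farming methods deliver greater productivity, resource conservation, and year-round cultivation.
However, they also face challenges such as high initial investment, technological dependency, and energy consumption.
Through a comprehensive study, bibliometric analysis, and comparative analysis, this research highlights current trends and research gaps.
It also outlines future directions for researchers, policymakers, and industry stakeholders to drive innovation and scalability in IoT-driven soilless agriculture.
By emphasizing the benefits of vertical farming and Controlled Environment Agriculture (CEA)-enabled soilless techniques, this paper supports informed decision-making to address food security challenges and promote sustainable agricultural innovation.

Summary (original)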

The rapid growth of the global population and the continuous decline in cultivable land pose significant threats to food security. This challenge worsens as climate change further reduces the availability of farmland. Soilless agriculture, such as hydroponics, aeroponics, and aquaponics, offers a sustainable solution by enabling efficient crop cultivation in controlled environments. The integration of the Internet of Things (IoT) with smart precision farming improves resource efficiency, automates environmental control, and ensures stable and high-yield crop production. IoT-enabled smart farming systems utilize real-time monitoring, data-driven decision-making, and automation to optimize water and nutrient usage while minimizing human intervention. This paper explores the opportunities and challenges of IoT-based soilless farming, highlighting its role in sustainable agriculture, urban farming, and global food security. These advanced farming methods ensure greater productivity, resource conservation, and year-round cultivation. However, they also face challenges such as high initial investment, technological dependency, and energy consumption. Through a comprehensive study, bibliometric analysis, and comparative analysis, this research highlights current trends and research gaps. It also outlines future directions for researchers, policymakers, and industry stakeholders to drive innovation and scalability in IoT-driven soilless agriculture. By emphasizing the benefits of vertical farming and Controlled Environment Agriculture (CEA)-enabled soilless techniques, this paper supports informed decision-making to address food security challenges and promote sustainable agricultural innovations.

arXiv information

Authors	Monica Dutta, Deepali Gupta, Sumegh Tharewal, Deepam Goyal, Jasminder Kaur Sandhu, Manjit Kaur, Ahmad Ali Alzubi, Jazem Mutared Alanazi
Published	2025-03-25 15:18:47+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.LG, eess.SP

Interpretable Deep Regression Models with Interval-Censored Failure Time Data

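Posted: March 26, 2025 | Author: jarxiv

Summary

Deep neural networks (DNNs) have become powerful tools for modeling complex data structures by sequentially integrating simple functions in each hidden layer.
In survival analysis, recent advances in DNNs have focused primarily on enhancing model capabilities.
However, deep learning methods for interval-censored data, where the unobservable failure time is known only to lie in an interval, remain underexplored and limited to specific data types or models.
This work proposes a general regression framework for interval-censored data with a broad class of partially linear transformation models, in which key covariate effects are modeled parametrically while the nonlinear effects of nuisance multi-modal covariates are approximated via DNNs, balancing interpretability and flexibility.
We employ sieve maximum likelihood estimation, leveraging monotone splines to approximate the cumulative baseline hazard function.
To ensure reliable and tractable estimation, we develop an EM algorithm that incorporates stochastic gradient descent.
We establish the asymptotic properties of the parameter estimators and show that the DNN estimator achieves minimax-optimal convergence.
Extensive simulations demonstrate superior estimation and prediction accuracy over state-of-the-art methods.
Applying our method to the Alzheimer's Disease Neuroimaging Initiative dataset yields novel insights and improved predictive performance compared with traditional approaches.

Summary (original)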

Deep neural networks (DNNs) have become powerful tools for modeling complex data structures through sequentially integrating simple functions in each hidden layer. In survival analysis, recent advances of DNNs primarily focus on enhancing model capabilities, especially in exploring nonlinear covariate effects under right censoring. However, deep learning methods for interval-censored data, where the unobservable failure time is only known to lie in an interval, remain underexplored and limited to specific data type or model. This work proposes a general regression framework for interval-censored data with a broad class of partially linear transformation models, where key covariate effects are modeled parametrically while nonlinear effects of nuisance multi-modal covariates are approximated via DNNs, balancing interpretability and flexibility. We employ sieve maximum likelihood estimation by leveraging monotone splines to approximate the cumulative baseline hazard function. To ensure reliable and tractable estimation, we develop an EM algorithm incorporating stochastic gradient descent. We establish the asymptotic properties of parameter estimators and show that the DNN estimator achieves minimax-optimal convergence. Extensive simulations demonstrate superior estimation and prediction accuracy over state-of-the-art methods. Applying our method to the Alzheimer’s Disease Neuroimaging Initiative dataset yields novel insights and improved predictive performance compared to traditional approaches.
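
What makes interval-censored data distinctive is the likelihood: an observation known only to fall in (left, right] contributes S(left) - S(right), where S is the survival function. The sketch below shows that contribution under a deliberately simplified assumption, an exponential survival function with a single rate and no covariates, in place of the spline-based baseline hazard and DNN component described in the abstract. The intervals and the grid search are hypothetical illustration choices.

```python
import math

def interval_loglik(rate, intervals):
    """Log-likelihood of interval-censored times under S(t) = exp(-rate * t).

    Each (left, right) pair contributes log[S(left) - S(right)];
    right = math.inf encodes a right-censored observation.
    """
    total = 0.0
    for left, right in intervals:
        s_left = math.exp(-rate * left)
        s_right = 0.0 if math.isinf(right) else math.exp(-rate * right)
        total += math.log(s_left - s_right)
    return total

# Hypothetical observation windows; the failure time itself is never seen.
intervals = [(0.0, 1.0), (0.5, 2.0), (1.0, math.inf), (0.2, 0.8)]

# Crude maximum likelihood by grid search over the rate.
rate_grid = [0.05 * k for k in range(1, 101)]
best_rate = max(rate_grid, key=lambda r: interval_loglik(r, intervals))
print(best_rate, interval_loglik(best_rate, intervals))
```

In the framework described above, the same likelihood form would be maximized over a much richer model, which is why the abstract pairs it with an EM algorithm and stochastic gradient descent rather than a grid search.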

arXiv information

Authors	Changhui Yuan, Shishun Zhao, Shuwei Li, Xinyuan Song, Zhao Chen
Published	2025-03-25 15:27:32+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.LG, math.ST, stat.ML, stat.TH

BiPrompt-SAM: Enhancing Image Segmentation via Explicit Selection between Point and Text Prompts

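Posted: March 26, 2025 | Author: jarxiv

Summary

Segmentation is a fundamental task in computer vision, and prompt-driven methods are gaining prominence because of their flexibility.
The recent Segment Anything Model (SAM) has demonstrated powerful point-prompt segmentation capabilities, while text-based segmentation models offer rich semantic understanding.
However, existing approaches rarely explore how to effectively combine these complementary modalities for optimal segmentation performance.
This paper presents BiPrompt-SAM, a novel dual-modal prompt segmentation framework that fuses the advantages of point and text prompts through an explicit selection mechanism.
Specifically, we leverage SAM's inherent ability to generate multiple mask candidates, combine it with a semantic guidance mask derived from text prompts, and explicitly select the most suitable candidate based on similarity metrics.
This approach can be viewed as a simplified Mixture of Experts (MoE) system, in which the point and text modules act as distinct "experts" and the similarity scoring serves as a rudimentary "gating network."
We conducted extensive evaluations on both the Endovis17 medical dataset and the RefCOCO series of natural image datasets.
On Endovis17, BiPrompt-SAM achieved 89.55% mDice and 81.46% mIoU, comparable to state-of-the-art specialized medical segmentation models.
On the RefCOCO series datasets, our method attained 87.1%, 86.5%, and 85.8% IoU, significantly outperforming existing approaches.
Experiments demonstrate that our explicit dual-selection method effectively combines the spatial precision of point prompts with the semantic richness of text prompts, excelling particularly in scenarios involving semantically complex objects, multiple similar objects, and partial occlusions.
BiPrompt-SAM not only provides a simple yet effective implementation but also offers a new perspective on multi-modal prompt fusion.

Summary (original)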

Segmentation is a fundamental task in computer vision, with prompt-driven methods gaining prominence due to their flexibility. The recent Segment Anything Model (SAM) has demonstrated powerful point-prompt segmentation capabilities, while text-based segmentation models offer rich semantic understanding. However, existing approaches rarely explore how to effectively combine these complementary modalities for optimal segmentation performance. This paper presents BiPrompt-SAM, a novel dual-modal prompt segmentation framework that fuses the advantages of point and text prompts through an explicit selection mechanism. Specifically, we leverage SAM’s inherent ability to generate multiple mask candidates, combined with a semantic guidance mask from text prompts, and explicitly select the most suitable candidate based on similarity metrics. This approach can be viewed as a simplified Mixture of Experts (MoE) system, where the point and text modules act as distinct ‘experts,’ and the similarity scoring serves as a rudimentary ‘gating network.’ We conducted extensive evaluations on both the Endovis17 medical dataset and RefCOCO series natural image datasets. On Endovis17, BiPrompt-SAM achieved 89.55\% mDice and 81.46\% mIoU, comparable to state-of-the-art specialized medical segmentation models. On the RefCOCO series datasets, our method attained 87.1\%, 86.5\%, and 85.8\% IoU, significantly outperforming existing approaches. Experiments demonstrate that our explicit dual-selection method effectively combines the spatial precision of point prompts with the semantic richness of text prompts, particularly excelling in scenarios involving semantically complex objects, multiple similar objects, and partial occlusions. BiPrompt-SAM not only provides a simple yet effective implementation but also offers a new perspective on multi-modal prompt fusion.
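
The explicit selection step can be sketched in a few lines. The abstract says only that the candidate is chosen "based on similarity metrics"; intersection over union (IoU) is assumed here as that metric, and the tiny binary masks are hypothetical stand-ins for the point-prompt candidates and the text-derived guidance mask.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as nested 0/1 lists."""
    pairs = [(a, b) for row_a, row_b in zip(mask_a, mask_b) for a, b in zip(row_a, row_b)]
    inter = sum(1 for a, b in pairs if a and b)
    union = sum(1 for a, b in pairs if a or b)
    return inter / union if union else 0.0

def select_mask(candidates, guidance):
    """Return the index of the candidate most similar to the guidance mask, plus all scores."""
    scores = [iou(c, guidance) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores

guidance = [[0, 1, 1],
            [0, 1, 1]]            # hypothetical mask from the text branch
candidates = [
    [[1, 1, 1], [1, 1, 1]],       # too large: whole image
    [[0, 1, 1], [0, 1, 0]],       # close match
    [[1, 0, 0], [1, 0, 0]],       # wrong region
]
best, scores = select_mask(candidates, guidance)
print(best, scores)
```

The scoring plays the role of the rudimentary gating network mentioned above: the text branch never edits a mask, it only decides which point-prompt candidate is returned.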

arXiv information

Authors	Suzhe Xu, Jialin Peng, Chengyuan Zhang
Published	2025-03-25 15:38:55+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.LG

LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation

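Posted: March 26, 2025 | Author: jarxiv

Summary

We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs).
Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships.
Because VLMs are primarily optimized for cross-modal alignment rather than intra-modal similarity, we use a Vision Model (VM) that is observed to capture these relationships better.
We address the resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries.
Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image.
LPOSS+ achieves state-of-the-art performance among training-free methods across a diverse set of datasets.
Code: https://github.com/vladan-stojnic/LPOSS

Summary (original)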

We propose a training-free method for open-vocabulary semantic segmentation using Vision-and-Language Models (VLMs). Our approach enhances the initial per-patch predictions of VLMs through label propagation, which jointly optimizes predictions by incorporating patch-to-patch relationships. Since VLMs are primarily optimized for cross-modal alignment and not for intra-modal similarity, we use a Vision Model (VM) that is observed to better capture these relationships. We address resolution limitations inherent to patch-based encoders by applying label propagation at the pixel level as a refinement step, significantly improving segmentation accuracy near class boundaries. Our method, called LPOSS+, performs inference over the entire image, avoiding window-based processing and thereby capturing contextual interactions across the full image. LPOSS+ achieves state-of-the-art performance among training-free methods, across a diverse set of datasets. Code: https://github.com/vladan-stojnic/LPOSS
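
Label propagation over patch-to-patch relationships can be sketched with the classic iterative update F <- alpha * S * F + (1 - alpha) * Y, where S is the symmetrically normalized affinity matrix and Y holds the initial predictions. This is a generic stand-in: the abstract does not give LPOSS's exact formulation, and the four-patch affinity matrix, `alpha`, and iteration count below are hypothetical.

```python
import math

def label_propagation(weights, initial, alpha=0.9, iterations=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2."""
    n = len(weights)
    num_classes = len(initial[0])
    degree = [sum(row) for row in weights]
    norm = [[weights[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    scores = [row[:] for row in initial]
    for _ in range(iterations):
        scores = [[alpha * sum(norm[i][j] * scores[j][k] for j in range(n))
                   + (1 - alpha) * initial[i][k]
                   for k in range(num_classes)]
                  for i in range(n)]
    return scores

# Four patches in a chain: 0-1 and 2-3 look alike, 1-2 barely do.
weights = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
# Initial per-patch predictions: only the end patches carry evidence.
initial = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

scores = label_propagation(weights, initial)
labels = [row.index(max(row)) for row in scores]
print(labels)  # [0, 0, 1, 1]
```

The two uncertain patches inherit the class of the neighbor they resemble most, which is the kind of joint refinement the abstract attributes to patch-level and then pixel-level propagation.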

arXiv information

Authors	Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias
Published	2025-03-25 15:47:13+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.CV, cs.LG

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch

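Posted: March 26, 2025 | Author: jarxiv

Summary

CUDA Graphs, a recent hardware feature introduced for NVIDIA GPUs, aim to reduce CPU launch overhead by capturing and launching a series of GPU tasks (kernels) as a DAG.
However, deploying CUDA Graphs faces several challenges today because of the static structure of a graph.
It also incurs performance overhead from data copies.
In fact, we show a counter-intuitive result: deploying CUDA Graphs hurts performance in many cases.
We introduce PyGraph, a novel approach to automatically harness the power of CUDA Graphs within PyTorch2.
Driven by three key observations, PyGraph embodies three novel optimizations: it enables wider deployment of CUDA Graphs, reduces GPU kernel parameter copy overheads, and selectively deploys CUDA Graphs based on a cost-benefit analysis.
PyGraph integrates seamlessly with PyTorch2's compilation toolchain, enabling efficient use of CUDA Graphs without manual modifications to the code.
We evaluate PyGraph across various machine learning benchmarks, demonstrating substantial performance improvements over PyTorch2.

Summary (original)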

CUDA Graphs — a recent hardware feature introduced for NVIDIA GPUs — aim to reduce CPU launch overhead by capturing and launching a series of GPU tasks (kernels) as a DAG. However, deploying CUDA Graphs faces several challenges today due to the static structure of a graph. It also incurs performance overhead due to data copy. In fact, we show a counter-intuitive result — deploying CUDA Graphs hurts performance in many cases. We introduce PyGraph, a novel approach to automatically harness the power of CUDA Graphs within PyTorch2. Driven by three key observations, PyGraph embodies three novel optimizations: it enables wider deployment of CUDA Graphs, reduces GPU kernel parameter copy overheads, and selectively deploys CUDA Graphs based on a cost-benefit analysis. PyGraph seamlessly integrates with PyTorch2’s compilation toolchain, enabling efficient use of CUDA Graphs without manual modifications to the code. We evaluate PyGraph across various machine learning benchmarks, demonstrating substantial performance improvements over PyTorch2.
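
The counter-intuitive result, that a CUDA Graph can be slower than eager launches, follows from a simple trade-off between saved launch overhead and added copy cost. The toy cost model below is not PyGraph's analysis, which the abstract does not detail; every parameter name and number is a hypothetical placeholder, and no GPU or PyTorch call is involved.

```python
def should_use_cuda_graph(num_kernels, launch_overhead_us, replay_overhead_us,
                          copy_bytes, copy_bandwidth_bytes_per_us):
    """Toy decision rule: use a graph only if replay plus copies beats eager launches.

    eager cost = one CPU launch per kernel
    graph cost = one graph replay + copying inputs into the graph's static buffers
    """
    eager_cost = num_kernels * launch_overhead_us
    graph_cost = replay_overhead_us + copy_bytes / copy_bandwidth_bytes_per_us
    return graph_cost < eager_cost, eager_cost, graph_cost

# Many tiny kernels, small inputs: launch overhead dominates, so the graph wins.
use_a, eager_a, graph_a = should_use_cuda_graph(
    num_kernels=200, launch_overhead_us=5.0, replay_overhead_us=20.0,
    copy_bytes=1_000_000, copy_bandwidth_bytes_per_us=500_000.0)

# Few kernels, large inputs: the copy outweighs the saved launches.
use_b, eager_b, graph_b = should_use_cuda_graph(
    num_kernels=4, launch_overhead_us=5.0, replay_overhead_us=20.0,
    copy_bytes=50_000_000, copy_bandwidth_bytes_per_us=500_000.0)

print(use_a, eager_a, graph_a)
print(use_b, eager_b, graph_b)
```

A selective deployment policy of the kind the abstract describes would make this decision per captured region using measured costs rather than fixed constants.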

arXiv information

Authors	Abhishek Ghosh, Ajay Nayak, Ashish Panwar, Arkaprava Basu
Published	2025-03-25 15:47:54+00:00
arXiv site	arxiv_id(pdf)

Categories: cs.LG
