jarxiv | Japanese arxiv

Comparing Next-Day Wildfire Predictability of MODIS and VIIRS Satellite Data

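Posted: April 11, 2025 | Author: jarxiv

Summary

Several studies have carried out next-day fire prediction using satellite imagery.
Two main satellites are used to detect wildfires: MODIS and VIIRS.
Both satellites provide fire mask products, called MOD14 and VNP14 respectively.
Studies have used one or the other, but the two have not been compared.
In this paper, we first evaluate how well VIIRS and MODIS data can be used to predict wildfire spread one day ahead.
We find that the model using VIIRS as input and VNP14 as target achieves the best results.
Interestingly, the model using MODIS as input and VNP14 as target performs significantly better than the one using VNP14 as input and MOD14 as target.
Next, we discuss why MOD14 is hard to use for predicting next-day fires.
We find that the MOD14 fire mask is highly stochastic and does not correlate with reasonable fire spread patterns.
This is harmful for machine learning tasks, because the model learns irrational patterns.
We therefore conclude that MOD14 is unsuitable for next-day fire prediction and that VNP14 is a much better choice.
However, using MODIS input with VNP14 as target improves predictability significantly.
This indicates that an improved fire detection model is possible for MODIS.
The full code and dataset are available online: https://github.com/justuskarlsson/wildfire-mod14-vnp14

Summary (original)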

Multiple studies have performed next-day fire prediction using satellite imagery. Two main satellites are used to detect wildfires: MODIS and VIIRS. Both satellites provide fire mask products, called MOD14 and VNP14, respectively. Studies have used one or the other, but there has been no comparison between them to determine which might be more suitable for next-day fire prediction. In this paper, we first evaluate how well VIIRS and MODIS data can be used to forecast wildfire spread one day ahead. We find that the model using VIIRS as input and VNP14 as target achieves the best results. Interestingly, the model using MODIS as input and VNP14 as target performs significantly better than using VNP14 as input and MOD14 as target. Next, we discuss why MOD14 might be harder to use for predicting next-day fires. We find that the MOD14 fire mask is highly stochastic and does not correlate with reasonable fire spread patterns. This is detrimental for machine learning tasks, as the model learns irrational patterns. Therefore, we conclude that MOD14 is unsuitable for next-day fire prediction and that VNP14 is a much better option. However, using MODIS input and VNP14 as target, we achieve a significant improvement in predictability. This indicates that an improved fire detection model is possible for MODIS. The full code and dataset is available online: https://github.com/justuskarlsson/wildfire-mod14-vnp14

arXiv information

Authors	Justus Karlsson, Yonghao Xu, Amanda Berg, Leif Haglund
Published	2025-04-10 15:03:37+00:00

Source and services used

arxiv.jp, Google

Category: cs.CV

HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss

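Posted: April 11, 2025 | Author: jarxiv

Summary

Accurate segmentation of tubular structures in medical images, such as vessels and airway trees, is important for computer-aided diagnosis, radiotherapy, and surgical planning.
However, the diverse sizes, complex topologies, and (often) incomplete annotations of these structures pose significant challenges for algorithm design.
We address these difficulties by proposing a new tubular structure segmentation framework named HarmonySeg.
First, we design a deep-to-shallow decoder network with flexible convolution blocks of varying receptive fields, so that the model adapts effectively to tubular structures of different scales.
Second, to highlight potential anatomical regions and improve the recall of small tubular structures, we incorporate vesselness maps as auxiliary information.
These maps are aligned with image features through a shallow-and-deep fusion module, which at the same time eliminates unreasonable candidates to maintain high precision.
Finally, we introduce a topology-preserving loss function that uses contextual and shape priors to balance the growth and suppression of tubular structures, which also lets the model handle low-quality and incomplete annotations.
Extensive quantitative experiments are conducted on four public datasets.
The results show that our model accurately segments 2D and 3D tubular structures and outperforms existing state-of-the-art methods.
External validation on a private dataset also shows good generalizability.

Summary (original)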

Accurate segmentation of tubular structures in medical images, such as vessels and airway trees, is crucial for computer-aided diagnosis, radiotherapy, and surgical planning. However, significant challenges exist in algorithm design when faced with diverse sizes, complex topologies, and (often) incomplete data annotation of these structures. We address these difficulties by proposing a new tubular structure segmentation framework named HarmonySeg. First, we design a deep-to-shallow decoder network featuring flexible convolution blocks with varying receptive fields, which enables the model to effectively adapt to tubular structures of different scales. Second, to highlight potential anatomical regions and improve the recall of small tubular structures, we incorporate vesselness maps as auxiliary information. These maps are aligned with image features through a shallow-and-deep fusion module, which simultaneously eliminates unreasonable candidates to maintain high precision. Finally, we introduce a topology-preserving loss function that leverages contextual and shape priors to balance the growth and suppression of tubular structures, which also allows the model to handle low-quality and incomplete annotations. Extensive quantitative experiments are conducted on four public datasets. The results show that our model can accurately segment 2D and 3D tubular structures and outperform existing state-of-the-art methods. External validation on a private dataset also demonstrates good generalizability.

arXiv information

Authors	Yi Huang, Ke Zhang, Wei Liu, Yuanyuan Wang, Vishal M. Patel, Le Lu, Xu Han, Dakai Jin, Ke Yan
Published	2025-04-10 15:04:42+00:00

Categories: cs.CV, eess.IV

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

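Posted: April 11, 2025 | Author: jarxiv

Summary

Visual grounding (VG) aims to localize target objects in an image based on natural language descriptions.
In this paper, we propose AerialVG, a new task focusing on visual grounding from aerial views.
Compared to traditional VG, AerialVG poses new challenges; for example, appearance-based grounding is insufficient to distinguish among multiple visually similar objects, so positional relations need to be emphasized.
In addition, existing VG models struggle when applied to aerial imagery, where high-resolution images cause significant difficulties.
To address these challenges, we introduce the first AerialVG dataset, consisting of 5K real-world aerial images, 50K manually annotated descriptions, and 103K objects.
In particular, each annotation in the AerialVG dataset contains multiple target objects annotated with relative spatial relations, requiring models to perform comprehensive spatial reasoning.
Furthermore, we propose an innovative model designed specifically for the AerialVG task, in which a Hierarchical Cross-Attention is devised to focus on target regions and a Relation-Aware Grounding module is designed to infer positional relations.
Experimental results validate the effectiveness of the dataset and method and highlight the importance of spatial reasoning in aerial visual grounding.
The code and dataset will be released.

Summary (original)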

Visual grounding (VG) aims to localize target objects in an image based on natural language descriptions. In this paper, we propose AerialVG, a new task focusing on visual grounding from aerial views. Compared to traditional VG, AerialVG poses new challenges, e.g., appearance-based grounding is insufficient to distinguish among multiple visually similar objects, and positional relations should be emphasized. Besides, existing VG models struggle when applied to aerial imagery, where high-resolution images cause significant difficulties. To address these challenges, we introduce the first AerialVG dataset, consisting of 5K real-world aerial images, 50K manually annotated descriptions, and 103K objects. Particularly, each annotation in the AerialVG dataset contains multiple target objects annotated with relative spatial relations, requiring models to perform comprehensive spatial reasoning. Furthermore, we propose an innovative model especially for the AerialVG task, where a Hierarchical Cross-Attention is devised to focus on target regions, and a Relation-Aware Grounding module is designed to infer positional relations. Experimental results validate the effectiveness of our dataset and method, highlighting the importance of spatial reasoning in aerial visual grounding. The code and dataset will be released.

arXiv information

Authors	Junli Liu, Qizhi Chen, Zhigang Wang, Yiwen Tang, Yiting Zhang, Chi Yan, Dong Wang, Xuelong Li, Bin Zhao
Published	2025-04-10 15:13:00+00:00

Categories: cs.AI, cs.CV

Soybean Disease Detection via Interpretable Hybrid CNN-GNN: Integrating MobileNetV2 and GraphSAGE with Cross-Modal Attention

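Posted: April 11, 2025 | Author: jarxiv

Summary

Soybean leaf disease detection is critical for agricultural productivity, but it faces challenges because symptoms look visually similar and conventional methods offer limited interpretability.
Convolutional Neural Networks (CNNs) excel at spatial feature extraction, but they often neglect relational dependencies between images, which leads to misclassifications.
This paper proposes an interpretable hybrid sequential CNN-Graph Neural Network (GNN) framework that combines MobileNetV2 for localized feature extraction with GraphSAGE for relational modeling.
The framework constructs a graph in which nodes represent leaf images and edges are defined by cosine similarity-based adjacency matrices and adaptive neighborhood sampling.
This design captures fine-grained lesion features and global symptom patterns, addressing the challenge of inter-class similarity.
Cross-modal interpretability is achieved through Grad-CAM and Eigen-CAM visualizations, which generate heatmaps highlighting disease-influential regions.
Evaluated on a dataset of ten soybean leaf diseases, the model achieves $97.16\%$ accuracy, surpassing standalone CNNs ($\le95.04\%$) and traditional machine learning models ($\le77.05\%$).
Ablation studies validate the superiority of the sequential architecture over parallel or single-model configurations.
With only 2.3 million parameters, the lightweight MobileNetV2-GraphSAGE combination ensures computational efficiency and enables real-time deployment in resource-constrained environments.
The proposed approach bridges the gap between accurate classification and practical applicability, offering a robust, interpretable tool for agricultural diagnostics while advancing CNN-GNN integration in plant pathology research.

Summary (original)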

Soybean leaf disease detection is critical for agricultural productivity but faces challenges due to visually similar symptoms and limited interpretability in conventional methods. While Convolutional Neural Networks (CNNs) excel in spatial feature extraction, they often neglect inter-image relational dependencies, leading to misclassifications. This paper proposes an interpretable hybrid Sequential CNN-Graph Neural Network (GNN) framework that synergizes MobileNetV2 for localized feature extraction and GraphSAGE for relational modeling. The framework constructs a graph where nodes represent leaf images, with edges defined by cosine similarity-based adjacency matrices and adaptive neighborhood sampling. This design captures fine-grained lesion features and global symptom patterns, addressing inter-class similarity challenges. Cross-modal interpretability is achieved via Grad-CAM and Eigen-CAM visualizations, generating heatmaps to highlight disease-influential regions. Evaluated on a dataset of ten soybean leaf diseases, the model achieves $97.16\%$ accuracy, surpassing standalone CNNs ($\le95.04\%$) and traditional machine learning models ($\le77.05\%$). Ablation studies validate the sequential architecture’s superiority over parallel or single-model configurations. With only 2.3 million parameters, the lightweight MobileNetV2-GraphSAGE combination ensures computational efficiency, enabling real-time deployment in resource-constrained environments. The proposed approach bridges the gap between accurate classification and practical applicability, offering a robust, interpretable tool for agricultural diagnostics while advancing CNN-GNN integration in plant pathology research.
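The graph construction described in this abstract can be illustrated with a minimal sketch. It assumes each leaf image has already been reduced to a feature vector (in the paper these would come from MobileNetV2; here they are made-up 2-D vectors) and links every node to its `k` most cosine-similar neighbours. The function names, the fixed `k`, and the symmetrisation step are assumptions for illustration only; the abstract does not say how the adjacency matrix or the adaptive neighborhood sampling is implemented.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_adjacency(features, k):
    """Binary adjacency matrix linking each node to its k most similar nodes.

    Hypothetical helper: a fixed k stands in for whatever adaptive
    neighbourhood rule the paper uses.
    """
    n = len(features)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine_similarity(features[i], features[j]), j)
                for j in range(n) if j != i]
        # Highest similarity first; ties broken by node index.
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, j in sims[:k]:
            adjacency[i][j] = 1
            adjacency[j][i] = 1  # keep the graph undirected (an assumption)
    return adjacency


# Made-up image embeddings: two pairs of near-duplicates.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adj = knn_adjacency(features, k=1)
for row in adj:
    print(row)
```

A graph neural network such as GraphSAGE would then aggregate features along these edges; that step is omitted here.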

arXiv information

Authors	Md Abrar Jahin, Soudeep Shahriar, M. F. Mridha, Md. Jakir Hossen, Nilanjan Dey
Published	2025-04-10 15:14:17+00:00

Categories: cs.CV, cs.LG

V2V3D: View-to-View Denoised 3D Reconstruction for Light-Field Microscopy

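Posted: April 11, 2025 | Author: jarxiv

Summary

Light field microscopy (LFM) has gained significant attention for its ability to capture snapshot-based, large-scale 3D fluorescence images.
However, existing LFM reconstruction algorithms are either highly sensitive to sensor noise or require ground-truth annotated data for training.
To address these challenges, this paper introduces V2V3D, an unsupervised view2view-based framework that establishes a new paradigm for jointly optimizing image denoising and 3D reconstruction in a unified architecture.
We assume that the LF images are derived from a consistent 3D signal and that the noise in each view is independent.
This allows V2V3D to incorporate the noise2noise principle for effective denoising.
To enhance the recovery of high-frequency details, we propose a novel wave-optics-based feature alignment technique, which transforms the point spread function used for forward propagation in wave optics into convolution kernels designed for feature alignment.
In addition, we introduce an LFM dataset containing LF images and their corresponding 3D intensity volumes.
Extensive experiments show that our approach achieves high computational efficiency and outperforms other state-of-the-art methods.
These advances position V2V3D as a promising solution for 3D imaging under challenging conditions.

Summary (original)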

Light field microscopy (LFM) has gained significant attention due to its ability to capture snapshot-based, large-scale 3D fluorescence images. However, existing LFM reconstruction algorithms are highly sensitive to sensor noise or require hard-to-get ground-truth annotated data for training. To address these challenges, this paper introduces V2V3D, an unsupervised view2view-based framework that establishes a new paradigm for joint optimization of image denoising and 3D reconstruction in a unified architecture. We assume that the LF images are derived from a consistent 3D signal, with the noise in each view being independent. This enables V2V3D to incorporate the principle of noise2noise for effective denoising. To enhance the recovery of high-frequency details, we propose a novel wave-optics-based feature alignment technique, which transforms the point spread function, used for forward propagation in wave optics, into convolution kernels specifically designed for feature alignment. Moreover, we introduce an LFM dataset containing LF images and their corresponding 3D intensity volumes. Extensive experiments demonstrate that our approach achieves high computational efficiency and outperforms the other state-of-the-art methods. These advancements position V2V3D as a promising solution for 3D imaging under challenging conditions.
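The noise2noise principle that V2V3D builds on can be checked numerically in a few lines. The sketch below is not the V2V3D method. It only illustrates the underlying idea under the abstract's assumption of a consistent signal with independent noise per view, plus the added assumption that the noise is zero-mean Gaussian: the value that minimises squared error against many noisy targets is their mean, which approaches the clean signal. The signal values, noise level, and sample count are arbitrary choices.

```python
import random


def best_l2_estimate(noisy_targets):
    """Value minimising the summed squared error to the targets (their mean)."""
    return sum(noisy_targets) / len(noisy_targets)


rng = random.Random(0)          # fixed seed so the run is reproducible
clean_signal = [0.0, 1.0, 4.0]  # hypothetical clean pixel intensities
noise_std = 1.0                 # assumed zero-mean Gaussian noise level
num_views = 5000                # number of independent noisy observations

estimates = []
for value in clean_signal:
    # Each "view" sees the same clean value plus its own independent noise.
    noisy_targets = [value + rng.gauss(0.0, noise_std) for _ in range(num_views)]
    estimates.append(best_l2_estimate(noisy_targets))

max_error = max(abs(e - c) for e, c in zip(estimates, clean_signal))
print("estimates:", estimates)
print("largest deviation from the clean signal:", max_error)
```

No clean target is ever used to form the estimates, which is why training against noisy views can still recover the underlying signal.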

arXiv information

Authors	Jiayin Zhao, Zhenqi Fu, Tao Yu, Hui Qiao
Published	2025-04-10 15:29:26+00:00

Category: cs.CV

SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos

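Posted: April 11, 2025 | Author: jarxiv

Summary

Video Scene Graph Generation (VidSGG) is an important topic in understanding dynamic kitchen environments.
Current VidSGG models require extensive training to produce scene graphs.
Recently, Vision Language Models (VLMs) and Vision Foundation Models (VFMs) have demonstrated impressive zero-shot capabilities across a variety of tasks.
However, VLMs like Gemini struggle with the dynamics of VidSGG and fail to maintain stable object identities across frames.
To overcome this limitation, we propose SAMJAM, a zero-shot pipeline that combines SAM2's temporal tracking with Gemini's semantic understanding.
SAM2 also improves on Gemini's object grounding by producing more accurate bounding boxes.
In our method, we first prompt Gemini to generate a frame-level scene graph.
Then, we use a matching algorithm to map each object in the scene graph to a SAM2-generated or SAM2-propagated mask, producing a temporally consistent scene graph in dynamic environments.
Finally, we repeat this process in each of the following frames.
We empirically demonstrate that SAMJAM outperforms Gemini by 8.33% in mean recall on the EPIC-KITCHENS and EPIC-KITCHENS-100 datasets.

Summary (original)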

Video Scene Graph Generation (VidSGG) is an important topic in understanding dynamic kitchen environments. Current models for VidSGG require extensive training to produce scene graphs. Recently, Vision Language Models (VLM) and Vision Foundation Models (VFM) have demonstrated impressive zero-shot capabilities in a variety of tasks. However, VLMs like Gemini struggle with the dynamics for VidSGG, failing to maintain stable object identities across frames. To overcome this limitation, we propose SAMJAM, a zero-shot pipeline that combines SAM2’s temporal tracking with Gemini’s semantic understanding. SAM2 also improves upon Gemini’s object grounding by producing more accurate bounding boxes. In our method, we first prompt Gemini to generate a frame-level scene graph. Then, we employ a matching algorithm to map each object in the scene graph with a SAM2-generated or SAM2-propagated mask, producing a temporally-consistent scene graph in dynamic environments. Finally, we repeat this process again in each of the following frames. We empirically demonstrate that SAMJAM outperforms Gemini by 8.33% in mean recall on the EPIC-KITCHENS and EPIC-KITCHENS-100 datasets.
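The abstract does not say which matching algorithm maps scene-graph objects to SAM2 masks. One common choice for this kind of association, shown here purely as an assumption, is greedy matching on bounding-box overlap (IoU). The box format `(x1, y1, x2, y2)`, the `min_iou` threshold, and both function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(object_boxes, mask_boxes, min_iou=0.5):
    """Map each object index to at most one mask index, best overlaps first."""
    candidates = sorted(
        ((box_iou(o, m), i, j)
         for i, o in enumerate(object_boxes)
         for j, m in enumerate(mask_boxes)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    matches, used_masks = {}, set()
    for iou, i, j in candidates:
        if iou < min_iou:
            break  # remaining candidates overlap even less
        if i in matches or j in used_masks:
            continue  # each object and each mask is used at most once
        matches[i] = j
        used_masks.add(j)
    return matches


# Hypothetical boxes: two objects from a frame-level scene graph,
# three boxes derived from tracked masks.
object_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
mask_boxes = [(21, 21, 31, 31), (1, 1, 10, 10), (50, 50, 60, 60)]
matches = greedy_match(object_boxes, mask_boxes)
print(matches)
```

Because the matched mask carries a track identity from frame to frame, the object it is matched to inherits a stable identity as well. An optimal assignment (for example the Hungarian algorithm) could replace the greedy loop.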

arXiv information

Authors	Joshua Li, Fernando Jose Pena Cantu, Emily Yu, Alexander Wong, Yuchen Cui, Yuhao Chen
Published	2025-04-10 15:43:10+00:00

Category: cs.CV

Robust image representations with counterfactual contrastive learning

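Posted: April 11, 2025 | Author: jarxiv

Summary

Contrastive pretraining can substantially increase model generalisation and downstream performance.
However, the quality of the learned representations depends heavily on the data augmentation strategy used to generate positive pairs.
Positive contrastive pairs should preserve semantic meaning while discarding unwanted variations related to the data acquisition domain.
Traditional contrastive pipelines attempt to simulate domain shifts through pre-defined generic image transformations.
However, these do not always mimic realistic and relevant domain variations in medical imaging, such as scanner differences.
To tackle this issue, we introduce counterfactual contrastive learning, a novel framework that leverages recent advances in causal image synthesis to create contrastive positive pairs that faithfully capture relevant domain variations.
Evaluated on five datasets covering both chest radiography and mammography data, for two established contrastive objectives (SimCLR and DINO-v2), our method outperforms standard contrastive learning in terms of robustness to acquisition shift.
Notably, counterfactual contrastive learning achieves superior downstream performance on both in-distribution and external datasets, especially for images acquired with scanners under-represented in the training set.
Further experiments show that the proposed framework extends beyond acquisition shifts: models trained with counterfactual contrastive learning reduce subgroup disparities across biological sex.

Summary (original)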

Contrastive pretraining can substantially increase model generalisation and downstream performance. However, the quality of the learned representations is highly dependent on the data augmentation strategy applied to generate positive pairs. Positive contrastive pairs should preserve semantic meaning while discarding unwanted variations related to the data acquisition domain. Traditional contrastive pipelines attempt to simulate domain shifts through pre-defined generic image transformations. However, these do not always mimic realistic and relevant domain variations for medical imaging, such as scanner differences. To tackle this issue, we herein introduce counterfactual contrastive learning, a novel framework leveraging recent advances in causal image synthesis to create contrastive positive pairs that faithfully capture relevant domain variations. Our method, evaluated across five datasets encompassing both chest radiography and mammography data, for two established contrastive objectives (SimCLR and DINO-v2), outperforms standard contrastive learning in terms of robustness to acquisition shift. Notably, counterfactual contrastive learning achieves superior downstream performance on both in-distribution and external datasets, especially for images acquired with scanners under-represented in the training set. Further experiments show that the proposed framework extends beyond acquisition shifts, with models trained with counterfactual contrastive learning reducing subgroup disparities across biological sex.
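To make the role of the positive pair concrete, here is a minimal sketch of an InfoNCE-style contrastive loss for a single anchor, the kind of objective SimCLR is built on. It is a simplification (SimCLR's full loss is computed symmetrically over a whole batch) and is not the authors' implementation. The embeddings are made-up 2-D vectors, and the `counterfactual` vector stands in for the embedding of a counterfactual image, e.g. the same image as if acquired on another scanner.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Negative log-probability of picking the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]          # embedding of the real image (hypothetical)
counterfactual = [0.9, 0.1]  # embedding of its counterfactual view (hypothetical)
negatives = [[0.0, 1.0], [-1.0, 0.0]]  # embeddings of other images

loss_aligned = info_nce(anchor, counterfactual, negatives)
loss_misaligned = info_nce(anchor, [0.0, 1.0], negatives)
print(loss_aligned, loss_misaligned)
```

The loss is low when the anchor and its positive are embedded close together, so the choice of positive determines which variations the encoder learns to ignore.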

arXiv information

Authors	Mélanie Roschewitz, Fabio De Sousa Ribeiro, Tian Xia, Galvin Khara, Ben Glocker
Published	2025-04-10 16:19:20+00:00

Categories: cs.AI, cs.CV

Multi-view Hybrid Graph Convolutional Network for Volume-to-mesh Reconstruction in Cardiovascular MRI

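Posted: April 11, 2025 | Author: jarxiv

Summary

Cardiovascular magnetic resonance (CMR) imaging is emerging as a crucial tool for examining cardiac morphology and function.
Essential to this endeavour are anatomical 3D surface and volumetric meshes derived from CMR images, which facilitate computational anatomy studies, biomarker discovery, and in-silico simulations.
Traditional approaches typically follow complex multi-step pipelines, first segmenting images and then reconstructing meshes, which makes them time-consuming and prone to error propagation.
In response, we introduce HybridVNet, a novel architecture for direct image-to-mesh extraction that seamlessly integrates standard convolutional neural networks with graph convolutions.
To further improve accuracy, we propose a multi-view HybridVNet architecture that processes both long-axis and short-axis CMR, and show that it can increase the performance of cardiac MR mesh generation.
Our model combines traditional convolutional networks with variational graph generative models, deep supervision, and mesh-specific regularisation.
Experiments on a comprehensive dataset from the UK Biobank confirm the potential of HybridVNet by efficiently generating high-fidelity meshes from CMR images.
Multi-view HybridVNet outperforms the state of the art, achieving up to a $\sim$27\% reduction in Mean Contour Distance (from 1.86 mm to 1.35 mm for the LV myocardium), up to a $\sim$18\% improvement in Hausdorff distance (from 4.74 mm to 3.89 mm for the LV endocardium), and up to $\sim$8\% in Dice coefficient (from 0.78 to 0.84 for the LV myocardium), highlighting its superior accuracy.

Summary (original)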

Cardiovascular magnetic resonance imaging is emerging as a crucial tool to examine cardiac morphology and function. Essential to this endeavour are anatomical 3D surface and volumetric meshes derived from CMR images, which facilitate computational anatomy studies, biomarker discovery, and in-silico simulations. Traditional approaches typically follow complex multi-step pipelines, first segmenting images and then reconstructing meshes, making them time-consuming and prone to error propagation. In response, we introduce HybridVNet, a novel architecture for direct image-to-mesh extraction seamlessly integrating standard convolutional neural networks with graph convolutions, which we prove can efficiently handle surface and volumetric meshes by encoding them as graph structures. To further enhance accuracy, we propose a multi-view HybridVNet architecture which processes both long axis and short axis CMR, showing that it can increase the performance of cardiac MR mesh generation. Our model combines traditional convolutional networks with variational graph generative models, deep supervision and mesh-specific regularisation. Experiments on a comprehensive dataset from the UK Biobank confirm the potential of HybridVNet to significantly advance cardiac imaging and computational cardiology by efficiently generating high-fidelity meshes from CMR images. Multi-view HybridVNet outperforms the state-of-the-art, achieving improvements of up to $\sim$27\% reduction in Mean Contour Distance (from 1.86 mm to 1.35 mm for the LV Myocardium), up to $\sim$18\% improvement in Hausdorff distance (from 4.74 mm to 3.89mm, for the LV Endocardium), and up to $\sim$8\% in Dice Coefficient (from 0.78 to 0.84, for the LV Myocardium), highlighting its superior accuracy.
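The rounded percentages quoted in this abstract follow directly from the before and after values it reports. The short check below recomputes them as relative changes with respect to the baseline. The helper name is hypothetical, and the numbers are copied from the abstract.

```python
def relative_change_percent(baseline, new):
    """Absolute relative change from baseline to new, in percent."""
    return abs(new - baseline) / baseline * 100.0


# (baseline, multi-view HybridVNet) pairs as reported in the abstract.
reported = {
    "Mean Contour Distance (mm), LV myocardium": (1.86, 1.35),
    "Hausdorff distance (mm), LV endocardium": (4.74, 3.89),
    "Dice coefficient, LV myocardium": (0.78, 0.84),
}

changes = {name: relative_change_percent(baseline, new)
           for name, (baseline, new) in reported.items()}

for name, percent in changes.items():
    print(f"{name}: {percent:.1f}%")
```

For the two distance metrics lower is better, so the change is a reduction. For the Dice coefficient higher is better, so the change is an increase.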

arXiv information

Authors	Nicolás Gaggion, Benjamin A. Matheson, Yan Xia, Rodrigo Bonazzola, Nishant Ravikumar, Zeike A. Taylor, Diego H. Milone, Alejandro F. Frangi, Enzo Ferrante
Published	2025-04-10 16:25:45+00:00

Categories: cs.CV, eess.IV

The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound

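Posted: April 11, 2025 | Author: jarxiv

Summary

Data augmentation is a central component of joint embedding self-supervised learning (SSL).
Approaches that work for natural images may not always be effective in medical imaging tasks.
This study systematically investigated the impact of data augmentation and preprocessing strategies in SSL for lung ultrasound.
Three data augmentation pipelines were assessed: (1) a baseline pipeline commonly used across imaging domains, (2) a novel semantics-preserving pipeline designed for ultrasound, and (3) a distilled set of the most effective transformations from both pipelines.
Pretrained models were evaluated on multiple classification tasks: B-line detection, pleural effusion detection, and COVID-19 classification.
Experiments revealed that semantics-preserving data augmentation gave the best performance for COVID-19 classification, a diagnostic task that requires global image context.
Cropping-based methods gave the best performance on the B-line and pleural effusion object classification tasks.
Lastly, semantics-preserving ultrasound image preprocessing increased downstream performance on multiple tasks.
Guidance on data augmentation and preprocessing strategies was synthesized for practitioners working with SSL in ultrasound.

Summary (original)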

Data augmentation is a central component of joint embedding self-supervised learning (SSL). Approaches that work for natural images may not always be effective in medical imaging tasks. This study systematically investigated the impact of data augmentation and preprocessing strategies in SSL for lung ultrasound. Three data augmentation pipelines were assessed: (1) a baseline pipeline commonly used across imaging domains, (2) a novel semantic-preserving pipeline designed for ultrasound, and (3) a distilled set of the most effective transformations from both pipelines. Pretrained models were evaluated on multiple classification tasks: B-line detection, pleural effusion detection, and COVID-19 classification. Experiments revealed that semantics-preserving data augmentation resulted in the greatest performance for COVID-19 classification – a diagnostic task requiring global image context. Cropping-based methods yielded the greatest performance on the B-line and pleural effusion object classification tasks, which require strong local pattern recognition. Lastly, semantics-preserving ultrasound image preprocessing resulted in increased downstream performance for multiple tasks. Guidance regarding data augmentation and preprocessing strategies was synthesized for practitioners working with SSL in ultrasound.

arXiv information

Authors	Blake VanBerlo, Alexander Wong, Jesse Hoey, Robert Arntfield
Published	2025-04-10 16:26:47+00:00

Categories: cs.CV, cs.LG, eess.IV, I.2.10

Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments

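Posted: April 11, 2025 | Author: jarxiv

Summary

State-of-the-art research in traditional computer vision is increasingly leveraged in the surgical domain.
A particular focus in computer-assisted surgery is replacing marker-based tracking systems for instrument localization with purely image-based 6DoF pose estimation using deep-learning methods.
However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation.
In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments, and derive recommendations for an ideal camera system that addresses the challenges of the operating room.
The contributions of this work are threefold.
First, we present a multi-camera capture setup consisting of static and head-mounted cameras, which allows us to study the performance of pose estimation methods under various camera configurations.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre, with rich annotations for surgeon, instrument, and patient anatomy.
Third, we evaluate three state-of-the-art single-view and multi-view methods for 6DoF pose estimation of surgical instruments and analyze the influence of camera configurations, training data, and occlusions on pose accuracy and generalization ability.
The best method uses five cameras in a multi-view pose optimization and achieves an average position and orientation error of 1.01 mm and 0.89{\deg} for a surgical drill, and 2.79 mm and 3.33{\deg} for a screwdriver, under optimal conditions.
Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.

Summary (original)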

State-of-the-art research of traditional computer vision is increasingly leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with pure image-based 6DoF pose estimation using deep-learning methods. However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation. In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments and derive recommendations for an ideal camera system that addresses the challenges in the operating room. The contributions of this work are threefold. First, we present a multi-camera capture setup consisting of static and head-mounted cameras, which allows us to study the performance of pose estimation methods under various camera configurations. Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre and including rich annotations for surgeon, instrument, and patient anatomy. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments and analyze the influence of camera configurations, training data, and occlusions on the pose accuracy and generalization ability. The best method utilizes five cameras in a multi-view pose optimization and achieves an average position and orientation error of 1.01 mm and 0.89{\deg} for a surgical drill as well as 2.79 mm and 3.33{\deg} for a screwdriver under optimal conditions. Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.
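Position and orientation errors like those quoted in this abstract are commonly computed as the Euclidean distance between translations and the angle of the relative rotation between the estimated and ground-truth poses. The sketch below uses those standard definitions as an assumption; the abstract does not state the exact formulas the authors used. The example poses are invented.

```python
import math


def position_error(t_est, t_gt):
    """Euclidean distance between two translation vectors (same unit as input)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))


def orientation_error_deg(r_est, r_gt):
    """Angle in degrees of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def rotation_z(angle_deg):
    """Rotation matrix for a rotation about the z-axis."""
    a = math.radians(angle_deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


# Invented ground-truth and estimated instrument poses (translations in mm).
r_gt, t_gt = rotation_z(30.0), [10.0, 20.0, 30.0]
r_est, t_est = rotation_z(31.0), [11.0, 20.0, 30.0]

pos_err = position_error(t_est, t_gt)
ang_err = orientation_error_deg(r_est, r_gt)
print(f"position error: {pos_err:.2f} mm, orientation error: {ang_err:.2f} deg")
```

Averaging these two quantities over all frames gives summary figures of the same kind as the millimetre and degree values reported above.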

arXiv information

Authors	Jonas Hein, Nicola Cavalcanti, Daniel Suter, Lukas Zingg, Fabio Carrillo, Lilian Calvet, Mazda Farshad, Marc Pollefeys, Nassir Navab, Philipp Fürnstahl
Published	2025-04-10 17:23:33+00:00

Category: cs.CV
