SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation

Summary

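The recent Segment Anything Model (SAM) 2 has shown strong foundational capability in semantic segmentation. Its memory mechanism and mask decoder further address video tracking and object occlusion, yielding superior interactive segmentation results on both images and videos.
Building on our earlier empirical studies, we further investigate the prompt-based zero-shot segmentation performance of SAM 2 in robot-assisted surgery, along with its robustness to real-world corruption.
For static images we use two kinds of prompt, a 1-point prompt and a bounding box; for video sequences, a 1-point prompt is applied to the first frame.
In extensive experiments on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks, SAM 2 with bounding-box prompts outperforms state-of-the-art (SOTA) methods in comparative evaluations.
With point prompts, the results also improve substantially on SAM, approaching or even surpassing existing unprompted SOTA methods.
In addition, SAM 2 offers faster inference and degrades less under various image corruptions.
Although results remain slightly unsatisfactory at certain edges or regions, SAM 2's robust adaptability to 1-point prompts underscores its potential for downstream surgical tasks with limited prompt requirements.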
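
To make the two prompt types concrete, the following is a minimal sketch of how a 1-point prompt and a bounding-box prompt could be derived from a binary instrument mask. The summary does not say how the prompts were generated, so the helper name `prompts_from_mask`, the centroid-nearest point rule, and the inclusive `(x_min, y_min, x_max, y_max)` box convention are all assumptions made for illustration.

```python
import numpy as np


def prompts_from_mask(mask):
    """Derive a 1-point prompt and a box prompt from a binary mask.

    Hypothetical helper: the selection rules below are illustrative
    assumptions, not the procedure used in the paper.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask has no foreground pixels")

    # Box prompt: tight box around the foreground, inclusive pixel coordinates.
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

    # Point prompt: the foreground pixel closest to the centroid.
    # Using the nearest foreground pixel (rather than the raw centroid)
    # guarantees the point lies on the object even for concave shapes.
    cy, cx = ys.mean(), xs.mean()
    nearest = int(np.argmin((ys - cy) ** 2 + (xs - cx) ** 2))
    point = (int(xs[nearest]), int(ys[nearest]))
    return point, box


if __name__ == "__main__":
    demo = np.zeros((6, 8), dtype=bool)
    demo[2:5, 3:7] = True  # a small rectangular "instrument"
    print(prompts_from_mask(demo))
```

For the video setting described above, only the point derived from the first frame's mask would be supplied; later frames would receive no prompt.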

Abstract (original)

The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-shot segmentation performance of SAM 2 in robot-assisted surgery based on prompts, alongside its robustness against real-world corruption. For static images, we employ two forms of prompts: 1-point and bounding box, while for video sequences, the 1-point prompt is applied to the initial frame. Through extensive experimentation on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks, SAM 2, when utilizing bounding box prompts, outperforms state-of-the-art (SOTA) methods in comparative evaluations. The results with point prompts also exhibit a substantial enhancement over SAM’s capabilities, nearing or even surpassing existing unprompted SOTA methodologies. Besides, SAM 2 demonstrates improved inference speed and less performance degradation against various image corruption. Although slightly unsatisfactory results remain in specific edges or regions, SAM 2’s robust adaptability to 1-point prompts underscores its potential for downstream surgical tasks with limited prompt requirements.
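
As a rough illustration of what measuring "performance degradation against image corruption" can involve, the sketch below corrupts an image with additive Gaussian noise and reports the relative drop in IoU between clean and corrupted predictions. The abstract does not list the corruption types, severities, or metrics used, so Gaussian noise, IoU, and the function names here are assumptions; the segmentation model itself is left out.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, target).sum() / union)


def add_gaussian_noise(image, sigma, seed=0):
    """Return a uint8 copy of `image` with zero-mean Gaussian noise added.

    Assumed corruption type, chosen only as a simple example.
    """
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def relative_drop(clean_score, corrupted_score):
    """Fraction of the clean score lost under corruption."""
    if clean_score == 0:
        return 0.0
    return (clean_score - corrupted_score) / clean_score
```

In use, one would run the same prompted model on the clean frame and on `add_gaussian_noise(frame, sigma)`, score both predictions with `iou` against the ground truth, and compare `relative_drop` across models; a smaller drop indicates greater robustness.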

arXiv information

Authors	Jieming Yu, An Wang, Wenzhen Dong, Mengya Xu, Mobarakol Islam, Jie Wang, Long Bai, Hongliang Ren
Published	2024-08-08 17:08:57+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
