Text Promptable Surgical Instrument Segmentation with Vision-Language Models

Abstract

In this paper, we propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical instruments in minimally invasive surgeries. We redefine the task as text promptable, thereby enabling a more nuanced comprehension of surgical instruments and adaptability to new instrument types. Inspired by recent advancements in vision-language models, we leverage pretrained image and text encoders as our model backbone and design a text promptable mask decoder consisting of attention- and convolution-based prompting schemes for surgical instrument segmentation prediction. Our model leverages multiple text prompts for each surgical instrument through a new mixture of prompts mechanism, resulting in enhanced segmentation performance. Additionally, we introduce a hard instrument area reinforcement module to improve image feature comprehension and segmentation precision. Extensive experiments on several surgical instrument segmentation datasets demonstrate our model’s superior performance and promising generalization capability. To our knowledge, this is the first implementation of a promptable approach to surgical instrument segmentation, offering significant potential for practical application in the field of robotic-assisted surgery. Code is available at https://github.com/franciszzj/TP-SIS.
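To make the ideas of convolution-based prompting and a mixture of prompts more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes image features of shape (C, H, W) and one embedding of shape (C,) per text prompt are already available from pretrained encoders. The function names, the softmax gating over prompts, and the 0.5 threshold are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conv_prompt_scores(image_feats, prompt_embs):
    """Score every pixel against every text prompt.

    image_feats: (C, H, W) image features (assumed to come from an image encoder).
    prompt_embs: (P, C) one embedding per text prompt (assumed to come from a text encoder).
    Using each prompt embedding as a 1x1 convolution kernel is the same as a
    per-pixel dot product, which is what the einsum computes.
    Returns score maps of shape (P, H, W).
    """
    return np.einsum("pc,chw->phw", prompt_embs, image_feats)


def mix_prompts(score_maps, gate_logits):
    """Combine per-prompt score maps with softmax weights (hypothetical gating).

    score_maps: (P, H, W), gate_logits: (P,). Returns a single (H, W) map.
    """
    weights = softmax(gate_logits)
    return np.tensordot(weights, score_maps, axes=1)


def segment(image_feats, prompt_embs, gate_logits, threshold=0.5):
    """Return a boolean (H, W) mask for the instrument described by the prompts."""
    scores = conv_prompt_scores(image_feats, prompt_embs)
    mixed = mix_prompts(scores, gate_logits)
    prob = 1.0 / (1.0 + np.exp(-mixed))  # sigmoid
    return prob > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4, 4))    # stand-in for encoder output
    prompts = rng.normal(size=(3, 8))     # three prompts for one instrument
    mask = segment(feats, prompts, gate_logits=np.zeros(3))
    print(mask.shape)
```

With equal gate logits the mixture reduces to a plain average of the per-prompt score maps; a learned gate would instead let the model favor whichever prompt suits a given image.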

arXiv information

Authors	Zijian Zhou, Oluwatosin Alabi, Meng Wei, Tom Vercauteren, Miaojing Shi
Published	2023-11-08 15:36:17+00:00
