Video-based Surgical Tool-tip and Keypoint Tracking using Multi-frame Context-driven Deep Learning Models

Summary

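Automatically tracking surgical tool keypoints in robotic surgery videos is essential for a range of downstream use cases, including skill assessment, expertise assessment, and the delineation of safety zones.
In recent years, the explosion of deep learning for vision applications has produced a large body of work on surgical instrument segmentation, but much less attention has gone to tracking specific tool keypoints such as tool tips.
This work proposes a novel multi-frame, context-driven deep learning framework for localizing and tracking tool keypoints in surgical videos.
The models are trained and tested on the annotated frames of the 2015 EndoVis Challenge dataset, where they achieve state-of-the-art performance.
By leveraging sophisticated deep learning models and multi-frame context, the framework achieves 90% keypoint detection accuracy and a localization RMS error of 5.27 pixels.
Results on a self-annotated JIGSAWS dataset with more challenging scenarios show that the proposed multi-frame models can accurately track tool-tip and tool-base keypoints, with an overall RMS error below 4.2 pixels.
Such a framework paves the way for accurate tracking of surgical instrument keypoints and enables further downstream use cases.
Project and dataset webpage: https://tinyurl.com/mfc-tracker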

Summary (original)

Automated tracking of surgical tool keypoints in robotic surgery videos is an essential task for various downstream use cases such as skill assessment, expertise assessment, and the delineation of safety zones. In recent years, the explosion of deep learning for vision applications has led to many works in surgical instrument segmentation, while less attention has gone to tracking specific tool keypoints, such as tool tips. In this work, we propose a novel, multi-frame context-driven deep learning framework to localize and track tool keypoints in surgical videos. We train and test our models on the annotated frames from the 2015 EndoVis Challenge dataset, resulting in state-of-the-art performance. By leveraging sophisticated deep learning models and multi-frame context, we achieve 90% keypoint detection accuracy and a localization RMS error of 5.27 pixels. Results on a self-annotated JIGSAWS dataset with more challenging scenarios also show that the proposed multi-frame models can accurately track tool-tip and tool-base keypoints, with an RMS error below 4.2 pixels overall. Such a framework paves the way for accurately tracking surgical instrument keypoints, enabling further downstream use cases. Project and dataset webpage: https://tinyurl.com/mfc-tracker
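
The two metrics quoted in the abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so this assumes that localization RMS error is the root mean square of the per-keypoint Euclidean pixel distance, and that a keypoint counts as detected when its prediction lies within a pixel threshold of the annotation. The function names, the threshold value, and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math

def rms_error(pred, truth):
    """Root mean square of Euclidean distances (in pixels) between matched keypoints.

    Assumed definition; the paper may compute its RMS error differently.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    squared = [(px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, truth)]
    return math.sqrt(sum(squared) / len(squared))

def detection_accuracy(pred, truth, threshold_px):
    """Fraction of keypoints predicted within threshold_px of the annotation.

    The threshold criterion is an assumption made for illustration.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    hits = sum(math.hypot(px - tx, py - ty) <= threshold_px
               for (px, py), (tx, ty) in zip(pred, truth))
    return hits / len(truth)

# Made-up (x, y) pixel coordinates for four annotated keypoints.
truth = [(100.0, 50.0), (200.0, 80.0), (150.0, 120.0), (60.0, 40.0)]
pred = [(103.0, 54.0), (200.0, 80.0), (150.0, 132.0), (66.0, 48.0)]

print(f"RMS error: {rms_error(pred, truth):.2f} px")
print(f"Detection accuracy: {detection_accuracy(pred, truth, threshold_px=10.0):.2f}")
```

With these sample points the per-keypoint distances are 5, 0, 12, and 10 pixels, so three of the four fall within the 10-pixel threshold.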

arXiv information

Authors	Bhargav Ghanekar, Lianne R. Johnson, Jacob L. Laughlin, Marcia K. O’Malley, Ashok Veeraraghavan
Published	2025-01-30 14:06:19+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
