Towards Surgical Context Inference and Translation to Gestures

Summary

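Manually labeling gestures in robot-assisted surgery is labor-intensive, error-prone, and requires expertise or training.
We propose an automated, explainable method for generating gesture transcripts. It leverages the abundance of image segmentation data to train a surgical scene segmentation model that produces masks for surgical tools and objects.
Surgical context is then detected from the segmentation masks by examining the distances and intersections between tools and objects.
Next, the context labels are translated into gesture transcripts using a knowledge-based finite state machine (FSM) model and a data-driven long short-term memory (LSTM) model.
We evaluate each stage of the method by comparing its output against the ground-truth segmentation masks, the consensus context labels, and the gesture labels in the JIGSAWS dataset.
The results show that the segmentation models achieve state-of-the-art performance in recognizing the needle and thread in the Suturing task, and that important surgical states (e.g., contact between graspers and objects in Suturing) can be detected automatically with high agreement with crowd-sourced labels.
We also find that the FSM models are more robust than the LSTMs to poor segmentation and labeling performance.
The proposed method can shorten the gesture labeling process by a factor of about 2.8.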
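The context-detection step described above (distances and intersections between tool and object masks) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of a mask as a 2D grid of 0/1 values, and the `max_gap_px` contact threshold are all assumptions made for illustration.

```python
from itertools import product
from math import hypot, inf


def mask_pixels(mask):
    """Return the set of (row, col) coordinates that are set in a 2D 0/1 mask."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def masks_intersect(mask_a, mask_b):
    """True if the two masks share at least one pixel."""
    return bool(mask_pixels(mask_a) & mask_pixels(mask_b))


def min_mask_distance(mask_a, mask_b):
    """Smallest Euclidean distance in pixels between any pixel of each mask.

    Returns infinity if either mask is empty (class not detected in the frame).
    """
    pts_a, pts_b = mask_pixels(mask_a), mask_pixels(mask_b)
    return min(
        (hypot(ra - rb, ca - cb) for (ra, ca), (rb, cb) in product(pts_a, pts_b)),
        default=inf,
    )


def in_contact(mask_a, mask_b, max_gap_px=2.0):
    """Hypothetical contact rule: masks overlap or lie within max_gap_px pixels."""
    return masks_intersect(mask_a, mask_b) or min_mask_distance(mask_a, mask_b) <= max_gap_px
```

The brute-force pairwise distance keeps the sketch dependency-free. For full-resolution video frames, a vectorized array implementation would be more practical.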

Abstract (original)

Manual labeling of gestures in robot-assisted surgery is labor intensive, prone to errors, and requires expertise or training. We propose a method for automated and explainable generation of gesture transcripts that leverages the abundance of data for image segmentation to train a surgical scene segmentation model that provides surgical tool and object masks. Surgical context is detected using segmentation masks by examining the distances and intersections between the tools and objects. Next, context labels are translated into gesture transcripts using knowledge-based Finite State Machine (FSM) and data-driven Long Short Term Memory (LSTM) models. We evaluate the performance of each stage of our method by comparing the results with the ground truth segmentation masks, the consensus context labels, and the gesture labels in the JIGSAWS dataset. Our results show that our segmentation models achieve state-of-the-art performance in recognizing needle and thread in Suturing and we can automatically detect important surgical states with high agreement with crowd-sourced labels (e.g., contact between graspers and objects in Suturing). We also find that the FSM models are more robust to poor segmentation and labeling performance than LSTMs. Our proposed method can significantly shorten the gesture labeling process (~2.8 times).
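To make the knowledge-based translation step concrete, here is a toy finite state machine that turns a per-frame sequence of context labels into a gesture transcript. The context labels (`free`, `touching`, `holding`), the gesture names, and the transition table are hypothetical placeholders. They are not the context encoding, gesture vocabulary, or FSM used in the paper or in JIGSAWS.

```python
from itertools import groupby

# Hypothetical transition table: (current gesture, observed context) -> next gesture.
TRANSITIONS = {
    ("IDLE", "touching"): "REACH",
    ("REACH", "holding"): "GRASP",
    ("REACH", "free"): "IDLE",
    ("GRASP", "touching"): "RELEASE",
    ("RELEASE", "free"): "IDLE",
}


def contexts_to_gestures(contexts, start="IDLE"):
    """Map per-frame context labels to per-frame gesture labels.

    If no transition is defined for the current (gesture, context) pair,
    the machine stays in its current gesture.
    """
    state = start
    gestures = []
    for ctx in contexts:
        state = TRANSITIONS.get((state, ctx), state)
        gestures.append(state)
    return gestures


def to_transcript(labels):
    """Collapse per-frame labels into (label, first_frame, last_frame) segments."""
    segments = []
    frame = 0
    for label, group in groupby(labels):
        length = len(list(group))
        segments.append((label, frame, frame + length - 1))
        frame += length
    return segments
```

Calling `to_transcript(contexts_to_gestures(contexts))` on a list of per-frame context labels yields one segment per run of identical gestures, which is the general shape of a gesture transcript.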

arXiv information

Authors	Kay Hutchinson, Zongyu Li, Ian Reyes, Homa Alemzadeh
Published	2023-02-28 01:39:36+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
