V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy

Summary

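Deep learning can predict depth maps and capsule ego-motion from capsule endoscopy videos, which aids 3D scene reconstruction and lesion localization.
However, collisions of the capsule endoscope within the gastrointestinal tract introduce vibration perturbations into the training data.
Existing solutions focus solely on vision-based processing and neglect auxiliary signals, such as vibration, that could reduce noise and improve performance.
We therefore propose V$^2$-SfMLearner, a multimodal approach that integrates vibration signals into vision-based depth and capsule motion estimation for monocular capsule endoscopy.
We construct a multimodal capsule endoscopy dataset containing vibration and visual signals, and develop an unsupervised method that uses vision-vibration signals to effectively eliminate vibration perturbations through multimodal learning.
Specifically, we carefully design a vibration network branch and a Fourier fusion module to detect and mitigate vibration noise.
The fusion framework is compatible with popular vision-only algorithms.
Extensive validation on the multimodal dataset demonstrates superior performance and robustness compared with vision-only algorithms.
Because it requires no large external equipment, V$^2$-SfMLearner has the potential to be integrated into clinical capsule robots, providing a real-time and dependable digestive examination tool.
The findings show promise for practical implementation in clinical settings, enhancing the diagnostic capabilities of doctors.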

Summary (original)

Deep learning can predict depth maps and capsule ego-motion from capsule endoscopy videos, aiding in 3D scene reconstruction and lesion localization. However, the collisions of the capsule endoscopies within the gastrointestinal tract cause vibration perturbations in the training data. Existing solutions focus solely on vision-based processing, neglecting other auxiliary signals like vibrations that could reduce noise and improve performance. Therefore, we propose V$^2$-SfMLearner, a multimodal approach integrating vibration signals into vision-based depth and capsule motion estimation for monocular capsule endoscopy. We construct a multimodal capsule endoscopy dataset containing vibration and visual signals, and our artificial intelligence solution develops an unsupervised method using vision-vibration signals, effectively eliminating vibration perturbations through multimodal learning. Specifically, we carefully design a vibration network branch and a Fourier fusion module, to detect and mitigate vibration noises. The fusion framework is compatible with popular vision-only algorithms. Extensive validation on the multimodal dataset demonstrates superior performance and robustness against vision-only algorithms. Without the need for large external equipment, our V$^2$-SfMLearner has the potential for integration into clinical capsule robots, providing real-time and dependable digestive examination tools. The findings show promise for practical implementation in clinical settings, enhancing the diagnostic capabilities of doctors.

arXiv information

Authors	Long Bai, Beilei Cui, Liangyu Wang, Yanheng Li, Shilong Yao, Sishen Yuan, Yanan Wu, Yang Zhang, Max Q.-H. Meng, Zhen Li, Weiping Ding, Hongliang Ren
Published	2024-12-23 14:11:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
