Publication date: Available online 30 March 2018
Source: Medical Image Analysis
Author(s): Daniel Wesierski, Anna Jezierska
Localizing instrument parts in video-assisted surgeries is an attractive and open computer vision problem. A working algorithm would immediately find applications in computer-aided interventions in the operating theater. Knowing the location of tool parts could help virtually augment the visual faculty of surgeons, assess the skills of novice surgeons, and increase the autonomy of surgical robots. A surgical tool varies in appearance due to articulation, viewpoint changes, and noise. We introduce a new method for detection and pose estimation of multiple non-rigid and robotic tools in surgical videos. The method uses a rigidly structured, bipartite model of end-effector and shaft parts that consistently encodes diverse, pose-specific appearance mixtures of the tool. This rigid part-mixtures model then jointly explains the evolving tool structure by switching between mixture components. Rigidly capturing end-effector appearance allows explicit transfer of keypoint meta-data from the detected components for full 2D pose estimation. The detector can also delineate a precise skeleton of the end-effector by transferring additional keypoints. To this end, we propose an effective procedure for learning such rigid mixtures from videos and for pooling the modeled shaft part, which undergoes frequent truncation at the border of the imaged scene. Notably, extensive diagnostic experiments indicate that feature regularization is key to fine-tuning the model in the presence of inherent appearance bias in videos. Experiments further show that end-effector pose estimation improves when the shaft part is included in the model. Finally, we evaluate our approach on publicly available datasets of in-vivo sequences of non-rigid tools and demonstrate state-of-the-art results.
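As a minimal illustration of the keypoint-transfer idea described above (a sketch, not the authors' implementation), assume each pose-specific mixture component stores 2D keypoint offsets in the normalized frame of its rigid end-effector template; a detection then supplies a bounding box and a winning mixture label, and the stored offsets are mapped into image coordinates. The `TEMPLATE_KEYPOINTS` table, the `Detection` fields, and the normalization scheme below are all hypothetical.

```python
# Hedged sketch of keypoint meta-data transfer after detection.
# Assumption: each pose-specific mixture component stores keypoints in
# normalized [0, 1] template coordinates; real models would learn or
# annotate these. All names and values here are illustrative.

from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical per-mixture keypoint meta-data, (x, y) in template frame.
TEMPLATE_KEYPOINTS: Dict[int, List[Tuple[float, float]]] = {
    0: [(0.10, 0.50), (0.90, 0.20), (0.90, 0.80)],  # e.g., open grasper
    1: [(0.10, 0.50), (0.95, 0.50)],                # e.g., closed grasper
}

@dataclass
class Detection:
    x0: float      # top-left x of the detected end-effector box (pixels)
    y0: float      # top-left y of the detected end-effector box (pixels)
    w: float       # box width in pixels
    h: float       # box height in pixels
    mixture: int   # index of the winning pose-specific mixture component

def transfer_keypoints(det: Detection) -> List[Tuple[float, float]]:
    """Map the detected component's stored keypoints into image space."""
    offsets = TEMPLATE_KEYPOINTS[det.mixture]
    return [(det.x0 + u * det.w, det.y0 + v * det.h) for u, v in offsets]

if __name__ == "__main__":
    det = Detection(x0=120.0, y0=80.0, w=64.0, h=48.0, mixture=0)
    for kp in transfer_keypoints(det):
        print(kp)  # keypoints in image coordinates
```

Because the mixture component is rigid, this transfer is a fixed affine mapping per component; switching between components is what accommodates articulation and viewpoint change.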
Graphical abstract: https://ift.tt/2GqoY5I