Reconstructing endoscopic scenes is vital for medical purposes such as post-operative assessment and educational training. Recently, neural rendering has emerged as a promising method for reconstructing endoscopic scenes involving tissue deformation. Yet current techniques exhibit major limitations, such as reliance on a static endoscope, limited deformation, or the need for external tracking devices to obtain camera pose data. In this paper, we introduce a novel solution that tackles the challenges posed by a moving stereo endoscope in a highly deformable setting. Our method divides the scene into multiple overlapping 4D neural radiance fields (NeRFs) and uses a progressive optimization approach with optical flow and geometry supervision for simultaneous reconstruction and camera pose estimation. Tested on videos up to fifteen times longer than those used in prior work, our method greatly improves usability, extending detailed reconstruction to much longer surgical videos without external tracking. Comprehensive evaluations on the StereoMIS dataset show that our method substantially enhances novel view synthesis quality while maintaining competitive pose accuracy.
Entry type: inproceedings
BibTeX key: SKT+24a
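
The abstract names the key ingredients (overlapping 4D NeRFs, joint camera pose estimation, optical-flow and geometry supervision) without implementation detail. The following is a minimal PyTorch sketch, under loose assumptions, of how joint radiance-field and pose optimization with photometric and stereo-depth losses could be wired up; every name here (Field4D, so3_exp, render, step, the loss weights, the frame count) is a hypothetical stand-in, not the authors' code, and a single field stands in for the paper's multiple overlapping blocks.

```python
# Minimal sketch (assumptions throughout): jointly optimizing a toy 4D
# radiance field and learnable per-frame camera poses with photometric and
# stereo-depth supervision, in the spirit of the abstract. Names, shapes,
# and loss weights are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Field4D(nn.Module):
    """Toy 4D radiance field: (x, y, z, t) -> (rgb, density)."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),
        )

    def forward(self, xyzt):
        out = self.net(xyzt)
        return torch.sigmoid(out[..., :3]), F.softplus(out[..., 3:])

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = w.norm().clamp_min(1e-8)
    k = w / theta
    zero = torch.zeros((), dtype=w.dtype)
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def render(field, origins, dirs, t, n_samples=32, near=0.05, far=1.0):
    """Volume-render per-ray color and expected depth at time t."""
    z = torch.linspace(near, far, n_samples)                   # (S,)
    pts = origins[:, None] + dirs[:, None] * z[None, :, None]  # (R, S, 3)
    xyzt = torch.cat([pts, t.expand(*pts.shape[:2], 1)], dim=-1)
    rgb, sigma = field(xyzt)                                   # (R,S,3), (R,S,1)
    alpha = 1 - torch.exp(-sigma.squeeze(-1) * (far - near) / n_samples)
    T = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                 1 - alpha[:, :-1] + 1e-10], dim=-1), dim=-1)
    weights = alpha * T                                        # (R, S)
    return (weights[..., None] * rgb).sum(1), (weights * z[None]).sum(1)

# Learnable per-frame poses (100 frames assumed), optimized jointly with the field.
field = Field4D()
rot = nn.Parameter(torch.zeros(100, 3))    # axis-angle per frame
trn = nn.Parameter(torch.zeros(100, 3))    # translation per frame
opt = torch.optim.Adam([*field.parameters(), rot, trn], lr=1e-3)

def step(i, rays_cam, gt_rgb, stereo_depth, t):
    """One joint update for frame i; rays_cam holds camera-space ray directions."""
    R = so3_exp(rot[i])
    dirs = rays_cam @ R.T                  # rotate rays into the world frame
    color, depth = render(field, trn[i].expand_as(dirs), dirs, t)
    loss = F.l1_loss(color, gt_rgb) \
         + 0.1 * F.l1_loss(depth, stereo_depth)   # geometry (stereo) supervision
    # Flow supervision (omitted here) would reproject the rendered 3D points
    # into a neighboring frame and penalize deviation from the precomputed
    # optical flow.
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Example call with random stand-in data for a 64-ray batch of frame 0:
rays = F.normalize(torch.randn(64, 3), dim=-1)
step(0, rays, torch.rand(64, 3), torch.rand(64), torch.tensor(0.0))
```

Note that this sketch collapses two aspects the abstract emphasizes: the progressive, window-by-window optimization and the partitioning into multiple overlapping 4D blocks, both of which would wrap this per-frame step.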