Semantic scene editing in the surgical domain presents unique challenges because anatomical fidelity must be preserved while specific scene elements are altered. In this paper, we propose a novel image editing framework for cholecystectomy scenes based on diffusion models. Our approach enables targeted modifications of surgical scenes, such as tool removal, relocation, rotation, and replacement, while maintaining a coherent representation of the operative field. By leveraging the conditional control capabilities of diffusion models, our framework captures the semantics of the surgical context and performs realistic inpainting to generate high-fidelity edited images. Comprehensive quantitative and qualitative evaluations on the Cholec dataset demonstrate the proposed model’s superiority and its effectiveness in preserving structural details and ensuring visual consistency in the scene editing task.
inproceedings KYN+25