Simultaneous localization and mapping (SLAM) has achieved impressive performance in static environments. However, SLAM in dynamic environments remains an open problem. Many methods directly filter out dynamic objects, resulting in incomplete scene reconstruction and limited camera-localization accuracy. Other works represent dynamic objects with point clouds, sparse joints, or coarse meshes, which fail to provide a photo-realistic representation. To overcome these limitations, we propose a photo-realistic and geometry-aware RGB-D SLAM method based on Gaussian splatting. Our method comprises three main modules that 1) map the dynamic foreground, including non-rigid humans/quadrupeds and rigid items, 2) reconstruct the static background, and 3) localize the camera. To map the foreground, we focus on modeling deformations and/or motions: we incorporate shape priors for humans/quadrupeds and exploit the geometric and appearance constraints of dynamic Gaussians. For background mapping, we design an optimization strategy between neighboring local maps that integrates an appearance constraint into geometric alignment. For camera localization, we leverage both the static background and the dynamic foreground to increase the number of observations and introduce additional constraints. We exploit geometric and appearance constraints by associating 3D Gaussians with 2D optical flows and pixel patches. Extensive experiments on real-world datasets demonstrate that our method outperforms state-of-the-art approaches in both camera localization and scene mapping.
article
BibTeXKey: LMZ+25