We propose SNI-SLAM++, a tightly-coupled semantic SLAM system that achieves dense semantic mapping and robust tracking. We introduce hierarchical semantic encoding to construct semantic maps precisely. We fuse geometry, appearance, and semantic features through cross-attention so that the different features reinforce one another. We design a semantics-coupled tracking framework that integrates semantic constraints into pose optimization.
Our experiments show that SNI-SLAM++ outperforms previous state-of-the-art methods in both semantic mapping and camera tracking across four datasets (Replica, ScanNet, TUM RGB-D, and ScanNet++).
@ARTICLE{zhu2025sni,
author={Zhu, Siting and Wang, Guangming and Blum, Hermann and Wang, Zhong and Zhang, Ganlin and Cremers, Daniel and Pollefeys, Marc and Wang, Hesheng},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={SNI-SLAM++: Tightly-Coupled Semantic Neural Implicit SLAM},
year={2026},
volume={48},
number={3},
pages={3399--3416}}