Automatic building footprint extraction from optical remote sensing (RS) images is crucial for various downstream applications. Current building footprint extraction methods are mainly based on deep learning, which requires large amounts of manually labeled data for model training, limiting their practical deployment. Semi-supervised semantic segmentation (SSS), which leverages limited labeled data for supervised learning and abundant unlabeled data for unsupervised self-training (ST), offers a promising solution to reduce this reliance. Nonetheless, directly applying existing SSS methods to building footprint extraction with limited labels fails to fully exploit the geometric structural features of buildings—key characteristics that distinguish them from the background. To tackle this challenge, we propose a semi-supervised learning framework, HeightMatch, which integrates real or synthetic height maps with RS images to extract more comprehensive and discriminative feature representations of buildings, particularly in limited-label scenarios. During training, these height maps enhance the model's ability to capture geometric structures, leading to more accurate pseudo-labels for unlabeled data and thereby enabling more effective ST. At inference time, building prediction relies solely on RS images, ensuring the practicality of the proposed method. Extensive experimental results on five widely used building footprint extraction datasets demonstrate the effectiveness and superiority of our method in comparison with multiple state-of-the-art (SOTA) SSS methods.
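The abstract describes self-training in which a model's predictions on unlabeled images are turned into pseudo-labels that supervise further training. A common realization of this step, shown below as a minimal sketch, is confidence thresholding: only pixels predicted with high confidence contribute to the pseudo-label loss. The function name, the threshold `tau`, and the toy probability map are illustrative assumptions, not HeightMatch's actual pipeline.

```python
import numpy as np

def make_pseudo_labels(probs, tau=0.95):
    """Confidence-thresholded pseudo-labels for self-training (generic
    sketch, not the paper's exact procedure).

    probs: (H, W) array of predicted building probabilities for an
           unlabeled image (hypothetical teacher output).
    Returns (labels, mask): hard 0/1 labels and a boolean mask selecting
    pixels confident enough to supervise the student model.
    """
    labels = (probs >= 0.5).astype(np.uint8)        # hard class decision
    confidence = np.maximum(probs, 1.0 - probs)     # max class probability
    mask = confidence >= tau                        # keep only confident pixels
    return labels, mask

# Toy 2x2 probability map: only the 0.99 and 0.02 pixels are confident
# enough at tau=0.95 to enter the unsupervised loss.
probs = np.array([[0.99, 0.60],
                  [0.02, 0.51]])
labels, mask = make_pseudo_labels(probs, tau=0.95)
```

In a framework like the one described, more accurate probability maps (e.g., from height-aware features) mean more pixels clear the threshold with correct labels, which is what makes the self-training stage more effective.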
Article. BibTeX key: HSX+25