Conference Agenda
Session
WG III/4A: Landuse and Landcover Change Detection
Session Topics: Landuse and Landcover Change Detection (WG III/4)
External Resource: http://www.commission3.isprs.org/wg4
Presentations
3:30pm - 3:45pm
ChangeDINO: DINOv3-Driven Building Change Detection in Optical Remote Sensing Imagery
1 National Cheng Kung University, Tainan, Taiwan; 2 National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Remote sensing change detection (RSCD) aims to identify pixel-wise surface changes from co-registered bi-temporal images. However, many deep learning–based RSCD methods rely solely on change-map annotations and underuse the semantic information in non-changing regions, which limits robustness under illumination variation, off-nadir views, and scarce labels. This paper presents ChangeDINO, an end-to-end multiscale Siamese framework for optical building change detection. The model fuses a lightweight backbone stream with features transferred from a frozen DINOv3, yielding semantic- and context-rich pyramids even on small datasets. A spatial–spectral differential transformer decoder then exploits multi-scale absolute differences as change priors to highlight true building changes and suppress irrelevant responses. Finally, a learnable morphology module refines the upsampled logits to recover clean boundaries. Experiments on four public benchmarks demonstrate that ChangeDINO achieves strong accuracy and robustness under cross-temporal appearance variations, yielding cleaner building boundaries with improved data efficiency.
3:45pm - 4:00pm
Hie-DinoMamba: Hierarchical DINOv3 and Mamba Architecture for Multi-Class Building Change Detection
1 Geospatial Team, Innopam, Seoul, Republic of Korea; 2 Department of Geoinformatics, University of Seoul, Seoul, Republic of Korea
Multi-class building change detection in high-resolution aerial imagery is essential for urban monitoring, yet remains challenging due to severe class imbalance and the limited representational capacity of encoders trained from scratch. We propose Hie-DinoMamba, a novel architecture that integrates a frozen 1.1B-parameter DINOv3-L encoder, pre-trained on the SAT-493M satellite dataset, with a newly designed Hierarchical Mamba FPN decoder. To bridge the domain gap between satellite pre-training and aerial imagery without incurring prohibitive computational costs, we adapt the encoder using parameter-efficient Low-Rank Adaptation (LoRA), updating only a small fraction of parameters while preserving the encoder's rich pre-trained knowledge. The decoder fuses multi-scale feature pairs from both time points via channel-wise concatenation and 1×1 projection, then refines them in a top-down manner using Visual State Space Model (VSSM) blocks that capture long-range spatial context with linear complexity. A dual-loss strategy decouples semantic classification (Focal Loss) from boundary delineation (Focal Tversky + Dice Loss), optimizing each objective at a different hierarchical level. On a 4-class aerial building change detection benchmark (41,548 image pairs, 0.1 m resolution, Seoul), Hie-DinoMamba achieves a state-of-the-art mIoU of 65.12% and Kappa of 75.77%, improving over the strongest baseline by 2.1 percentage points. An ablation study confirms that LoRA adaptation is the most critical component. Qualitative analysis further demonstrates robust generalization to geographically unseen regions.
4:00pm - 4:15pm
Stepwise Optimization and Ensemble Pipeline for Building Change Detection in High Resolution Satellite Imagery Using Mamba-Based Model
1 Department of Data Engineering, Pukyong National University, Busan, Republic of Korea; 2 Division of Data Information Sciences, Pukyong National University, Busan, Republic of Korea
This study presents a stepwise optimization pipeline for high-resolution building change detection in dense urban environments using imagery from CAS500-1, Korea’s national land observation satellite. A dataset of 3,816 bi-temporal patch pairs from 29 urban regions was constructed to support model development and evaluation. A Mamba-based architecture, incorporating efficient global context modeling, was adopted as the baseline for binary change detection. To enhance performance, the pipeline introduced three sequential optimization stages. First, normalization techniques suited for 12-bit radiometric imagery were compared, including percentile-based scaling, gamma adjustment, and logarithmic transformation. Second, augmentation strategies were evaluated, contrasting standard geometric augmentation with extended optical and temporal augmentation designed to improve generalization in structurally complex urban environments. Third, multiple ensemble strategies, ranging from simple averaging to confidence-weighted and hierarchical aggregation, were examined to overcome the limitations of individual model sizes. Model performance was assessed using a comprehensive set of pixel-level, change-pixel-level, contour-based, and object-based metrics to ensure robust evaluation of both spatial precision and structural consistency. Experimental results showed that gamma-based normalization, comprehensive augmentation, and selected ensemble strategies each contributed measurable improvements.
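The three normalization options compared in the first stage can be illustrated with a minimal sketch. The abstract does not give parameter values, so the 2nd/98th percentile cut-offs, the gamma of 0.5, and the function names below are assumptions chosen for illustration, not the authors' settings. Pixels are treated as a flat list of 12-bit digital numbers.

```python
import math
import statistics

MAX_DN = 4095  # largest digital number in 12-bit imagery (2**12 - 1)

def percentile_scale(pixels, low=2, high=98):
    """Stretch linearly between two percentiles (assumed 2/98), clipping to [0, 1]."""
    cuts = statistics.quantiles(pixels, n=100, method="inclusive")
    lo, hi = cuts[low - 1], cuts[high - 1]  # cuts[i] is the (i + 1)-th percentile
    span = max(hi - lo, 1e-12)  # guard against a constant patch
    return [min(max((p - lo) / span, 0.0), 1.0) for p in pixels]

def gamma_adjust(pixels, gamma=0.5):
    """Scale by the 12-bit range, then apply a power-law curve (gamma is assumed)."""
    return [(p / MAX_DN) ** gamma for p in pixels]

def log_transform(pixels):
    """Compress the dynamic range with log(1 + x), rescaled to [0, 1]."""
    return [math.log1p(p) / math.log1p(MAX_DN) for p in pixels]

# Hypothetical pixel values, for demonstration only.
patch = [0, 12, 150, 480, 900, 1500, 2200, 3100, 4095]
for name, fn in [("percentile", percentile_scale),
                 ("gamma", gamma_adjust),
                 ("log", log_transform)]:
    print(name, [round(v, 3) for v in fn(patch)])
```

All three map raw values into [0, 1]; they differ in how much contrast they allocate to dark pixels, which is one plausible reason the choice matters for dense urban scenes.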
Combining these optimized components yielded a final hierarchical ensemble that improved the F1-Score from 0.7629 to 0.8070, representing a substantial gain over the baseline model. Overall, this work provides a validated and extensible optimization strategy for high-resolution satellite-based change detection, offering practical guidance for operational applications and adaptability to future ensemble configurations across diverse architectures.
4:15pm - 4:30pm
Leveraging Geospatial Foundation Models for Bi-Temporal Land-Cover Change Detection
Canada Centre for Mapping and Earth Observation, Natural Resources Canada, Canada
Recent advances in geospatial foundation models have enabled scalable and transferable solutions for Earth observation (EO) tasks, making them strong candidates for land-cover change monitoring. Foundation models are large-scale artificial intelligence (AI) models trained on massive and diverse datasets. In the EO domain, these datasets may include imagery, elevation models, geographic coordinates, temporal tags, sensor spectral information, and descriptive metadata. These models excel at representation learning through self-supervised training, enabling them to capture rich descriptive features without requiring labelled data. Consequently, they can serve as powerful backbones for downstream tasks such as land-cover change monitoring. Accordingly, this paper provides an overview of the development process of a geospatial foundation model, Planaura. It demonstrates how this model is adapted to Canadian landscapes and how it is applied to land-cover change detection. Planaura is now publicly accessible via the Hugging Face model hub: [Link hidden for blind review process]
4:30pm - 4:45pm
A Transformer-Based Framework for Spatiotemporal Unmixing of Land–Water Mixtures in Multispectral Satellite Data
1 KU Leuven, Leuven, Belgium; 2 Karlsruhe Institute of Technology, Karlsruhe, Germany
This paper presents a novel transformer-based framework for spatiotemporally dynamic spectral unmixing of multispectral satellite imagery. Spectral unmixing is essential for analyzing mixed pixels in remote sensing, especially for small objects such as narrow rivers observed at coarse resolution, as in Sentinel-2 data. Most deep-learning-based unmixing models account for a single scene and ignore the spatiotemporal variation of spectra and land-cover proportions. To address this challenge, we introduce a unified deep learning architecture that leverages transformer attention mechanisms to exploit both spectral information and auxiliary information on the factors that cause spectral variations. The framework models the temporal and spatial evolution of abundances while simultaneously learning representative endmember spectra. By integrating cross-attention between spectral inputs, auxiliary variables, and temporal embeddings, the model can adapt to seasonal changes, illumination conditions, and scene-specific variability. The method is trained using synthetic mixtures derived from Sentinel-2 surface reflectance data. Applied to monitoring small rivers with strong temporal, spatial, and intrinsic variability, the proposed approach demonstrates improved accuracy in estimating water abundances and extracting water spectra in highly mixed river pixels (mixtures of water and riverbank). The model effectively captures spatiotemporal transitions in water extent and sediment-laden river inflows, offering a more consistent representation than conventional unmixing techniques. This work contributes a generalizable, end-to-end framework for handling dynamic unmixing scenarios in multispectral remote sensing.
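As a rough illustration of how synthetic mixtures for training might be generated, the sketch below assumes the standard linear mixing model, in which a pixel spectrum is the abundance-weighted sum of endmember spectra. The abstract does not state which mixing model is used, and the four-band water and riverbank reflectances here are made-up placeholder values, not Sentinel-2 measurements.

```python
import random

def random_abundances(n, rng):
    """Draw n non-negative abundances that sum to one."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def synthetic_mixture(endmembers, abundances):
    """Linear mixing: each band is the abundance-weighted sum of endmember values."""
    n_bands = len(endmembers[0])
    return [sum(a * e[b] for a, e in zip(abundances, endmembers))
            for b in range(n_bands)]

# Hypothetical 4-band reflectances (placeholders for illustration only).
water = [0.06, 0.05, 0.03, 0.01]
riverbank = [0.10, 0.14, 0.18, 0.30]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
abundances = random_abundances(2, rng)
pixel = synthetic_mixture([water, riverbank], abundances)
print("abundances:", [round(a, 3) for a in abundances])
print("mixed spectrum:", [round(v, 4) for v in pixel])
```

Pairs of (mixed spectrum, abundances) produced this way give a network known targets to learn from, which real mixed pixels do not provide.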
The work also provides new insights into the use of transformers for modeling spatiotemporal interactions and supports applications in environmental monitoring and water resource assessment.
4:45pm - 5:00pm
Land Cover Classification of Optical–SAR Imagery via Cross-Modal Interaction and Feature Alignment
Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu, 611756, China
Land cover classification (LCC) plays a crucial role in geoscientific research and resource monitoring applications. Compared with traditional single-modal classification methods, multimodal fusion models can more effectively leverage the complementary information of optical and synthetic aperture radar (SAR) imagery, thereby improving classification performance in complex scenarios. However, due to the significant differences in the imaging mechanisms of the two sensors, inconsistencies in radiometric properties and spatial structures arise between optical and SAR images, posing challenges for cross-modal feature interaction and fusion. To address this issue, we propose a multimodal optical–SAR fusion network (MOSFNet) for high-precision LCC, which incorporates two core modules: the Feature Interaction Module (FIM) and the Feature Fusion Module (FFM). The FIM achieves complementary feature interaction between optical and SAR images through channel splitting and cross concatenation, while incorporating a coordinate attention mechanism to enhance the responsiveness of key land cover regions. The FFM leverages a 2D selective scan (SS2D) mechanism to implement bidirectional cross-modal feature alignment and gated fusion in the hidden state space, enabling deep correlation and adaptive integration of optical and SAR features. Experiments on the WHU-OPT-SAR dataset demonstrate that MOSFNet significantly outperforms existing methods in terms of classification accuracy and model generalization, providing an efficient and robust solution for high-precision land cover mapping with multi-source remote sensing imagery.
5:00pm - 5:15pm
Seasonal-Aware Scale-Semantic Consistency Alignment Change Detection Network
1 Chinese Academy of Surveying and Mapping, Beijing, China; 2 Liaoning Technical University Geomatics and Geographical Sciences, Fuxin, China; 3 Joint Laboratory of Spatial Intelligent Perception and Large Model Application, Nanjing, China
Change detection in remote sensing imagery is a crucial method for obtaining dynamic information about land cover. However, pseudo-changes caused by seasonal variations pose a significant challenge to detection accuracy. Seasonal variations, such as vegetation phenology and snow cover, introduce global appearance differences that are often mistaken for actual land cover changes. This phenomenon is particularly prominent in long-term monitoring tasks, where pseudo-changes dominate the detection results. To address the global appearance differences and multi-scale feature fusion problems induced by seasonal changes, we propose a novel Seasonal-Aware Scale-Semantic Consistency Alignment Change Detection Network (SSCANet) for remote sensing image change detection. This approach incorporates a Seasonal-Aware Scale Alignment (ASA) module and a Seasonal-Aware Semantic Guided Fusion (SGF) module. By employing spatial scale transformation and semantic alignment, it reduces information mismatch in multi-scale feature fusion and enhances the perception of details in change regions. In experiments on the GZ-CD and CDD datasets, SSCANet achieves F1 scores of 89.21% and 97.82% and precision of 89.02% and 98.37%, respectively. These results represent significant improvements over other methods, demonstrating that SSCANet outperforms its counterparts in both overall accuracy and seasonal robustness. The findings confirm that this approach effectively suppresses seasonal false changes, enhancing the accuracy and reliability of change detection.
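The SSCANet abstract reports F1 and precision but not recall. Assuming F1 is the usual harmonic mean of precision and recall, the implied recall can be recovered by solving F1 = 2PR / (P + R) for R. The sketch below does this for the two reported dataset results; the function name is hypothetical, and the outputs are derived values subject to rounding in the published figures, not numbers stated by the authors.

```python
def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, with F1 and P given as fractions."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("F1 and precision are inconsistent")
    return f1 * precision / denom

# (F1, precision) pairs as reported in the abstract, converted to fractions.
reported = {"GZ-CD": (0.8921, 0.8902), "CDD": (0.9782, 0.9837)}

for name, (f1, p) in reported.items():
    print(f"{name}: implied recall = {recall_from_f1(f1, p):.4f}")
```

Under that assumption the implied recall works out to roughly 89.4% on GZ-CD and 97.3% on CDD, so precision and recall are closely balanced on both datasets.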

