Conference Agenda
WG III/1M: Remote Sensing Data Processing and Understanding
Session Topics: Remote Sensing Data Processing and Understanding (WG III/1)
External Resource: http://www.commission3.isprs.org/wg1
Presentations
8:30am - 8:45am
Evaluating super-resolution models for real-world Sentinel-2 applications: A case study
1 German Aerospace Center (DLR), The Remote Sensing Technology Institute, Germany; 2 Technical University of Munich, School of Computation, Information and Technology
High-resolution Earth observation data are crucial for applications such as agriculture, urban planning, and environmental monitoring. Although commercial satellites provide sub-meter imagery, open-access alternatives like Sentinel-2 are limited to a ground sampling distance of around 10 m, which is insufficient for many tasks. In this work, we investigate image super-resolution (SR) as a way to bridge this gap and improve downstream performance on freely available satellite data. We use two 16-bit single-band datasets, consisting of Sentinel-2 (20 m → 10 m) and Venus (10 m → 5 m) images, to train and benchmark state-of-the-art SR methods, including transformer- and diffusion-based approaches, across multiple dataset mixes. The models are evaluated quantitatively for native upscaling from 20 m → 10 m and 10 m → 5 m, with reference-based metrics (PSNR, SSIM) computed against ground truth and with no-reference scores (FID, NIQE). We observe that different SR architectures present trade-offs between standard quantitative metrics and perceptual image quality. We further assess their impact on a practical downstream task: field boundary detection from Sentinel-2 imagery. Our experiments demonstrate that SR pre-processing improves quantitative fidelity and downstream task performance, enabling low-resolution satellites to compete more effectively with commercial imagery.

8:45am - 9:00am
Fine-Grained Remote Sensing Imagery Generation Driven by Expert Knowledge and Hierarchical Captions
1 Moganshan Geospatial Information Laboratory, Huzhou, China; 2 Key Laboratory of Monitoring, Evaluation and Early Warning of Territorial Spatial Planning Implementation, Ministry of Natural Resources, Chongqing, China; 3 School of Earth Sciences, Zhejiang University, Hangzhou, China; 4 National Geomatics Center of China, Beijing, China; 5 School of Geosciences and Info-Physics, Central South University, Changsha, China; 6 School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou, China
Current diffusion models struggle to achieve fine-grained remote sensing imagery (RSI) generation. This limitation stems from their reliance on "flattened" text prompts, which overlook the inherent hierarchical structure of RSI. This paper proposes a fine-grained RSI generation method driven by expert knowledge and hierarchical captions. We first decompose RSI into a hierarchical "element-relation-scene" caption and employ an automatic caption optimization mechanism, grounded in spatial relation knowledge, to ensure high fidelity. We then introduce a hierarchical caption encoding mechanism that injects decoupled hierarchical caption segments into the U-Net's cross-attention layers. This design enables the model to exert hierarchical and decoupled attentional control over the global scene, spatial layout, and geographical element details during the denoising process. Experiments demonstrate that, when combined with efficient fine-tuning algorithms such as LoRA, our method outperforms traditional single-level captions across all six evaluation metrics; for example, FID decreases from 228.43 to 205.59 and GSHPS increases from 0.86 to 0.92. This research provides a new paradigm for controllable remote sensing scene generation, establishing an effective link between hierarchical semantic understanding and the progressive generation process of diffusion models.

9:00am - 9:15am
Image-level and Feature-level Semantic-aware Architecture for Cross Domain Semantic Segmentation of High-resolution Remote Sensing Imagery
1 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, People's Republic of China; 2 School of Remote Sensing and Information Engineering, Wuhan University, People's Republic of China
Semantic segmentation of remote sensing images has attracted considerable attention. In cross-domain semantic segmentation, images captured at different times inevitably exhibit significant domain and feature gaps. In addition, labels are scarce, because acquiring adequate annotations is time-consuming and laborious. Numerous methods address these problems: semi-supervised and weakly supervised learning mitigate the lack of labels, while style transfer and domain adaptation are effective against domain gaps. However, the outcomes are still not ideal. Nearly all methods ignore the combination of image-level and feature-level alignment, and few consider class-wise constraints to boost performance. To this end, IFSDA, an image-level and feature-level semantic-aware architecture for cross-domain semantic segmentation, is proposed. Its two alignment branches are realized through self-supervised learning and generative adversarial learning. In addition, a novel semantic discriminator is used in the image translation process to optimize class-related information, which helps to eliminate the intra-class domain gaps between bi-temporal images and improves the segmentation results. Experiments on the ISPRS 2D Semantic Labeling Contest dataset show that the proposed method outperforms the compared models.

9:15am - 9:30am
Automating Expansive Cliff-nesting Seabird Colony Counts with Deep Learning: A Case Study of Aerial Photo Surveys of Northern Fulmars in Arctic Canada
1 National Research Council Canada; 2 Environment and Climate Change Canada; 3 Acadia University, Canada
Reliable estimates of seabird colony size are essential for monitoring population dynamics, yet accurate counts are difficult for expansive colonies on remote Arctic cliffs. Northern fulmars (Fulmarus glacialis) breed in large, unevenly distributed aggregations across extensive, towering cliffs in the Canadian Arctic, posing numerous survey challenges. Side-looking helicopter photo surveys generate thousands of photos where birds are small, variably angled, and of inconsistent sharpness against large, complex backgrounds. We used deep learning to automate fulmar counts in this imagery. Our objectives were to (1) develop an object detection model trained on manually annotated imagery sampled from three Arctic colonies, (2) evaluate model performance, and (3) estimate the total size of an entire colony. We trained a YOLOX-based model on >16,000 annotated birds, following a two-stage training approach for small objects interspersed across expansive and heterogeneous backgrounds. Evaluated against ~20,000 additional manual annotations in a sample of the Cape Liddon colony in the territory of Nunavut, the model detected 90% of birds with a 9% false-positive rate (i.e., 90% recall and 91% precision). The model's detection sensitivity was calibrated to achieve a ~1:1 ratio between total model detections (true positives + false positives) and the 'true' count, which required manual annotation of ~15% of the colony imagery. Overall, the model detected 38,723 fulmars across the entire colony, providing a robust estimate of its full population. These results highlight deep learning’s potential to greatly streamline and scale up seabird monitoring in remote polar environments where conventional surveys are constrained.

9:30am - 9:45am
Estimation of surface nitrogen dioxide (NO₂) using TEMPO satellite data and machine learning
York University, Canada
Air pollutants such as nitrogen dioxide (NO₂) have detrimental effects on human health and ecosystems. It is therefore crucial to locate high pollutant concentrations over large areas. Ground-based stations, while offering continuous temporal measurements, cannot provide broad spatial coverage for regions such as cities. This study uses Tropospheric Emissions: Monitoring of Pollution (TEMPO) satellite observations and a machine learning model to estimate high-resolution surface-level NO₂ concentrations over the Greater Toronto Area (GTA), Ontario, Canada. A random forest regression model was trained with input parameters including hourly tropospheric NO₂ vertical column density (VCD) and boundary layer height (BLH), which ranked as the two most important features. The model achieved a coefficient of determination (R²) of 0.84, a root mean square error (RMSE) of 1.703 µg/m³, and a mean absolute error (MAE) of 0.939 µg/m³, indicating strong and reliable predictive performance. The findings of this research can support air quality forecasting, public health studies, and urban planning decisions, especially in regions with scarce ground-based pollutant data.

9:45am - 10:00am
Learning from Maps to Update Them: A Deep Learning-Based Approach Using Multimodal Airborne Data
University of Twente, The Netherlands
Automatic updating of topographic maps remains a significant challenge, as current workflows still rely heavily on manual interpretation of airborne data. This study proposes a method for identifying topographic changes by learning object representations from existing maps and using them as reference data for change detection. Map-derived labels are used to train independent 2D and 3D segmentation networks that generate semantic predictions from orthoimages and point clouds. Unlike conventional change-detection approaches that require temporally aligned datasets of the same modality, the proposed method directly compares newly acquired airborne data with existing map vectors. Semantic predictions from both modalities are vectorized and selectively fused into polygon geometries, which are subsequently compared with reference map vectors to identify object-level "from–to" changes. The workflow highlights potential change regions and their predicted semantic classes, allowing operators to focus inspection on relevant areas rather than the entire dataset. Detected changes include both real-world developments, such as new construction and demolitions, and inconsistencies in the reference map caused by outdated or inaccurate delineations. To assess the effect of multimodal integration, the workflow is compared with a 2D-only baseline. The results indicate that integrating 3D geometric information can reduce noisy detections and improve the spatial consistency of candidate change objects, particularly for water and bridge classes.
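The comparison of predicted geometries with reference map vectors described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how objects are matched. Here axis-aligned rectangles stand in for polygons, objects are matched greedily by intersection over union (IoU), and the 0.5 threshold is an arbitrary choice. The names `MapObject`, `iou`, and `from_to_changes` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class MapObject(NamedTuple):
    """Axis-aligned rectangle standing in for a map polygon (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    label: str


def iou(a: MapObject, b: MapObject) -> float:
    """Intersection over union of two rectangles."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    if w <= 0 or h <= 0:
        return 0.0
    inter = w * h
    area_a = (a.xmax - a.xmin) * (a.ymax - a.ymin)
    area_b = (b.xmax - b.xmin) * (b.ymax - b.ymin)
    return inter / (area_a + area_b - inter)


def from_to_changes(
    reference: List[MapObject],
    predicted: List[MapObject],
    min_iou: float = 0.5,  # assumed threshold, not taken from the abstract
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (from_label, to_label) pairs; None marks a missing object."""
    changes: List[Tuple[Optional[str], Optional[str]]] = []
    matched = set()
    for ref in reference:
        # Find the best-overlapping prediction that is not yet matched.
        best_idx, best_iou = None, 0.0
        for i, pred in enumerate(predicted):
            if i in matched:
                continue
            score = iou(ref, pred)
            if score > best_iou:
                best_idx, best_iou = i, score
        if best_idx is None or best_iou < min_iou:
            changes.append((ref.label, None))  # map object not found in new data
            continue
        matched.add(best_idx)
        if predicted[best_idx].label != ref.label:
            changes.append((ref.label, predicted[best_idx].label))
    # Predictions without a map counterpart are candidate new objects.
    for i, pred in enumerate(predicted):
        if i not in matched:
            changes.append((None, pred.label))
    return changes


reference = [
    MapObject(0, 0, 10, 10, "building"),
    MapObject(20, 0, 30, 10, "water"),
    MapObject(40, 0, 50, 10, "road"),
]
predicted = [
    MapObject(0, 0, 10, 10, "building"),
    MapObject(20, 0, 30, 10, "bridge"),
    MapObject(60, 0, 70, 10, "building"),
]
changes = from_to_changes(reference, predicted)
print(changes)
```

With real map data, a geometry library would replace the rectangle arithmetic. An operator would then inspect only the flagged objects rather than the whole dataset, as the abstract describes.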

