
Conference Agenda


Session

WG III/1F: Remote Sensing Data Processing and Understanding

Time:

Friday, 10-July-2026:

1:30pm - 3:00pm

Location: 713A

125 theatre

Session Topics:

Remote Sensing Data Processing and Understanding (WG III/1)

External Resource: http://www.commission3.isprs.org/wg1

Presentations

1:30pm - 1:45pm

From Image to Perception: Scene-Graph-Driven Modeling of Human-Scale Urban Experience with Street-view Images

Haipeng Yang, Xian Guo

Beijing University of Civil Engineering and Architecture, China, People's Republic of

This study examines how street-view scenes relate to urban perception using a scene-graph-driven modeling method. Each image is parsed into subject–predicate–object triplets; entity appearance from a CNN backbone and relation semantics from a Transformer detector are fused at node level via a learnable gate. A relation-aware graph neural network performs message passing and attentive readout to predict six perception dimensions (beautiful, boring, depressing, lively, safe, wealthy). Using the Place Pulse 2.0 dataset as a benchmark, we convert pairwise votes to binary labels per dimension with standard train/validation/test splits. Experiments compare the graph approach against CNN+SVM and Transformer+SVM baselines under identical protocols. Results show consistently higher accuracy across all six dimensions, with notable gains for beautiful and wealthy. Gradient and integrated-gradient analyses offer node- and edge-level attributions, highlighting elements such as trees, facades, and overhead wires. The method balances accuracy with interpretability, and the results point to practical cues that can support human-centered urban design.
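The node-level fusion mentioned above can be illustrated with a minimal sketch. The abstract does not give the gate's exact form, so the code assumes a common element-wise formulation: a sigmoid gate computed from the concatenated features, used to blend the appearance and relation vectors. The function name `gated_fusion` and the `weights` and `bias` parameters are hypothetical and are not taken from the authors' implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(appearance, relation, weights, bias):
    """Blend two equal-length feature vectors with an element-wise gate.

    Assumed form (not confirmed by the abstract):
        g = sigmoid(W @ [appearance; relation] + b)
        fused = g * appearance + (1 - g) * relation

    weights: one row per output dimension, each row of length 2 * d.
    bias:    one value per output dimension.
    """
    concat = list(appearance) + list(relation)
    fused = []
    for i, (a, r) in enumerate(zip(appearance, relation)):
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(z)
        fused.append(g * a + (1.0 - g) * r)
    return fused


# With zero weights and zero bias the gate is 0.5, so the result is the average.
zero_w = [[0.0] * 4 for _ in range(2)]
print(gated_fusion([1.0, 0.0], [0.0, 1.0], zero_w, [0.0, 0.0]))
```

In a trained model the weights and bias would be learned, so the gate can favour appearance for some nodes and relation semantics for others.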

1:45pm - 2:00pm

Real-Time Road Condition Detection and Mapping Using YOLOv11 and Built-In Car Dashcam

Harjot Josan¹, Frank Zhang², Baoxin Hu³

¹University of the Fraser Valley (UFV), Canada; ²University of the Fraser Valley (UFV), Canada; ³Dept. of Earth and Space Science and Engineering, York University, Toronto, Canada

Road surface conditions deteriorate under heavy traffic, severe weather, and recurring utility works, yet many road agencies still rely on manual windshield surveys and semi-automated inspections. These methods are time-consuming, labour-intensive, and difficult to scale. Recent advances in deep learning, together with the widespread availability of built-in vehicle dashcams, offer new opportunities for low-cost, automated pavement assessment. This contribution presents a mobile, dashcam-based framework that detects road-surface defects with YOLOv11 and combines the detections with geolocation tagging for spatial visualization.

To test our YOLOv11 model, we collected an initial dataset on the University of the Fraser Valley campus and manually annotated it to identify crack fillings, crosswalk markings, speed bumps, lane markings, and other surface conditions. This prototype is intended to be extended later to detect further road conditions, such as gravel, potholes, and uneven surfaces. Augmentation techniques were applied to address variations in lighting and motion. YOLOv11 achieved a mean average precision above 90% across all tested categories.

This prototype demonstrates a practical, low-cost approach for real-time pavement monitoring. Future work includes expanding data collection, developing an operational dashboard for road authorities, pinning exact GPS coordinates on maps alongside images of the damaged road, and evaluating model performance across different data sources, including models trained on images sourced from Google Images. By producing actionable geospatial information, this system supports more efficient maintenance workflows and offers a scalable pathway for municipalities seeking to modernize road-condition assessment.

2:00pm - 2:15pm

Towards Global Interpretability: Evaluating XAI Metrics in Building Footprint Extraction

Elif Ozlem Yilmaz, Taskin Kavzoglu

Gebze Technical University, Turkiye

The global population is projected to increase by about 70% by 2050, with a growing proportion of people living in urban areas. This trend highlights the importance of accurately assessing urban expansion. Automatic building detection from remotely sensed imagery using deep learning (DL) has demonstrated considerable potential for applications including sustainable urban planning and infrastructure monitoring. However, the inherent black-box nature of DL models limits their transparency and reduces trust in model-driven decisions. Although various Explainable Artificial Intelligence (XAI) approaches have been proposed to highlight image regions influencing model predictions, qualitative visual inspection alone is insufficient for reliably evaluating the credibility of these explanations. This study evaluates several XAI techniques for building footprint extraction using a U-Net model trained on a refined Massachusetts Buildings Dataset. The segmentation model achieved precision, recall, F1-score, IoU, and overall accuracy values of 89.68%, 85.69%, 87.53%, 79.03%, and 94.35%, respectively. To investigate the model’s decision-making process, three explanation methods, namely Saliency, GradientSHAP, and GuidedGradCAM, were applied. The quality of the generated explanations was then quantitatively assessed using 16 evaluation metrics. Beyond single-image analysis, a dataset-level evaluation was conducted using 547 image patches containing building coverage greater than 20%. The results indicate that GuidedGradCAM produces more consistent and reliable explanations. Furthermore, dataset-level analysis using dense-building samples provides a statistically more robust representation of overall model behaviour compared to evaluations based on individual images. These findings highlight the importance of quantitative assessment in validating the interpretability of DL models for building footprint extraction.
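For readers less familiar with the five segmentation scores reported above, the following sketch shows how they are conventionally derived from pixel-level confusion counts for a binary building/background mask. The counts in the usage line are invented for illustration and are not the study's data. The abstract also does not state whether its scores were pooled over all pixels or averaged per image, so this shows only the standard definitions.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Standard binary segmentation scores from pixel-level confusion counts.

    tp: building pixels predicted as building
    fp: background pixels predicted as building
    fn: building pixels predicted as background
    tn: background pixels predicted as background
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)  # intersection over union for the building class
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou": iou,
        "accuracy": accuracy,
    }


# Illustrative counts only (hypothetical, not from the study).
scores = segmentation_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Overall accuracy includes the true negatives, so it is usually the highest of the five when background pixels dominate. IoU is the strictest because it penalizes both false positives and false negatives against the building class alone.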

2:15pm - 2:30pm

MaskRoof: A Deep Learning Framework and Benchmark Dataset for Fine-Grained Urban Rooftop Utilization and Potential Analysis

Jinfeng Xie¹, Haojie Yang², Lingshuang Dong³, Anthony Yeh¹, Yi Zhang²

¹The University of Hong Kong, Hong Kong S.A.R. (China); ²Institute of Future Human Habitats, Tsinghua Shenzhen International Graduate School; ³Huawei Technologies Co., Ltd., Dongguan, Guangdong Province, China

Urban rooftops represent a critical vertical resource for sustainable development, yet comprehensive assessment of their utilization patterns and available capacity remains constrained by inadequate datasets and limited algorithmic capabilities. This study introduces the Urban Rooftop Utilization Dataset (URUD), the first multi-city, pixel-level semantic segmentation dataset encompassing 1,560 high-resolution satellite images from four Chinese cities. URUD establishes eight semantic categories including a novel "available area" class to address ambiguous regions that existing classification schemes fail to capture. The study further proposes MaskRoof, a transformer-based deep learning framework specifically designed for fine-grained rooftop analysis. The model integrates two task-specific modules, Hierarchical Zoom-in Attention (HZA) and Prior-Guided Cross-Attention (PGCA), to address challenges of small-scale target detection and class imbalance. Experimental results demonstrate that MaskRoof achieves superior performance with 94.46% accuracy and 47.29% mIoU, outperforming existing segmentation architectures. Application to Shanghai's outer ring area reveals that 60.74% of rooftop space remains available for utilization, with significant spatial heterogeneity across building types. Industrial and warehouse structures retain substantially greater unutilized areas compared to office and residential buildings. These findings provide quantitative evidence for differentiated urban planning strategies and demonstrate the framework's capability for large-scale rooftop potential assessment in complex urban environments.
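The gap between the reported accuracy (94.46%) and mIoU (47.29%) is typical of class-imbalanced segmentation. The sketch below computes both measures from a confusion matrix to show how a rare class can pull mIoU down while pixel accuracy stays high. The two-class matrix is a made-up example, not URUD data, and the function name is hypothetical.

```python
def pixel_accuracy_and_miou(confusion):
    """Compute pixel accuracy and mean IoU from a square confusion matrix.

    confusion[t][p] is the number of pixels whose true class is t and
    whose predicted class is p. Classes that are absent from both the
    ground truth and the predictions are skipped when averaging IoU.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))

    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other, predicted c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)

    return correct / total, sum(ious) / len(ious)


# Hypothetical example: class 0 is dominant, class 1 is rare and poorly segmented.
toy = [
    [950, 10],
    [30, 10],
]
acc, miou = pixel_accuracy_and_miou(toy)
print(f"accuracy = {acc:.4f}, mIoU = {miou:.4f}")
```

In this toy case the dominant class contributes almost all of the correct pixels, while the rare class has an IoU of only 0.2, which halves the mean.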

2:30pm - 2:45pm

A Comparison of CNN, Transformer, and Open-Vocabulary Architectures for Real-Time Photovoltaic Defect Detection Using UAV Thermal Imagery

Aissam Salah¹, Mouad Jabrane², Imane Sebari¹

¹Department of Photogrammetry and Cartography, School of Geomatics and Surveying Engineering, IAV Hassan II, Rabat, Morocco; ²Research Unit of Geospatial Technologies for a Smart Decision, IAV Hassan II, Rabat 10101, Morocco

Real-time defect detection in solar farms is critical for profitability and safety. This paper compares state-of-the-art (SOTA) object detection architectures for thermal PV defect detection in UAV imagery, with deployment on edge computing platforms in mind. We systematically evaluated Closed-Set (YOLOv10, YOLOv12, RT-DETR, RF-DETR) and Open-Vocabulary (YOLO-World, OWL-ViT) models on a public thermal dataset. Our results highlight two leading candidates. The transformer-based RF-DETR sets a new accuracy record at 82.6% mAP@0.50, driven by its self-supervised backbone, but its inference speed is low (12.6 FPS). Conversely, the CNN-based YOLO-World integrates language semantics to reach a competitive 78.1% mAP@0.50 while operating at a real-time speed of 31.3 FPS. We conclude that both RF-DETR and YOLO-World are promising for embedded thermal fault detection. The final selection will depend on on-platform inference performance.
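The "@0.50" in mAP@0.50 refers to the intersection-over-union threshold used to decide whether a predicted box matches a ground-truth box. A minimal sketch of that matching test follows. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and a "greater than or equal" comparison; some benchmarks use a strict inequality. The function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max).
    """
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred, truth, threshold=0.5):
    """True if the predicted box overlaps the ground truth by at least `threshold`."""
    return box_iou(pred, truth) >= threshold


# A prediction covering half of the ground-truth box sits exactly on the threshold.
print(box_iou((0, 0, 10, 10), (0, 0, 10, 5)))
print(matches_at_threshold((0, 0, 10, 10), (5, 5, 15, 15)))
```

Average precision is then computed per class from the ranked matches and averaged across classes. That step is omitted here.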