Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads if available).

Agenda Overview

Location: 715B
125 theatre

Date: Saturday, 04-July-2026

8:30am - 5:00pm

TuT6: Open Point-to-point Correspondences for Loose or Tight Integration in Kinematic Laser Scanning
Location: 715B

Date: Sunday, 05-July-2026

8:30am - 12:00pm

TuT15: Getting Started with CNES Open-Source 3D Tools in Python
Location: 715B

12:00pm - 1:15pm

WG IV/8A: Digital Twins for Mobility and Navigation
Location: 715B

12:00pm - 12:15pm

Vision-Language Models for Urban Digital Twins

Amirhossein Nourbakhshrezaei, Saeed Abbasi, Mojgan Jadidi

Civil Engineering Department, Lassonde School of Engineering, York University, Canada

Urban digital twins are virtual city replicas that can greatly support urban planning by simulating infrastructure and mobility scenarios. However, keeping a digital twin up to date with fine-grained, real-world urban conditions is challenging. This paper proposes a novel system that leverages multi-modal AI models to bridge the gap between physical urban data collection and a 3D city digital twin. In our approach, ordinary smartphones carried in vehicles act as mobile sensors, continuously capturing multi-modal data (road images, GPS coordinates, and speed). Advanced vision-language models then analyze the data to automatically extract information about the traffic infrastructure and detect road anomalies. The extracted information, such as the locations of traffic signs, traffic signals, road surface cracks, and potential blind spots at intersections, is geo-tagged and passed to vision-language models, which interpret the data and stream human-readable insights into the digital twin model. The case study is the digital twin of the city of Toronto. By aggregating data from many drivers and analyzing it (in post-processing for high accuracy), the digital twin evolves into a living model of the urban environment. This enriched and dynamic twin provides urban planners with up-to-date insights on traffic signage, road conditions, and other relevant road infrastructure elements, enabling proactive maintenance and informed decision-making for city planning.

12:15pm - 12:30pm

GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis

Antonio Ruiz^1,2, Tao Wu¹, Andrew Melnik², Qing Cheng¹, Xuqin Wang¹, Lu Liu¹, Yongliang Wang¹, Yanfeng Zhang¹, Helge Ritter²

¹Huawei Technologies; ²Center for Cognitive Interaction Technology (CITEC), Bielefeld University

Methods that synthesize indoor 3D scenes from text prompts have wide-ranging applications in film production, interior design, video games, virtual reality, and synthetic data generation for training embodied agents. Existing approaches typically either train generative models from scratch or leverage vision-language models (VLMs). While VLMs achieve strong performance, particularly for complex or open-ended prompts, smaller task-specific models remain necessary for deployment on resource-constrained devices such as extended reality (XR) glasses or mobile phones. However, many generative approaches that train from scratch overlook the inherent graph structure of indoor scenes, which can limit scene coherence and realism. Conversely, methods that incorporate scene graphs either demand a user-provided semantic graph, which is generally inconvenient and restrictive, or rely on ground-truth relationship annotations, limiting their capacity to capture more varied object interactions. To address these challenges, we introduce GeoSceneGraph, a method that synthesizes 3D scenes from text prompts by leveraging the graph structure and geometric symmetries of 3D scenes, without relying on predefined relationship classes. Despite not using ground-truth relationships, GeoSceneGraph achieves performance comparable to methods that do. Our model is built on Equivariant Graph Neural Networks (EGNNs), but existing EGNN approaches are typically limited to low-dimensional conditioning and are not designed to handle complex modalities such as text. We propose a simple and effective strategy for conditioning EGNNs on text features, and we validate our design through ablation studies.

12:30pm - 12:45pm

A Multi-Dimensional Digital Twin Framework for the Low-Altitude Economy

Yan Li, Chenming Ye, Wenxuan Shi, Wenqing Zhang, Yuyang Zhang, Teng Hu, Zhizhong Kang

China University of Geosciences (Beijing), China, People's Republic of

The Low-Altitude Economy (LAE), driven by the widespread deployment of UAVs and eVTOL aircraft, demands a high-fidelity Digital Twin that extends far beyond static geographic representation. This study presents a critical review of 39 peer-reviewed papers, proposes a three-layer mapping framework (Geospatial Infrastructure Layer, Environmental Sensing Layer, and Interaction Layer), and evaluates the Technology Readiness Level (TRL) of each sub-domain. The Geospatial Infrastructure Layer encompasses terrain models, ground facilities, airspace structures, and semantic navigation landmarks. The Environmental Sensing Layer covers electromagnetic modeling, target sensing and countermeasures, and micro-meteorological mapping. The Interaction Layer addresses network trust, data security, swarm coordination, and platform reliability. Our TRL assessment reveals that Environmental Sensing is the most mature layer (mean TRL 4.4, 8 field-validated papers), while cross-layer integration remains the weakest link (mean TRL 3.5, zero field-validated demonstrations). We identify standardization of low-altitude spatial data products, AI-enabled predictive mapping, crowdsourced Digital Twin updating, and closed-loop cross-layer integration as the four priority research directions.

12:45pm - 1:00pm

Lightweight indoor Pedestrian Localization via multi-step State-extended Fusion of Wi-Fi weighted Fingerprinting and PDR

Renjie Yuan, Zhiyong Wang, Kunlin Yu, Lang Hu, Yonghan Liao, Yilang Lin, Junjie Lu

South China University of Technology, China, People's Republic of

Indoor pedestrian localization on smartphones requires a lightweight yet robust fusion framework capable of handling unstable wireless signals and drift-prone inertial motion. In this work, we propose an efficient positioning system that integrates weighted Wi-Fi fingerprinting with Pedestrian Dead Reckoning (PDR) through a Multi-Step Extended Kalman Filter (EKF). Unlike conventional single-step filtering, the EKF employed here is formulated from the perspective of factor-graph–based probabilistic estimation, where Wi-Fi observations and PDR increments naturally act as complementary measurement and motion factors. Building on this unified view, the proposed Multi-Step EKF retains several consecutive states within its estimation window, effectively approximating short-horizon smoothing while maintaining the computational footprint required for real-time execution on consumer smartphones.

To enhance observation stability, an inverse-distance weighted fingerprinting module mitigates RSSI fluctuations and gracefully handles missing values. Meanwhile, PDR inputs are refined through diagnostic analysis and incorporated as motion constraints within the fusion process. Global optimization of noise parameters is performed via dual annealing, further improving the reliability of state updates.

Experiments conducted on an open indoor dataset demonstrate that the proposed method reduces the mean localization error by approximately 31% compared with a standard single-step EKF baseline. The results confirm that enforcing short-term temporal consistency through a multi-step state representation significantly suppresses drift accumulation and enhances robustness in dynamic indoor environments. Overall, the proposed framework offers a theoretically grounded, computationally efficient, and practically deployable solution for smartphone-based indoor positioning.
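The inverse-distance weighted fingerprinting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the radio-map layout, the function names, the choice of k, and the floor value `MISSING_RSSI` used for access points that are not heard are all assumptions made for illustration.

```python
import math

MISSING_RSSI = -100.0  # assumed floor value (dBm) for access points that were not heard


def signal_distance(scan, fingerprint):
    """Euclidean distance in RSSI space over the union of access points."""
    access_points = set(scan) | set(fingerprint)
    return math.sqrt(sum(
        (scan.get(ap, MISSING_RSSI) - fingerprint.get(ap, MISSING_RSSI)) ** 2
        for ap in access_points
    ))


def idw_position(scan, radio_map, k=3, eps=1e-6):
    """Estimate (x, y) as the inverse-distance weighted mean of the k nearest fingerprints."""
    ranked = sorted(radio_map, key=lambda ref: signal_distance(scan, ref["rssi"]))[:k]
    weights = [1.0 / (signal_distance(scan, ref["rssi"]) + eps) for ref in ranked]
    total = sum(weights)
    x = sum(w * ref["pos"][0] for w, ref in zip(weights, ranked)) / total
    y = sum(w * ref["pos"][1] for w, ref in zip(weights, ranked)) / total
    return x, y


# Hypothetical radio map: reference positions in metres with their stored RSSI values.
radio_map = [
    {"pos": (0.0, 0.0), "rssi": {"ap1": -40.0, "ap2": -70.0}},
    {"pos": (10.0, 0.0), "rssi": {"ap1": -70.0, "ap2": -40.0}},
    {"pos": (5.0, 8.0), "rssi": {"ap1": -60.0, "ap2": -60.0, "ap3": -50.0}},
]
scan = {"ap1": -42.0, "ap2": -69.0}  # "ap3" is missing from this scan
estimate = idw_position(scan, radio_map, k=2)
print(estimate)
```

Filling missing readings with a floor value is only one way to handle gaps; restricting the distance to the shared access points would be an equally plausible reading of the abstract.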

1:30pm - 2:45pm

WG IV/8B: Digital Twins for Mobility and Navigation
Location: 715B

1:30pm - 1:45pm

Topological Analysis of OpenDRIVE Models for Advanced Autonomous Vehicle Simulations

Janos Mate Logo, Viktor Gyozo Horvath, Vivien Poto, Arpad Barsi

Budapest University of Technology and Economics, Department of Photogrammetry and Geoinformatics, Hungary

The increasing demand for safe and efficient autonomous vehicle (AV) operations has intensified the need for realistic, high-fidelity digital road representations that enable robust virtual testing environments. Simulation-based validation has become a cornerstone of the AV development process, allowing for the reproducible assessment of perception, localization, and decision-making modules under controlled conditions. Within this context, the ASAM OpenDRIVE specification provides a standardized, XML-based description of static road networks, encapsulating geometric, semantic, and structural elements such as roads, lanes, junctions, and roadside objects. While previous research has primarily focused on the geometric accuracy and semantic richness of High Definition (HD) maps, comprehensive topological analyses—especially those addressing consistency, connectivity, and completeness of OpenDRIVE models—remain largely unexplored. This study aims to fill that gap by introducing a formal topological framework for evaluating OpenDRIVE-based road models through both synthetic and real-world test cases.
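One kind of connectivity check that such a topological analysis might include can be sketched as follows. This is an illustration, not the authors' framework: it only flags road links that point to a road or junction missing from the file, and it assumes an OpenDRIVE document without an XML namespace. The function name and the sample network are hypothetical.

```python
import xml.etree.ElementTree as ET


def dangling_road_links(xodr_text):
    """Return (road_id, link_kind, target_id) for links that reference a missing element."""
    root = ET.fromstring(xodr_text)
    road_ids = {road.get("id") for road in root.findall("road")}
    junction_ids = {junction.get("id") for junction in root.findall("junction")}
    problems = []
    for road in root.findall("road"):
        link = road.find("link")
        if link is None:
            continue
        for kind in ("predecessor", "successor"):
            ref = link.find(kind)
            if ref is None:
                continue
            # A link targets either another road or a junction.
            known = junction_ids if ref.get("elementType") == "junction" else road_ids
            if ref.get("elementId") not in known:
                problems.append((road.get("id"), kind, ref.get("elementId")))
    return problems


# Hypothetical two-road network in which road 2 points to a non-existent road 9.
SAMPLE = """<OpenDRIVE>
  <road id="1" junction="-1">
    <link><successor elementType="road" elementId="2" contactPoint="start"/></link>
  </road>
  <road id="2" junction="-1">
    <link>
      <predecessor elementType="road" elementId="1" contactPoint="end"/>
      <successor elementType="road" elementId="9" contactPoint="start"/>
    </link>
  </road>
</OpenDRIVE>"""

issues = dangling_road_links(SAMPLE)
print(issues)
```

A fuller analysis would also test reciprocity of links, lane-level connections, and junction completeness, which this sketch leaves out.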

1:45pm - 2:00pm

A Comprehensive Toolkit for Semi-Automated HD Maps Production: Integrating AI-Driven Feature Extraction with 3D Interactive Validation and Editing

Yi-Feng Chang, Kai-Wei Chiang, Meng-Lun Tsai, Pei-Ling Lee, Syun Tsai, Chi-Hsin Huang, Ting-Chun Wu, Yi-Han Jen, Jiun-Yo Lin, Chang-Le Lee, Ching-Hsiang Lin, Han-Che Huang

National Cheng Kung University, Chinese Taipei

This paper presents a comprehensive toolkit for semi-automated High-Definition Maps (HD Maps) production that integrates Artificial Intelligence (AI)-driven feature extraction with 3D human-in-the-loop validation. High-definition maps provide centimeter-level road geometry and traffic asset information, but large-scale production remains costly due to dense mobile mapping data and manual digitization. The proposed workflow consists of two self-developed components: a Semi-automated HD Maps Production Tool for batch extraction and a 3D HD Maps Validation and Editing Tool for structured review. The project-based pipeline ingests georeferenced mobile laser scanning point clouds, Inertial Navigation System / Global Navigation Satellite System (INS/GNSS) trajectories, and camera imagery, and applies configurable chains of ground filtering, road-marking extraction, voxel down-sampling, clustering, oriented bounding box analysis, and AI-based traffic asset detection. Candidate features with confidence indicators and basic attributes are stored in a project database and edited in a tightly coupled 3D environment that supports snapping, constrained adjustments, and semantic reclassification while logging all user edits. The toolkit is evaluated on a closed proving ground (CARLab, Shalun) and a freeway section of Taiwan National Highway No. 3. At CARLab, semi-automated extraction achieves F1-scores of 0.85–0.95 for key layers. For a one-kilometer highway section, operator time is reduced from 90–120 minutes in a purely manual Geographic Information System (GIS) workflow to about 45 minutes with the proposed approach, while maintaining comparable geometric accuracy. These results demonstrate a practical path towards scalable, traceable HD Maps production for autonomous driving applications.

2:00pm - 2:15pm

A Low-Altitude Data Space Framework Based on China's National 3D Mapping Program

Yin Gao^1,2, Jun Chen^1,2, Dehu Yang^1,3, Chaoquan Zhang¹, Fengyu Han¹

¹Moganshan Geospatial Information Laboratory, Huzhou, 313200, China; ²National Geomatics Center of China, Beijing 100830, China; ³Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 610031, China

National Key Research and Development Program of China (2025YFB3910300).

2:15pm - 2:30pm

Geometrically accurate 3D Gaussian Reconstruction using high-density UAV LiDAR point clouds and open-vocabulary semantic optimization

Banghui Yang^1,2, Kai Qin^1,2, Yucheng Li^1,2, Jing Li^1,3

¹Aerospace Information Research Institute, Chinese Academy of Sciences, China, People's Republic of; ²University of Chinese Academy of Sciences, Beijing; ³International Research Center of Big Data for Sustainable Development Goals, China

3D scene reconstruction lies at the core of computer vision, photogrammetry, geospatial science, and spatial intelligence, aiming for accurate, photorealistic, and efficient digital twin representations of the real world. The emergence of the revolutionary 3D Gaussian Splatting (3DGS) technique enables real-time rendering and geometrically precise reconstruction, yet existing methods struggle in large-scale outdoor scenes with weak textures, low geometric accuracy, dynamic objects, and a lack of semantic information. Therefore, geometrically accurate 3DGS with enhanced semantic understanding greatly facilitates the realization of digital twins for mobility and navigation. This work proposes a novel 3DGS framework that seamlessly incorporates dense UAV LiDAR point clouds, multi-view images, and open-set semantics in an all-in-one optimization process. The key objective is to investigate how geometric constraints derived from dense UAV LiDAR point clouds and cognitive supervision from SAM (Segment Anything Model) semantics can jointly participate in the optimization of Gaussian primitives, thereby improving geometric accuracy, visual realism, and semantic consistency in large-scale UAV 3D reconstructions for creating digital twins of the environment.

Date: Monday, 06-July-2026

8:30am - 10:00am

WG II/3A: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

8:30am - 8:45am

GT-LOD3: LOD3 Semantic 3D Building Reconstruction Benchmark Dataset

Han Sae Kim¹, Olaf Wysocki², Ludwig Hoegner³, Ksenia Bittner⁴, Joshua Carpenter⁵, Friedrich Fraundorfer^4,6, Arpan Kusari⁷, Max Mehltretter⁸, Franz Rottensteiner⁸, Anna Schadl⁹, Martin Weinmann¹⁰, Jinha Jung¹

¹Lyles School of Civil and Construction Engineering, Purdue University, West Lafayette, IN, USA; ²CV4DT, Department of Engineering, University of Cambridge, Cambridge, United Kingdom; ³Faculty of Civil Engineering, Hochschule München University of Applied Sciences, Munich, Germany; ⁴Remote Sensing Technology Institute, German Aerospace Center (DLR), Oberpfaffenhofen, Germany; ⁵Department of Civil Engineering, The University of Akron, Akron, OH, USA; ⁶Institute of Visual Computing, Graz University of Technology, Graz, Austria; ⁷University of Michigan Transportation Research Institute, University of Michigan, Ann Arbor, MI, USA; ⁸Institute of Photogrammetry and GeoInformation, Leibniz University Hannover, Hannover, Germany; ⁹Faculty of Geoinformatics, Hochschule München University of Applied Sciences, Munich, Germany; ¹⁰Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

This contribution introduces GT-LOD3, a new benchmark dataset designed to advance semantic Level of Detail 3 (LOD3) building reconstruction from UAS-based photogrammetric point clouds. Existing benchmarks primarily focus on mesh- or point-level semantic labelling, facade segmentation, or LOD2-level modelling, but high-quality, geometry-accurate LOD3 ground truth paired with real-world photogrammetric observations is still limited. GT-LOD3 fills this gap by offering paired UAS point clouds and manually modeled LOD3 reference data in CityGML format, enabling research on window-level facade reasoning, geometric regularization, and instance-level shape recovery.

The benchmark currently consists of two subsets featuring different architectural styles and environmental conditions: (1) an urban block in Gold Coast (Lakewood, Ohio, USA), and (2) the Technical University of Munich (TUM) campus. The accompanying LOD3 reference models contain explicit window geometry, enabling detailed evaluation of both detection performance and polygon-level geometric accuracy.

We further provide a baseline reconstruction pipeline that combines point-cloud semantic segmentation, facade-aligned 2D projection, window region extraction, and geometric back-projection into CityGML. An evaluation protocol is presented including pixel-level metrics (IoU, precision, recall, F1) and instance-level detection metrics based on optimal assignment via the Hungarian algorithm.
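The two families of metrics named in the evaluation protocol can be sketched in a few lines. This is a simplified illustration and not the benchmark's evaluation code: pixels are represented as coordinate sets, windows as axis-aligned boxes, and the optimal one-to-one matching is found by exhaustive search, which gives the same result as the Hungarian algorithm on tiny inputs but does not scale. All names and sample values are hypothetical.

```python
from itertools import permutations


def pixel_metrics(pred, truth):
    """IoU, precision, recall and F1 for two sets of foreground pixel coordinates."""
    tp = len(pred & truth)
    fp = len(pred - truth)
    fn = len(truth - pred)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(width, 0.0) * max(height, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_assignment(pred_boxes, true_boxes):
    """One-to-one matching with maximum total IoU, found by brute force (small inputs only)."""
    n_pred, n_true = len(pred_boxes), len(true_boxes)
    if n_pred <= n_true:
        candidates = ([(i, j) for i, j in enumerate(perm)]
                      for perm in permutations(range(n_true), n_pred))
    else:
        candidates = ([(i, j) for j, i in enumerate(perm)]
                      for perm in permutations(range(n_pred), n_true))
    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        score = sum(box_iou(pred_boxes[i], true_boxes[j]) for i, j in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs


metrics = pixel_metrics({(0, 0), (0, 1), (1, 0)}, {(0, 0), (0, 1), (1, 1)})
matches = best_assignment([(0, 0, 2, 2), (5, 5, 7, 7)], [(5, 5, 7, 8), (0, 0, 2, 2)])
print(metrics, matches)
```

In practice a detection would count as a true positive only if its matched IoU exceeds a chosen threshold; the abstract does not state which threshold is used.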

8:45am - 9:00am

LoD2-Former: Multi-Modal Transformer-Based 3D Building Wireframe Reconstruction

Youssef Abdelhedi¹, Daniel Panangian¹, Chaikal Amrullah¹, Houda Chaabouni-Chouayakh², Ksenia Bittner¹

¹Remote Sensing Technology Institute, German Aerospace Center (DLR), Wessling, Germany; ²Sm@rts Laboratory, Digital Research Center of Sfax, Tunisia

Building wireframe reconstruction from LiDAR faces challenges due to sparse and incomplete point cloud data. We present LoD2-Former, a multi-modal Transformer architecture that fuses aerial LiDAR and optical imagery for end-to-end 3D roof wireframe reconstruction. Unlike existing point-cloud-only methods, our dual-backbone approach with bidirectional cross-modal attention leverages complementary geometric and visual information. Experiments on two datasets show consistent improvements in edge detection metrics, with edge F1-scores increasing from 0.874 to 0.899 on Tallinn and 0.968 to 0.974 on Roof-Intuitive, while substantially boosting corner recall (0.630 to 0.729) in complete-data settings. We also contribute a curated multi-modal subset of Building3D with aligned LiDAR and aerial imagery to facilitate future research.

9:00am - 9:15am

Point2WSS: Reconstructing LoD2 Buildings from Aerial LiDAR Data using Multimodal Learning and Weighted Straight Skeleton

Pierre-Loïc Queffélec^1,2, Nicolas Trouvé¹, Stéphane Roussel¹, Teng Wu², Bruno Vallet²

¹DEMR, ONERA, Université Paris Saclay, F-91123 Palaiseau, France; ²Univ Gustave Eiffel, ENSG, IGN, LASTIG, F-77420 Champs-sur-Marne, France

In this paper, a method exploiting aerial LiDAR point clouds to build realistic building meshes suitable for electromagnetic simulation is proposed. One of the main challenges lies in reconstructing regularized building meshes with low polygonal density. Optimization-based methods, commonly used for building reconstruction from point clouds, are highly data-driven, making the quality of results dependent on the quality of input data. Aerial LiDAR scans can be incomplete or sparse, for instance due to occlusion. A novel LoD2 building reconstruction method based on deep learning is proposed, assuming that deep learning methods are more robust to incomplete or sparse data than optimization-based methods. A parametric building model is introduced, based on the Weighted Straight Skeleton algorithm, which generates realistic roofs from a building footprint and an associated set of slopes, and subsequently extrudes the roof to the specified building height. This parametric approach guarantees that a given set of parameters (height, footprint and slopes) produces a regularized building mesh with low polygonal density. A multimodal model, named Point2WSS, was trained to recover a variable number of continuous building parameters from the corresponding point cloud. This approach enables the generation of realistic building meshes suitable for electromagnetic simulation, provided that the predicted parameters accurately approximate real-world values.

9:15am - 9:30am

Wide-area Scene Reconstruction with polyhedral Buildings featuring recognized Regularities

Jochen Meidow

Fraunhofer IOSB, Germany

The modeling of buildings suffers from a dichotomy between generic and specific representations: the lack of domain knowledge in flexible models that can represent many shapes, and the restricted geometry of pre-specified parametric building primitives. To fill this gap, we propose using general boundary representations enriched with automatically recognized and enforced geometric constraints derived from human-made regularities. The proposed reasoning process relies on the statistics of the planar point groups extracted from airborne-captured point clouds. Hence, a chosen significance level is the only process parameter. To enforce the creation of sound solids, we apply manifold constraints for the generation of the boundary representations. The feasibility and usability of the approach are demonstrated by evaluating an airborne-captured laser scan containing approximately 7,600 buildings over an area of 50 km² featuring both inner-city and rural landscapes.

9:30am - 9:45am

The P3 Dataset: Pixels, Points and Polygons for Multimodal Building Vectorization

Raphael Sulzer^1,2, Liuyun Duan², Nicolas Girard², Florent Lafarge¹

¹Université Côte d’Azur, INRIA – Sophia-Antipolis, France; ²LuxCarta Technology, Mouans-Sartoux, France

We present P3, a large-scale multimodal dataset for building vectorization, including aerial LiDAR point clouds, aerial images, and vectorized 2D building outlines, collected across three continents. P3 contains over 10 billion LiDAR points with decimeter-level accuracy and RGB images at a ground sampling distance of 25 centimeters. While many existing datasets focus on the image modality, P3 offers a complementary perspective by incorporating dense 3D information. We demonstrate that LiDAR point clouds serve as a robust modality for predicting building polygons, both in hybrid and end-to-end learning frameworks. Moreover, fusing LiDAR and imagery further improves accuracy and geometric quality of predicted polygons. The P3 dataset is publicly available, along with code and pretrained weights of three state-of-the-art models for building polygon prediction at https://github.com/raphaelsulzer/PixelsPointsPolygons .

9:45am - 10:00am

Building height estimation from stereo satellite images using contour vector registration

Yaxuan Duan, Wei Qin, Xin Huang, Pei Mi, Yang Yu, Pengjie Tao

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China

Accurate building height estimation plays a crucial role in large-scale 3D urban reconstruction. However, conventional stereo matching approaches often suffer from mismatches around building edges, leading to unreliable height retrieval in dense urban areas. To address this issue, this paper presents a novel method for building height estimation based on contour vector registration integrated with the vertical line locus technique. The proposed framework first automatically matches building contour vectors extracted from stereo high-resolution satellite images. Then, for each paired contour, a range of candidate heights is searched using a rational function model to project the reference contour from the image space to object space and then reproject it onto the conjugate image. The elevation that maximizes the overlap ratio between projected and paired contours is identified as the optimal roof elevation. Building height is subsequently derived by subtracting the ground elevation from the estimated roof elevation. Experiments conducted on SuperView-1 (SV-1) satellite stereo images over Jiuyuan District, Baotou, Inner Mongolia, China, demonstrate the effectiveness of the proposed method. The resulting building height estimates achieve a root mean square error of 0.84 m compared to manual measurements, showing strong agreement (r = 0.9993). The proposed contour-based stereo registration approach provides a robust and efficient solution for building height extraction from high-resolution satellite data, supporting precise urban 3D modeling and large-scale spatial analysis.
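The elevation search at the heart of this method can be sketched under strong simplifications. The code below is not the authors' pipeline: contours are reduced to axis-aligned boxes, the overlap ratio is taken as IoU, and the rational function model is replaced by a toy projection (`project_to_conjugate`, hypothetical) that shifts a contour along x in proportion to elevation. All numbers are invented for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(width, 0.0) * max(height, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def project_to_conjugate(contour_box, elevation, shift_per_metre):
    """Toy stand-in for the RFM projection: shift the contour along x with elevation."""
    dx = elevation * shift_per_metre
    return (contour_box[0] + dx, contour_box[1], contour_box[2] + dx, contour_box[3])


def estimate_roof_elevation(ref_box, paired_box, candidates, shift_per_metre):
    """Pick the candidate elevation whose projected contour best overlaps the paired contour."""
    return max(
        candidates,
        key=lambda z: box_iou(project_to_conjugate(ref_box, z, shift_per_metre), paired_box),
    )


ref_box = (100.0, 50.0, 120.0, 70.0)      # contour in the reference image (pixels)
paired_box = (118.5, 50.0, 138.5, 70.0)   # matched contour in the conjugate image
ground_elevation = 12.0                   # assumed known ground elevation (m)
candidates = [ground_elevation + 0.5 * i for i in range(121)]  # 12.0 m to 72.0 m

roof_elevation = estimate_roof_elevation(ref_box, paired_box, candidates, shift_per_metre=0.5)
building_height = roof_elevation - ground_elevation
print(roof_elevation, building_height)
```

The final subtraction mirrors the abstract directly: building height is the estimated roof elevation minus the ground elevation.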

1:30pm - 3:00pm

WG IV/1B: Spatial Data Representation and Interoperability
Location: 715B

1:30pm - 1:45pm

Zonology: An Ontology-Based Framework for Harmonizing Zoning Semantics Across Multi-Jurisdictional Greater Toronto Area (GTA) Planning Systems

Arian Mesbahian¹, Amin Sarang², Arash Shahi³, Mojgan Jadidi¹

¹Department of Civil Engineering, Lassonde School of Engineering, York University, Canada; ²DevNext Inc., Canada; ³AECO Innovation Lab Inc., Canada

Urban development in the Greater Toronto Area faces significant challenges because zoning abbreviations and terminology vary widely between municipalities. Labels such as “R2” in Toronto and “R2 S” in Markham appear similar yet represent different permissions and development standards, creating confusion and slowing planning workflows in a region with growing housing pressures. The problem addressed in this research is the absence of a unified, machine-readable framework that standardizes zoning terminology across municipalities, which limits automated compliance checking, GIS integration, and cross-municipal comparison. The objective of this work is to create Zonology, an ontology-based framework that harmonizes zoning abbreviations, permitted land uses, and development standards, beginning with the City of Toronto and the City of Markham. The methodology follows the Linked Open Terms approach, using the Web Ontology Language to encode zoning by-laws, land use categories, development standards, and spatial relationships. The model is evaluated through reasoning tasks, competency questions, and semantic alignment tests to ensure clarity, consistency, and interoperability. The results show that Zonology successfully aligns more than sixty zoning categories and over one hundred fifty land use permissions, enabling consistent semantic interpretation and cross-municipal queries. The overall significance of this work is that the ontology improves regulatory clarity, strengthens data-driven planning, and provides a scalable foundation for harmonized zoning governance across the Greater Toronto Area.

1:45pm - 2:00pm

GeoGraphJSON: A lightweight semantic data model integrating spatial geometry and graph connectivity for AI-driven spatial reasoning

Muhamad Alrajhi¹, Christian Heipke², Mohammed Afroz Khan¹

¹RASIKH Institute for Education and Training, Riyadh; ²Leibniz Universität Hannover

Urban systems are increasingly complex, interconnected, and dynamic, yet most geospatial data models continue to represent them as static geometric layers with limited support for explicit relationships and semantics. This restricts advanced spatial reasoning, network analysis, and AI-driven applications.

This paper introduces GeoGraphJSON, a lightweight semantic data model that extends GeoJSON by integrating spatial geometry with graph-based connectivity. The framework represents spatial entities as nodes and explicitly encodes relationships as typed edges, enabling unified representation of geometry, topology, and semantics within a single interoperable structure. A hierarchical Unique Identifier (UID) system ensures consistent lineage and cross-layer integration across administrative, transportation, and urban asset datasets.

The approach is validated using a large-scale urban dataset from Riyadh, comprising over 10,000 nodes and 13,000 edges. Graph-based analysis demonstrates realistic spatial patterns, including right-skewed degree distribution, strong network connectivity, and identifiable community structures. These results highlight the ability of GeoGraphJSON to capture hierarchical organization and functional relationships while supporting efficient analytical workflows.

By bridging geometry-centric GIS models and graph-based approaches, GeoGraphJSON provides a scalable foundation for urban analytics, digital twins, and GeoAI applications, enabling geospatial systems to evolve from static representations toward intelligent, relationship-aware spatial models.
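To make the idea of combining GeoJSON geometry with typed edges concrete, here is a purely hypothetical sketch. The abstract does not publish the GeoGraphJSON schema, so the `edges` member, its `source`/`target`/`type` keys, the hierarchical identifiers, and the coordinates are all assumptions chosen for illustration; only the `FeatureCollection` and `Feature` structure comes from GeoJSON itself.

```python
import json
from collections import Counter

# Hypothetical document: GeoJSON features act as nodes, a foreign "edges" member adds links.
doc = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "CITY.D01",
         "geometry": {"type": "Point", "coordinates": [46.70, 24.70]},
         "properties": {"kind": "district"}},
        {"type": "Feature", "id": "CITY.D01.R001",
         "geometry": {"type": "LineString", "coordinates": [[46.70, 24.70], [46.71, 24.71]]},
         "properties": {"kind": "road"}},
        {"type": "Feature", "id": "CITY.D01.R001.A01",
         "geometry": {"type": "Point", "coordinates": [46.705, 24.705]},
         "properties": {"kind": "streetlight"}},
    ],
    "edges": [
        {"source": "CITY.D01", "target": "CITY.D01.R001", "type": "contains"},
        {"source": "CITY.D01.R001", "target": "CITY.D01.R001.A01", "type": "hosts"},
    ],
}


def node_degrees(document):
    """Count how many edges touch each node, after checking that every endpoint exists."""
    node_ids = {feature["id"] for feature in document["features"]}
    degrees = Counter({node_id: 0 for node_id in node_ids})
    for edge in document["edges"]:
        for endpoint in (edge["source"], edge["target"]):
            if endpoint not in node_ids:
                raise ValueError(f"edge references unknown node {endpoint!r}")
            degrees[endpoint] += 1
    return dict(degrees)


degrees = node_degrees(doc)
round_trip = json.loads(json.dumps(doc))  # the structure stays plain JSON
print(degrees)
```

A degree count like this is the simplest input to the degree-distribution analysis the abstract mentions; the hierarchical identifiers let a parent be recovered by stripping the last dotted segment.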

2:00pm - 2:15pm

Urban Morphological Clustering of Cairo and Makkah: A Comparative Analysis Using Spatial Networks

Ahmad M. Senousi^1,2, Wael Ahmed^1,2, Adel Elshazly¹, Moustafa Baraka³, Walid Darwish^1,2

¹Geomatics Engineering Lab, Public Works Department, Cairo University, Giza 12613, Egypt; ²NAMAA for Engineering Consultations, Dokki, Giza 12612, Egypt; ³Civil Engineering Program, German University in Cairo 11835, Egypt

Urban morphology quantitatively reveals how distinct historical and functional drivers shape city form. This study employs a computational morphometric approach using the Momepy library to analyze and compare the urban structures of Cairo, Egypt, and Makkah, Saudi Arabia. These cities represent paradigmatic cases: Cairo exemplifies long-term, organic layering, while Makkah demonstrates rapid, purpose-driven transformation for religious pilgrimage. We calculated key metrics (including tessellation area, convexity, elongation, equivalent rectangular index, and edge betweenness centrality) for building footprints and street networks sourced from OpenStreetMap. Results show Cairo possesses a heterogeneous, polycentric fabric with complex plot shapes and a distributed street network, reflecting its layered history. Conversely, Makkah exhibits a more monocentric, consolidated form with standardized building geometries and a hierarchical street network channeling movement toward its core. The findings demonstrate that quantitative morphology effectively captures how Cairo's organic evolution and Makkah's centralized planning produce fundamentally different, yet equally revealing, urban spatial structures, offering a replicable framework for cross-city analysis in the region.

2:15pm - 2:30pm

An Assessment of Spatiotemporal Dynamics of Urban Illumination and Socioeconomic Patterns in Delhi Using VIIRS Nighttime Light Data

Manisha Kumari¹, Aditya Kumar Thakur²

¹Tilka Manjhi Bhagalpur University, India; ²Indian Institute of Technology Roorkee, India

Urban illumination, as captured through Nighttime Light (NTL) data, serves as a powerful indicator of human activity, infrastructure development, and socioeconomic progress in rapidly growing cities. However, previous studies on Delhi have largely focused on temporal NTL trends without integrating multi-year statistical and spatial analyses to reveal underlying urban and socioeconomic dynamics. This study investigates the spatiotemporal dynamics of urban illumination and development over Delhi using VIIRS Day/Night Band (DNB) NTL data for the years 2015, 2020, and 2025. NTL intensity was used as a proxy for urbanization and socioeconomic activity. Monthly composite datasets for January of each year were processed, clipped to the Delhi administrative boundary, and analyzed using statistical, temporal, and correlation-based methods. The results revealed a slight decline in mean NTL intensity from 26.34 in 2015 to 24.95 in 2025, indicating a stabilization in overall light emissions that may be due to the adoption of energy-efficient technologies. However, the maximum and range values increased markedly (166.85 to 228.04), signifying intensified illumination in high-activity commercial and infrastructural zones. Temporal change analysis showed balanced positive and negative illumination shifts, with over 50% of pixels exhibiting moderate growth during 2020–2025. Strong Pearson and Spearman correlations (r = 0.83–0.92; ρ = 0.91–0.95) confirmed the temporal consistency of illumination distribution. The socioeconomic assessment highlighted spatial disparities in light intensity that may correspond to varying economic activity levels. Overall, the study demonstrates that VIIRS-derived NTL data provide an effective and robust approach for monitoring urban growth, socioeconomic variability, and sustainable lighting transitions in metropolitan environments.
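The two correlation measures reported in this abstract can be computed as in the following sketch. It uses only the standard library, and the radiance values are invented for illustration; they are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-pixel radiance values for two epochs.
ntl_2020 = [5.0, 12.0, 30.0, 80.0, 150.0]
ntl_2025 = [6.0, 11.0, 35.0, 70.0, 220.0]

r = pearson(ntl_2020, ntl_2025)
rho = spearman(ntl_2020, ntl_2025)
print(round(r, 3), round(rho, 3))
```

Spearman's ρ depends only on the ordering of pixels, so it stays high when bright areas remain the brightest even if their absolute radiance grows sharply, which is consistent with the pattern the abstract describes.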

2:30pm - 2:45pm

Artificial Intelligence for territorial interpretation: from image clustering to perceptual mapping

Fabio Bianconi, Marco Filippucci, Chiara Mommi

University of Perugia, Italy

The research investigates artificial intelligence as a device for the automatic interpretation of landscape, reframing representation not as a neutral reproduction but as a cognitive operation in which perception, description, and evaluation converge. Moving from the assumption that landscape is not an objective given but a culturally and perceptually constructed form, the study proposes a fully data-driven methodology based on geolocated images. Through a systematic grid sampling, street-level panoramic views are collected and processed within a multimodal pipeline integrating visual analysis, language models, and multi-agent evaluation. Images are first translated into textual descriptions and semantically clustered, allowing territorial classes to emerge from the data rather than from predefined taxonomies. In parallel, a simulated cognitive framework, structured through four agent profiles, produces evaluative scores and textual judgments, later analysed through sentiment detection. The integration of these layers generates a georeferenced dataset from which a perceptual cartography of the territory is constructed. Applied to the urban context of San Giustino (Italy), the method reveals a continuous gradient from dense urban cores to rural landscapes, while exposing differentiated perceptual readings across observer profiles. Within this framework, artificial intelligence does not replace human interpretation; it operates as an epistemic extension, transforming the landscape into a distributed field of comparable perceptions, where representation becomes a computable form of shared knowledge.

2:45pm - 3:00pm

Towards the Development of a Metadata-driven Usability Awareness Prototype for Interoperable GIS Operation Design

Jung-Hong Hong, How-Han Chang, Ting-Yi Lee, Ting-Yu Chang

Dept. of Geomatics, National Cheng Kung University, Chinese Taipei

This study focuses on bridging usability information between data providers and data users through standardized metadata. By further integrating standardized metadata with geographic information system operation design, the operations gain automation and usability-awareness capabilities, allowing usability information based on data specifications and quality considerations to be incorporated into relevant processes, thereby avoiding erroneous decisions. The research references international standards such as ISO 19115 and ISO 19157 to meet the requirements of open geographic information technologies.

3:30pm - 5:15pm

WG II/3B: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

3:30pm - 3:45pm

3D gaussian splatting for large-scale 3D reconstruction: an evaluation and quality analysis

Jiangxue Yu¹, Yueling Liao¹, San Jiang²,³, Xing Zhang²,³, Zhijun Wang⁴, Qingquan Li²,³

¹School of Computer Science, China University of Geosciences, Wuhan 430074, China; ²Guangdong Key Laboratory of Urban Informatics, Shenzhen University, Guangdong Shenzhen, 518060, China; ³MNR Key Laboratory for Geo-Environmental Monitoring of Great Bay Area, Shenzhen University, Guangdong Shenzhen, 518060, China; ⁴Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Guangdong Shenzhen, 518060, China

Large-scale 3D reconstruction has emerged as a key research topic in the fields of photogrammetry and computer vision. 3D Gaussian Splatting (3DGS) has become a mainstream approach due to its efficient rendering, but it confronts critical challenges in large-scale scenarios: excessive memory overhead and inadequate geometric accuracy. Meanwhile, the traditional Structure from Motion and Multi-view Stereo (SfM-MVS) framework, despite its cumbersome process, continues to exhibit robust performance. Notably, a systematic evaluation comparing these two paradigms in large-scale scenes remains absent. To address this, we develop a unified verification framework to evaluate the texture rendering quality and geometric reconstruction precision of several recent methods using real-world datasets. The results indicate that SfM-MVS methods still maintain an advantage in the completeness and accuracy of geometric reconstruction. In contrast, 3DGS methods have achieved breakthroughs in local accuracy or rendering-geometry synergy, yet their global consistency requires further improvement.

3:45pm - 4:00pm

RobustGauss: Robust 3D gaussian splatting for distractor-free 3D scene reconstruction

Haibing Liu¹, Shihan Chen¹, Huchen Li¹, Wubiao Huang¹, Shuai Zhang¹, Fei Deng¹,²

¹School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; ²Hubei Luojia Laboratory, Wuhan 430079, China

3DGS-based methods often render transient distractors in 3D scenes as significant floating artifacts. Existing works for removing transient distractors suffer from under-identification or over-identification, resulting in residual transient distractors affecting reconstruction quality or loss of scene information, preventing the reconstruction of fine details. To address these challenges, we propose RobustGauss. We first rely solely on the cosine similarity of DINOv2 features to robustly predict uncertainty masks and accurately identify the main regions of transient disturbances and their corresponding shadows. Due to the limited resolution of DINOv2 features, we use high-resolution image residuals to refine the edges of the initial uncertainty masks, thereby accurately identifying all transient distractors and minimizing their impact on 3D scene reconstruction. Experiments on two challenging datasets demonstrate that our method achieves state-of-the-art performance.
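To make the cosine-similarity idea concrete, here is a minimal sketch under stated assumptions: it supposes that each image patch has one feature vector for the captured image and one for the rendered image, and that a patch is flagged as uncertain when the two disagree. The function names, the threshold of 0.5, and the toy 3-dimensional vectors are hypothetical; the actual RobustGauss formulation and real DINOv2 features are not reproduced here.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def uncertainty_mask(captured_feats, rendered_feats, threshold=0.5):
    """Flag patches whose captured and rendered features disagree.

    A patch is marked True (likely transient) when its cosine similarity
    falls below the threshold. The threshold value is an assumption.
    """
    return [
        cosine_similarity(c, r) < threshold
        for c, r in zip(captured_feats, rendered_feats)
    ]


# Toy per-patch features (stand-ins for real DINOv2 descriptors).
captured = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
rendered = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
mask = uncertainty_mask(captured, rendered)
```

In this toy case only the second patch is flagged, because its two feature vectors are orthogonal. A coarse mask of this kind would still need the edge refinement that the abstract describes.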

4:00pm - 4:15pm

BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

Yuci Han¹, Charles Toth¹, John Anderson², William Shuart², Alper Yilmaz¹

¹The Ohio State University, United States of America; ²USACE ERDC GRL

N/A

4:15pm - 4:30pm

EMVSNet: Evidential multi-view stereo reconstruction for sampling-free depth and uncertainty estimation

Christian Grannemann, Max Mehltretter

Leibniz University Hannover, Germany

We present EMVSNet, a sampling-free Multi-View Stereo (MVS) method that, to the best of our knowledge, is the first to integrate Evidential Deep Learning into MVS. Given a set of overlapping images, our method predicts a depth value together with its associated uncertainty per pixel of a reference image, incorporating uncertainty from aleatoric and epistemic sources. Specifically, we use an existing convolutional neural network architecture designed for MVS as backbone and extend it to regress evidential parameters per pixel, describing the probability distribution over the depth corresponding to this pixel. In contrast to existing MVS methods that often neglect epistemic uncertainty or obtain it via sampling at inference, our evidential formulation does not require sampling, but enables single-pass inference. We evaluate the uncertainty estimation capabilities of our method using two publicly available datasets and compare the depth predictions against a deterministic variant. The experimental results demonstrate that EMVSNet achieves competitive depth accuracy while, at the same time, providing uncertainty estimates that enable us to reliably rank depth estimates according to their risk of being incorrect and to automatically identify out-of-distribution data. Our model shows only slightly increased inference time compared to a deterministic baseline while giving comparable uncertainty estimates to a computationally expensive sampling-based approach, marking a first step towards real-time capable uncertainty estimation for image-based 3D reconstruction.
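The abstract does not state which evidential distribution EMVSNet regresses. As a hedged illustration only, the sketch below assumes the Normal-Inverse-Gamma parameterization commonly used in deep evidential regression, with per-pixel parameters (γ, ν, α, β). Under that assumption, the depth estimate and both uncertainty terms follow in closed form from a single forward pass, with no sampling. The function name and the example values are hypothetical.

```python
def evidential_moments(gamma, nu, alpha, beta):
    """Closed-form moments of a Normal-Inverse-Gamma evidential output.

    Assumed parameterization (not confirmed by the abstract):
      predicted depth        = gamma
      aleatoric uncertainty  = E[sigma^2] = beta / (alpha - 1)
      epistemic uncertainty  = Var[mu]    = beta / (nu * (alpha - 1))
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("requires nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return gamma, aleatoric, epistemic


# Hypothetical parameters for a single pixel.
depth, aleatoric, epistemic = evidential_moments(12.5, 4.0, 3.0, 1.0)
```

With these example values, the aleatoric term is 0.5 and the epistemic term is 0.125. A larger ν (more virtual evidence) shrinks only the epistemic term, which is what makes the two sources separable.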

4:30pm - 4:45pm

Adaptive Scaling with Geometric and Visual Continuity of completed 3D objects

Jelle Vermandere, Maarten Bassier, Maarten Vergauwen

KU Leuven, Belgium

Object completion networks typically produce static Signed Distance Fields (SDFs) that faithfully reconstruct geometry but cannot be rescaled or deformed without introducing structural distortions. This limitation restricts their use in applications requiring flexible object manipulation, such as indoor redesign, simulation, and digital content creation. We introduce a part-aware scaling framework that transforms these static completed SDFs into editable, structurally coherent objects. Starting from SDFs and Texture Fields generated by state-of-the-art completion models, our method performs automatic part segmentation, defines user-controlled scaling zones, and applies smooth interpolation of SDFs, color, and part indices to enable proportional and artifact-free deformation. We further incorporate a repetition-based strategy to handle large-scale deformations while preserving repeating geometric patterns. Experiments on Matterport3D and ShapeNet objects show that our method overcomes the inherent rigidity of completed SDFs and is visually more appealing than global and naive selective scaling, particularly for complex shapes and repetitive structures.

4:45pm - 5:00pm

MambaPanoptic: a Vision Mamba-based Structured State Space Framework for panoptic Segmentation

Qing Cheng¹,², Damiano Bertolini¹,³, Wei Zhang⁴, Dong Wang⁵, Niclas Zeller⁶, Daniel Cremers¹,²

¹Technical University of Munich, Germany; ²Munich Center for Machine Learning; ³Polytechnic University of Milan; ⁴University of Stuttgart; ⁵Wuhan University; ⁶Karlsruhe University of Applied Sciences

Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.

5:00pm - 5:15pm

GeoPrior-Diff: Using Stable Diffusion as a geometric Prior for single-view 3D Point Cloud Reconstruction

Youssef Korny¹, Sunghwan Yoo¹, Mohammad Moein Sheikholeslami¹, Daniel Panangian², Ksenia Bittner², Andreas Wichmann³, Gunho Sohn¹

¹Dept. of Earth and Space Science and Engineering, York University, Canada; ²Remote Sensing Technology Institute, German Aerospace Center (DLR), Germany; ³Institute for Applied Photogrammetry and Geoinformatics (IAPG), Jade University of Applied Sciences, Germany

Single-view 3D reconstruction from monocular aerial imagery presents a fundamental challenge in remote sensing due to the inherent scale ambiguity and the complex geometry of urban environments. Traditional regression-based methods often struggle to recover high-frequency structural details, leading to over-smoothed or noisy outputs. To address this, we introduce GeoPrior-Diff, a novel two-stage framework that leverages the generative capabilities of Latent Diffusion Models to reconstruct high-fidelity 3D point clouds.

Unlike direct generation approaches, our method explicitly bridges the domain gap between 2D texture and 3D structure by utilizing an intermediate geometric prior. In the first stage, we predict an oblique normal map from the input satellite imagery, capturing essential surface orientation and structural boundaries. In the second stage, this normal map serves as a strong conditioning signal for a probabilistic diffusion model, guiding the denoising process to synthesize accurate 3D point clouds. Preliminary results demonstrate that decoupling geometric estimation from point generation significantly enhances structural consistency and reduces artifacts compared to baseline methods. This work highlights the potential of using generative priors for robust 3D urban modeling from limited data.

Date: Tuesday, 07-July-2026

8:30am - 10:00am

WG IV/1C: Spatial Data Representation and Interoperability
Location: 715B

8:30am - 8:45am

Hierarchical Polygon-to-Point Collapsing for Multi-Scale Representation Based on the Straight Skeleton and Dual Half-Edge Data Structure

Amin Gholami¹, Pawel Boguslawski¹, Martijn Meijers²

¹Wroclaw University of Environmental and Life Sciences, Institute of Geodesy and Geoinformatics, Grunwaldzka 53, 50-357 Wroclaw, Poland; ²GIS Technology, Faculty of Architecture and the Built Environment, Delft University of Technology, Julianalaan 134, 2628 BL Delft, The Netherlands

This paper presents a hierarchical method for collapsing a polygon to a point within a structured multi-scale representation. The approach is based on the straight skeleton, which drives the shrinking process through event-based transformations such as edge and split events. These events define how the polygon changes during collapse and produce a hierarchy of intermediate geometric states between the initial polygon and the final point.

The resulting hierarchy is integrated into a Dual Half-Edge (DHE) structure, where the primal space represents successive geometric states and the dual space represents the hierarchical relations between them. This produces a connected 2D+1D representation in which the third dimension corresponds to scale rather than physical height. The resulting model is interpreted as a LoD Transition Space (LTS), allowing the full polygon-to-point transition to be represented continuously across scale.

The proposed framework contributes to model-based multi-scale representation by explicitly linking geometric transformation, topological change, and hierarchical structure within a unified representation. In addition to its relevance for vario-scale cartography and generalisation, the method also has potential applicability in domains where gradual geometric transformation is required, such as procedural modeling, animation, and related geometric applications.

8:45am - 9:00am

The Research on Renewal Theory and Method for the CGCS2000 Reference Framework

Zhihao Jiang, Ju Bai, Hao Yu, Yunlu Peng, Linlin Che, He Zhang

National Geomatics Center of China

The CGCS2000 (China Geodetic Coordinate System 2000) reference framework, which has been in use since July 1, 2008, is based on the ITRF97 reference framework and meets only regional application requirements within China. With the sustained development of China's economy and society, and the globalization of applications of the BeiDou Navigation Satellite System (BDS), there is a need to establish a global CGCS2000 reference framework. This paper studies a mathematical method for constructing a global CGCS2000 reference framework and analyses the theory and algorithm of a two-step method based on inner constraints. The constraint conditions of the coordinate reference are redefined according to the minimum standard of the frame transition parameters and their rate variation. As a result, the adjusted network achieves the highest degree of fit to the shape of the initial network and maintains the inherent purity of the coordinate network obtained with different observation technologies. These results can improve the basic theory of terrestrial reference framework determination and provide scientific methods for the globalization of CGCS2000.

9:00am - 9:15am

Open Source 3D Cadastre Visualisation Pipeline

Pavan Sai Goud Goddu, Sisi Zlatanova, Mohsen Kalantari

University of New South Wales, Australia

Interpreting multi-storey property rights is difficult when information is scattered across 2D plans and text or locked inside desktop projects. We present a web-based pathway that communicates strata lots and common property consistently across levels in a standard browser. Aligned with the 3D Cadastral Survey Data Model and Exchange (3D CSDM) of Australia, we propose an open-source, web-first approach. The method couples a lightweight browser viewer (level/tenure filters, plan overlay, search, readable legend) with an explicit conversion step that standardises common GIS inputs into a fixed core JSON profile, with limited official CSDM-aligned JSON-LD hooks applied only to selected keys that have exact matches in the published vocabularies. Using a New South Wales case study, we evaluated the viewer against ISO 9241-11 criteria (effectiveness, efficiency). Across repeated trials (cache disabled/enabled), mean page-open times were 0.60 s (Chrome) and 1.48 s (Edge); interaction averaged 50–60 FPS; level filters applied in 40–55 ms; all five tasks succeeded. Practically, this delivers fast, consistent 3D communication of lots and common property without installs, lowering access barriers for agencies and owners while aligning with 3D CSDM’s web-first direction. Next, we will finalise parity between Upload-and-View and the Reference Viewer and add a light in-viewer validation panel.

9:15am - 9:30am

Shadow Geometric Analysis Utilising CityGML Models and FME

Pawel Boguslawski¹, Malgorzata Jarzabek-Rychard¹, Stanislaw Biernat²

¹Wroclaw University of Environmental and Life Sciences, Poland; ²infoSolutions Sp. z o.o.

This research presents a methodology for conducting shadow geometric analysis, specifically of the shadow boundary in an urban model. Input data include a georeferenced CityGML LoD2 model and a terrain model. Additional land cover data are used to exclude some parts of the model from analysis. Shadow computation is based on a sunray vector, which is computed from the sun position on the given day and time. The geometry of the original models is divided into parts classified as either exposed to the sun or shaded. The result can be used for analytical purposes in other applications, such as urban planning, energy assessment, and photovoltaic potential analysis, by accurately identifying sunlit and shaded areas within 3D city models. The analysis is performed in the FME software package, which is a general-purpose ETL tool.
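A sunray vector of the kind described above can be sketched as follows. This is only an illustration and not the FME workflow: it assumes the sun position is given as azimuth (clockwise from north) and elevation above the horizon, in a local east-north-up frame, and all function names are hypothetical. It tests only whether a face is turned away from the sun; shadows cast by other buildings or terrain would need an additional occlusion test.

```python
from math import cos, radians, sin


def sun_direction(azimuth_deg, elevation_deg):
    """Unit vector pointing from the ground towards the sun (east, north, up).

    Assumes azimuth is measured clockwise from north and elevation
    is measured upwards from the horizon.
    """
    az, el = radians(azimuth_deg), radians(elevation_deg)
    return (sin(az) * cos(el), cos(az) * cos(el), sin(el))


def faces_sun(normal, to_sun):
    """True if a face with the given outward normal is turned towards the sun."""
    dot = sum(n * s for n, s in zip(normal, to_sun))
    return dot > 0.0


# Hypothetical sun position: due south, 45 degrees above the horizon.
to_sun = sun_direction(180.0, 45.0)
south_wall_lit = faces_sun((0.0, -1.0, 0.0), to_sun)
north_wall_lit = faces_sun((0.0, 1.0, 0.0), to_sun)
flat_roof_lit = faces_sun((0.0, 0.0, 1.0), to_sun)
```

The sunray itself is the negation of `to_sun`. Faces that pass this test can still lie in a cast shadow, which is why a full analysis also has to intersect rays with the surrounding model.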

9:30am - 9:45am

Software Development for Producing Texture Images Mapped on a Building Surface of a 3D City Model Using Aerial Images

Ryuji Matsuoka, Masato Ishikawa, Tomoaki Inazawa, Yoshihiko Nakanishi, Futa Kawamata, Masahito Takada, Takuya Danjo

Kokusai Kogyo Co., Ltd., Japan

It is desirable that a 3D city model at level of detail 2 (LOD2) has texture images mapped on building surfaces. Given the cost of image collection, aerial images are currently the most practical source for texture mapping. Although aerial oblique images provide higher-resolution texture images, their use suffers from a major occlusion problem. Accordingly, we develop software for texture mapping to a 3D city model using aerial nadir and oblique images, aiming to minimize the impact of occlusion. The software, designed for use in routine operation, automatically detects occlusions on building surfaces within images by utilizing the geometry of a 3D city model and automatically selects appropriate oblique and nadir images for texture mapping. The major feature of the developed software is its ability to process grid by grid on a building surface. The validation experiment results confirm the software's satisfactory performance in practice. Moreover, the experiment results indicate that the performance of the software depends on the ability of a 3D city model to represent buildings. Since we have recognized that it would be effective if each pixel of a texture image has its own resolution, we plan to modify the software so that each pixel can have its own resolution.

9:45am - 10:00am

Automatic detection and condition assessment of agricultural plastic greenhouses using deep learning and aerial RGB images

Davoud Omarzadeh¹, Mehran Alizadeh Pirbasti², Hamed Bahrevar³, Hoda Khalaghi⁴, Gavin McArdle², Bahram Salehi⁵

¹Institut d’Estudis Espacials de Catalunya (IEEC), Barcelona, Spain.; ²School of Computer Science, University College Dublin, Dublin, Ireland.; ³University of Tabriz, East Azerbaijan, Iran.; ⁴Universitat Autònoma de Barcelona, Barcelona, Spain.; ⁵State University of New York College of Environmental Science and Forestry (SUNY ESF), Department of Environmental Resources Engineering, Syracuse, USA.

Rapid urbanization in developing countries such as Iran has intensified pressure on agricultural land, highlighting the need for sustainable and efficient food production systems. Agricultural Plastic Greenhouses (APGs) have become a scalable alternative by enabling year-round cultivation and optimized land utilization. However, their rapid expansion necessitates continuous monitoring to evaluate structural integrity and environmental impacts, including soil degradation, plastic waste accumulation, and water consumption. This study presents a deep learning-based framework for the automated detection and condition assessment of APGs using 0.5 m resolution Google Earth imagery across four major agricultural regions in Tehran County: Pakdasht, Qarchak, Pishva, and Varamin. The proposed pipeline integrates YOLOv11 for precise APG segmentation with a U-Net architecture employing a MobileNetV2 backbone for classifying damaged and intact structures. Out of 158,912 analyzed image tiles, 6,835 contained APGs, covering an estimated area of 18.73 km². Among these, 1,863 damaged structures were identified, predominantly located in Pakdasht and Pishva. Around 20% of the annotated greenhouses were verified on-site, improving labeling reliability, and the relatively standardized design of APGs in Iran suggests the model could generalize to similar regions, with minor fine-tuning using local samples if necessary. GIS-based spatial analysis further delineated potential plastic waste risk zones, supporting targeted environmental management. Comparison with government statistics and Sentinel-2 imagery from 2021 and 2024 revealed a continued shift toward greenhouse farming in response to declining cropland availability. The proposed framework provides a scalable and replicable tool for periodic APG monitoring, facilitating data-driven policymaking and sustainable agricultural planning.

1:30pm - 3:00pm

WG II/3C: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

1:30pm - 1:45pm

CityLangSplat: Integrating CityGML Semantics into 3D Language Gaussian Splatting for Urban Scene Understanding

Qilin Zhang¹,²,³, Jinyu Zhu¹, Olaf Wysocki⁴, Boris Jutzi¹,³

¹Technical University of Munich; ²Munich Center for Machine Learning; ³Karlsruhe Institute of Technology; ⁴University of Cambridge

Combining visual semantics with language representations has made 3D interpretation more flexible and intuitive. Recent advances in Gaussian Splatting extend this to efficient 3D language fields supporting open-vocabulary queries. However, existing approaches show limited generalization in large urban scenes, especially for detailed building segmentation. Semantic 3D city models such as CityGML, by contrast, provide hierarchical and geometry-aligned structural semantics that complement appearance-driven visual cues. We introduce CityLangSplat, which integrates CityGML semantics into 3D Language Gaussian Splatting for urban environments. CityLangSplat rasterizes CityGML into pixel-aligned semantic maps, extracts vision-language features from SAM-derived segments and CityGML regions, and compresses both sources into a shared latent space via a lightweight autoencoder. 3D Gaussians are then optimized with a coverage-aware loss that balances accurate, building-focused CityGML supervision with broader SAM supervision, enabling geometry-aligned open-vocabulary reasoning in urban scenes. Experiments on TUM2TWIN and ZAHA datasets show consistent gains over LangSplat, with relative improvements of 22.9% in 2D and 15.1% in 3D evaluation while preserving real-time rendering. CityLangSplat provides a practical framework for combining semantic city models with language-embedded 3D Gaussian Splatting for geometry-aligned urban scene interpretation. Code will be released at https://github.com/zqlin0521/CityLangSplat.

1:45pm - 2:00pm

RoofVIP benchmark dataset: 2D roof planar polygons and very high-resolution digital orthophoto pairs

Chaikal Amrullah, Daniel Panangian, Guneet Mutreja, Youssef Abdelhedi, Ksenia Bittner

German Aerospace Center (DLR), Germany

Accurate building roof modeling is fundamental to urban analytics, digital twins, and 3D city reconstruction; however, progress in deep learning–based reconstruction is constrained by the limited availability of diverse, high-resolution datasets with detailed geometric annotations. This study introduces the RoofVIP dataset, a large-scale benchmark featuring very high-resolution RGB orthophotos paired with 2D roof vectors that capture diverse urban morphologies across Munich, Germany. Following Level of Detail (LoD) 2.0 principles, RoofVIP encompasses a wide range of roof geometries and architectural complexities, enabling evaluation of both segmentation- and vectorization-based reconstruction methods. Two paradigms are examined: a two-step segmentation-based approach (Cascade Mask R-CNN, Mask R-CNN, SOLOV2, YOLACT) and a one-step direct vector prediction approach (HEAT, PolyRoof). ImageNet-pretrained region-based models, particularly Mask R-CNN and Cascade Mask R-CNN, achieve the highest segmentation accuracy, effectively delineating complex roof boundaries while revealing limitations on small or irregular structures. Geometry-based models show complementary strengths, with HEAT emphasizing topological regularity and PolyRoof focusing on geometric precision. Although performance is lower than on simpler datasets such as HEAT and Roof Intuitive, RoofVIP exposes challenges related to geometric diversity and scale variation, serving as a rigorous benchmark. The dataset includes predefined training, validation, and test splits, enabling consistent benchmarking across methods. By providing a challenging and diverse geometric landscape, RoofVIP aims to advance geometry-aware deep learning approaches and support scalable, high-fidelity 3D urban modeling. The dataset is publicly available through the project page at https://chaikalamrullah.github.io/RoofVIP/.

2:00pm - 2:15pm

Evaluating 3D Scene Representations for Aerial Photogrammetry across Diverse Cityscapes

Shihan Chen¹, Zhaojin Li², Qingsong Yan¹, Haibing Liu¹, Huchen Li¹, Wubiao Huang¹, Fei Deng¹,³

¹School of Geodesy and Geomatics, Wuhan University, Wuhan, China; ²Technology and Engineering Center for Space Utilization, University of Chinese Academy of Sciences, Beijing, China; ³Hubei Luojia Laboratory, Wuhan, China

The proliferation of continuous Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) has shifted the paradigm of 3D aerial reconstruction from relying solely on geometric stereo matching to inverse rendering optimization. However, while these emerging rendering-based frameworks excel in synthesizing photo-realistic novel views, their capability to extract accurate surfaces in complex aerial scenarios remains ambiguous compared to traditional methods. To establish a clearer understanding, this study presents a comprehensive evaluation of five representative frameworks spanning traditional Structure from Motion (SfM), purely Signed Distance Field (SDF) representations, unstructured 3D Gaussians, hybrid voxel-Gaussians, and strictly explicit sparse voxels. By systematically standardizing identical computational environments, inputs, and unified mesh-extraction pipelines on both real-world airborne LiDAR datasets and synthetic cityscapes, we assess their performance regarding visual fidelity, geometric accuracy, and resource efficiency. The experimental results reveal that while traditional MVS produces the highest overall geometric precision by strictly enforcing multi-view rigid geometry, it is prone to failures in texture-less regions. Among rendering-based representations, a fundamental trade-off exists: highly flexible, unstructured 3DGS achieves the highest visual scores but degrades the underlying geometric surfaces; conversely, explicitly structured techniques demonstrate distinct superiority in regularizing topological coherence and floating artifact suppression. Furthermore, we observe that integrating structured voxels avoids the severe memory bottlenecks associated with extracting geometries from chaotic unorganized primitives. These empirical findings emphasize that for large-scale aerial photogrammetry, integrating explicit spatial structuralization into differentiable rendering pipelines is imperative for achieving scalable operations and bridging the geometric accuracy gap with traditional methods.

2:15pm - 2:30pm

Development of a 3D City Model-Based System for Pre-Flight Evaluation and Optimization of Aerial Image Acquisition Plans

Lixian Zhao, Saki Kato, Kenta Imai, Maki Itazu, Aya Matsui

Kokusai Kogyo Co., Ltd., Japan

In dense urban environments, aerial image acquisition often suffers from occlusions and redundant data due to the lack of quantitative evaluation tools at the flight-planning stage. To address this issue, this study develops a flight-planning support system that enables pre-acquisition visibility analysis for both terrain and building surfaces using existing 3D city models. The system performs ray-casting simulations based on user-defined flight parameters to quantify and visualize occluded and visible regions before flight, allowing planners to evaluate data quality and optimize image acquisition efficiency. Experiments were conducted using real flight plans with two representative aerial cameras: the Leica CityMapper-2 for multi-directional texture mapping and the Vexcel UltraCam Eagle 4.1 for nadir-based topographic mapping. The results show that the system effectively visualizes occlusions on roofs and walls, predicts building lean in nadir imagery, and assesses the influence of overlap ratios on ground visibility. These analyses enable users to design more cost-effective and geometrically consistent flight plans by identifying redundant overlaps and ensuring sufficient coverage for DSM and true-orthophoto generation. The proposed framework provides a quantitative and objective approach to improving the transparency and reliability of aerial survey planning, and it offers a foundation for integrating visibility simulation with subsequent photogrammetric workflows such as surface reconstruction and texture mapping.

2:30pm - 2:45pm

Image LiDAR based change detection and updating for urban 3D reconstruction

Teng Wu, Bruno Vallet

Univ Gustave Eiffel, Géodata Paris, IGN, LASTIG, F-77454 Marne-la-Vallée, France

There is a high demand for accurate and up-to-date territorial digital twins for human activities, but their production and updating costs remain prohibitive for many applications. Their generation relies on acquiring LiDAR and/or image data over the territory of interest. Each modality has its advantages: LiDAR is more accurate but more costly, while images are noisier but less costly and more easily accessible. Combining these two technologies to produce and update digital twins is thus a promising avenue. In this paper, we propose a pipeline based on 3D change detection to update a LiDAR point cloud using newer aerial imagery. First, triangle meshes are generated from LiDAR data and image-based dense matching. Then, 3D ray tracing is used to detect changes. After removing the changed parts, the point clouds are fused to update the scene. The proposed method is demonstrated on two datasets in France. The code will be open source on GitHub: https://github.com/whuwuteng/ChangeUpdateJN.

2:45pm - 3:00pm

SF-Recon: Simplification-Free Lightweight Building Reconstruction via 3D Gaussian Splatting

Zihan Li, Tengfei Wang, Wentian Gan, Hao Zhan, Xin Wang, Zongqian Zhan

School of Geodesy and Geomatics, Wuhan University, China PR.

Lightweight building surface models are crucial for digital city, navigation, and fast geospatial analytics, yet conventional multi-view geometry pipelines remain cumbersome and quality-sensitive due to their reliance on dense reconstruction, meshing, and subsequent simplification. This work presents SF-Recon, a method that directly reconstructs lightweight building surfaces from multi-view images without post-hoc mesh simplification. We first train an initial 3D Gaussian Splatting (3DGS) field to obtain a view-consistent representation. Building structure is then distilled by a normal-gradient–guided Gaussian optimization that selects primitives aligned with roof and wall boundaries, followed by multi-view edge-consistency pruning to enhance structural sharpness and suppress non-structural artifacts without external supervision. Finally, a multi-view depth–constrained Delaunay triangulation converts the structured Gaussian field into a lightweight, structurally faithful building mesh. Based on a proposed SF dataset, the experimental results demonstrate that our SF-Recon can directly reconstruct lightweight building models from multi-view imagery, achieving substantially fewer faces and vertices while maintaining computational efficiency.

3:30pm - 5:15pm

WG II/3D: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

3:30pm - 3:45pm

CARS: A Photogrammetric Pipeline for Global 3D Reconstruction using Satellite Imagery

David Youssefi¹, Valentine Bellet¹, Yoann Steux², Mathis Roux², Cédric Traizet², Marian Rassat², Tommy Calendini²

¹CNES, France; ²CS GROUP, France

We present CARS, a multiview stereo pipeline developed by CNES. This pipeline will be integrated into the CO3D mission processing chain, a mission whose goal is to generate a 3D model of the Earth in less than four years. Because this is an operational mission involving massive production, particular attention has been paid to ensuring that the software is robust, efficient and includes a set of advanced automatic processing features. The paper will provide a comprehensive overview of all the features developed since its creation to achieve this goal.

3:45pm - 4:00pm

SatGeo-NeRF: Geometrically Regularized NeRF for Satellite Imagery

Valentin Wagner¹, Sebastian Bullinger¹, Michael Arens¹, Rainer Stiefelhagen²

¹Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB); ²Karlsruhe Institute of Technology (KIT)

We present SatGeo-NeRF, a geometrically regularized NeRF for satellite imagery that mitigates overfitting-induced geometric artifacts observed in current state-of-the-art models using three model-agnostic regularizers. Gravity-Aligned Planarity Regularization aligns depth-inferred, approximated surface normals with the gravity axis to promote local planarity, coupling adjacent rays via a corresponding surface approximation to facilitate cross-ray gradient flow. Granularity Regularization enforces a coarse-to-fine geometry-learning scheme, and Depth-Supervised Regularization stabilizes early training for improved geometric accuracy. On the DFC2019 satellite reconstruction benchmark, SatGeo-NeRF improves the Mean Altitude Error by 13.9% and 11.7% relative to state-of-the-art baselines such as EO-NeRF and EO-GS.

4:00pm - 4:15pm

HDR Radiance Learning and Shadow Regularization for Satellite NeRF 3D Reconstruction

Yongjun Song, Pablo d’Angelo

German Aerospace Center (DLR), Germany

High dynamic range (HDR) variations in satellite optical imagery arise from extreme differences in surface reflectance and illumination conditions. Conventional satellite NeRF frameworks are typically trained on tone-mapped or radiometrically enhanced images, where nonlinear preprocessing alters the physical relationship between measured pixel values and true scene radiance. This leads to biased photometric optimization and loss of geometric fidelity, especially under strong illumination contrasts. To address these limitations, we propose an HDR-consistent learning framework that integrates RawNeRF-style radiance supervision with shadow regularization. The method trains directly on raw satellite imagery using a logarithmic, tone mapping–aware loss that preserves linear radiance and stabilizes optimization under high dynamic range conditions. In parallel, a soft shadow regularization constrains network-predicted shadows using geometric cues derived from solar ray casting, promoting physically consistent irradiance decomposition. Experiments on four AOIs from the DFC2019 dataset demonstrate that HDR-aware radiance learning significantly improves DSM accuracy by maintaining linear radiometric consistency. The proposed shadow regularization also improves geometric consistency in structure-dominated urban scenes, although its effect is limited in vegetation-dominant areas where shadow cues are less informative. Overall, the results confirm that combining HDR radiance learning with geometric shadow regularization yields more radiometrically consistent and geometrically accurate 3D reconstruction from satellite imagery.

4:15pm - 4:30pm

EOGS++: Earth Observation Gaussian Splatting with Internal Camera Refinement and Direct Panchromatic Rendering

Pierrick Bournez¹, Luca Savant Aira², Thibaud Ehret³, Gabriele Facciolo¹

¹Université Paris-Saclay, CNRS, ENS Paris-Saclay, Centre Borelli, 91190, Gif-sur-Yvette, France; ²Politecnico di Torino, Corso Duca degli Abruzzi, 24, 10129 Torino TO, Italia; ³AMIAD, Pôle Recherche, France

Recently, 3D Gaussian Splatting has been introduced as a compelling alternative to NeRF for Earth observation, offering competitive reconstruction quality with significantly reduced training times.

In this work, we extend the EOGS framework to propose EOGS++, a novel method tailored for satellite imagery that directly operates on raw high-resolution panchromatic data without requiring external preprocessing.

Furthermore, we embed bundle adjustment directly within the training process with optical flow techniques, avoiding reliance on external optimization tools while improving camera pose estimation.

We also introduce several improvements to the original implementation, including early stopping and TSDF post-processing, all contributing to sharper reconstructions and better geometric accuracy.

Experiments on the IARPA 2016 and DFC2019 datasets demonstrate that EOGS++ achieves state-of-the-art performance in terms of reconstruction quality and efficiency, outperforming the original EOGS method and other NeRF-based methods while maintaining the computational advantages of Gaussian Splatting. Compared to the original EOGS model, our model reduces the mean MAE on buildings from 1.33 to 1.19.

4:30pm - 4:45pm

Evaluating multi-view geometry for satellite-based 3D city modeling: towards 1+N constellation configurations

Xu Cheng, Xianfeng Huang, Yingdong Pi, Xinsheng Wang, Mi Wang

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China

The emergence of satellite constellations enables near-synchronous multi-view optical imaging, offering new opportunities for large-scale 3D city modeling. Yet a practically promising configuration, in which a primary near-nadir view is complemented by multiple oblique side-looking viewpoints, remains under-examined. This study develops a controlled semi-simulation framework to analyze how multi-view imaging geometry affects the recoverability of urban 3D structures. Under idealized conditions with imaging perturbations removed, e.g., radiometric, illumination, and sensor model errors, the experiments focus on three practical factors: the number of side-looking views, view obliqueness, and the constellation’s azimuthal orientation relative to the scene. A parameter sweep reveals an asymmetric U-shaped trend between reconstruction performance and both the view count and the obliqueness: moderate angular diversity markedly strengthens urban scene recoverability. In contrast, large obliqueness reduces inter-view overlap and destabilizes matching, while excessive redundancy introduces consistency issues that ultimately degrade reconstruction performance. Furthermore, the results show that geometric accuracy, completeness, and texture appearance each peak at different parameter combinations, revealing intrinsic trade-offs in multi-view urban reconstruction, as different evaluation criteria favor distinct optimal configurations. The study provides practical guidance for the geometric design and mission planning of multi-satellite constellations aimed at improving satellite-based 3D modeling in urban areas.

4:45pm - 5:00pm

Illumination-prior-based high-resolution DEM reconstruction from single-view lunar image constrained with initial DEM

Siyi Qiu¹, Zhen Ye¹,², Rong Huang¹,², Yusheng Xu¹,², Xiaohua Tong¹,²

¹College of Surveying and Geoinformatics, Tongji University, Shanghai, China; ²The Shanghai Key Laboratory of Space Mapping and Remote Sensing for Planetary Exploration, Shanghai, China

This work presents an illumination-prior-based reconstruction model for high-resolution DEM generation from single-view lunar imagery, developed for the extreme illumination conditions and rugged terrain of the lunar south pole. The model integrates an initial DEM prior with multi-scale monocular image features and incorporates illumination priors derived from solar geometry to enhance stability in shadowed, low-texture, and terrain-transition regions. Through cross-modal feature fusion, it effectively aligns geometric structure with shading and photometric cues, enabling accurate recovery of fine-scale topography even when visual information is severely degraded. Experimental evaluations across multiple south-polar regions show that the proposed reconstruction model outperforms existing deep learning approaches and the classical Shape-from-Shading method in elevation, slope, and aspect accuracy, with independent validation using LOLA laser altimetry points confirming its improved geometric reliability. Visual comparisons demonstrate clear advantages in reconstructing crater rims, steep slopes, and permanently shadowed areas where conventional methods often fail or produce blurred terrain structures. The model also maintains robust performance under varying solar azimuths, highlighting the effectiveness of incorporating illumination priors to improve generalization in challenging environments. Overall, the proposed reconstruction model provides a reliable and effective solution for detailed lunar terrain recovery from monocular images and offers valuable support for scientific investigation, resource assessment, landing-site evaluation, and mission planning in the lunar south polar region.

5:00pm - 5:15pm

Construction of Control Network for Multi-temporal LRO NAC Images Based on Matching of Lunar Impact Craters

Pengying Liu¹,², Jiayao Wang¹,², Xun Geng¹,², Zhen Peng¹,², Jin Wang¹,², Haoyu Zhang¹,²

¹State Key Laboratory of Spatial Datum, Faculty of Geographical Science and Engineering, Henan University, Zhengzhou, China; ²College of Geographic Sciences, Henan University, Zhengzhou, China

To address the critical demand for high-precision mapping of the Lunar South Pole (LSP)—a region pivotal for deep space resource utilization yet plagued by extreme illumination variations, extensive permanent shadow regions (PSRs), and weak texture—this study proposes a control network construction method for multi-temporal Lunar Reconnaissance Orbiter (LRO) Narrow Angle Camera (NAC) images, anchored in lunar impact crater matching. Leveraging the morphological stability and spatial consistency of impact craters, we first created a dedicated dataset: 94 multi-temporal LSP orthophotos (1 meter/pixel resolution) with manual annotations, allocating 70% for YOLOv8 model training and 30% for validation to ensure accurate crater detection (extracting center coordinates and semi-major/semi-minor axes). For virtual feature point matching, we integrated crater geometric attributes (coordinates, aspect ratio) and inter-crater topological relationships (distance, azimuth angle) to build local descriptors, enhanced by KD-tree indexing for efficient neighborhood queries, multi-attribute similarity measurement, and bidirectional voting to eliminate mismatches. For large craters, normalized cross-correlation (NCC) was used for secondary matching to refine accuracy. Post-matching, tie points were back-projected from orthophoto to original image space via ground coordinates. Experiments on 1,208 LRO NAC images showed the method outperforms SIFT and SuperPoint: it generated 938,029 tie points (even in dark shadows) with 2,347,629 measurements, and bundle adjustment achieved a sigma naught of 0.68. This work enables automatic high-quality control network construction, supporting reliable LSP topographic mapping for deep space exploration.
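
For readers unfamiliar with the secondary matching step, normalized cross-correlation can be sketched as follows. This is a generic zero-mean NCC on two equally sized grayscale patches, not the authors' implementation; the function name `ncc` and the list-of-rows patch format are assumptions made for illustration.

```python
from math import sqrt


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized 2D patches.

    Returns a value in [-1, 1]; values near 1 indicate a strong match.
    """
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat patch has no texture; treat it as "no match"
    return sum(x * y for x, y in zip(da, db)) / denom


crater = [[1, 2], [3, 5]]
# Same patch under a different gain and offset (a crude stand-in for an
# illumination change between acquisition dates).
brighter = [[2 * v + 10 for v in row] for row in crater]
score = ncc(crater, brighter)
```

Because the score is invariant to a linear change in brightness, it suits patches acquired under different lighting; real lunar illumination changes also move shadows, which a linear model does not capture.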

Date: Wednesday, 08-July-2026

8:30am - 10:00am

WG II/3E: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

8:30am - 8:45am

Technical Scheme for 3D Digital Map Production Based on the SSW Vehicle-mounted LiDAR Mobile Mapping System (VMMS)

You Jia, Eryan Zhang, Junyong Hu, Yang Jia, Weigang Liu

Shaanxi TIRAIN Science & Technology Co., Ltd., People's Republic of China

To meet the growing demand for 3D digital map applications and to better understand the multi-level spatial structure of cities, some cities have implemented citywide 3D digital map programs. In 3D digital map production, vehicle-mounted mobile surveying is a key component. Drawing on a practical project, this paper proposes a technical scheme for road data acquisition and processing based on the SSW VMMS (Vehicle-mounted Mobile Mapping System). Through integrated processing steps, including combined navigation solution, point cloud correction, image coordinate calculation, image deblurring, point cloud coloring, point cloud denoising, and Orbit GT data preparation, the rapid production of colored point cloud data with georeferenced coordinates, 360° panoramic image data, and individual image data is achieved. A technical scheme suitable for 3D digital map production along urban roads was developed and validated. The results produced by this scheme have passed inspection and acceptance, and were released to the public free of charge as the first batch of visualized 3D map data on the Common Spatial Data Infrastructure Portal (portal.csdi.gov.hk), receiving widespread attention and positive recognition from various sectors of society. This scheme not only promotes the broader application of the SSW VMMS but also provides an effective reference for similar urban vehicle-mounted mobile mapping projects.

8:45am - 9:00am

Road Network Vectorization With Geometric Enforcement

Zhenyu Zhu¹,², Elena Di Bernardino², Florent Lafarge¹

¹Inria, France; ²Université Côte d'Azur, France

We present an automatic algorithm for graph-based road network extraction from remote sensing images. While existing works mostly focus on improving accuracy, we address the problem of the geometric quality of the output graphs. The state-of-the-art methods largely overlook this aspect by generating graphs without strong geometric guarantees, regularity preservation, or low complexity, which ultimately reduces their impact in many application scenarios. Our algorithm relies upon foundation models that analyze road networks with pixel-based representations, as well as geometric algorithms and data structures in charge of connecting geometric primitives into planar graphs. This hybrid strategy allows us to strongly enforce the geometric quality of the output graphs while bringing a high level of generalization. We show the potential of our algorithm and its advantages over existing methods on two datasets commonly used in the field, using both the conventional accuracy metrics and new metrics introduced to measure the geometric quality of the output graphs.

9:00am - 9:15am

A practical workflow for road slopes monitoring using handheld mobile mapping systems

José Miguel Gómez López, José Luis Pérez-Garía, Antonio Tomás Mozas-Calvache, Jorge Delgado-García, Diego Vico-Gacía

Universidad de Jaén, Spain

High-resolution monitoring of road infrastructure is essential for the early detection of geomorphological instabilities such as landslides and erosion. This study evaluates the performance of handheld MMS under different vehicle-mounted configurations: a 2-meter survey pole versus a suction-cup mount, and varying acquisition speeds (10 and 20 km/h). Furthermore, a GNSS-denied scenario was simulated to test the robustness of SLAM-based processing. Initial results revealed significant geometric discrepancies (double-point artifacts and drift), particularly in the SLAM-only and high-speed datasets. To address this, an automated segment-based refinement workflow was developed using an ICP algorithm. The refinement successfully reduced the standard deviation to the level of the point cloud's mean point spacing (5 cm). Comparative multitemporal analysis against UAV-LiDAR reference data confirms that the proposed refinement renders even SLAM-processed data viable for detecting centimetric terrain displacements. The findings demonstrate that while suction-cup mounting at 10 km/h is optimal, algorithmic refinement allows for reliable road slope monitoring and change detection across all tested configurations.
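
To make the refinement idea concrete, the following is a toy point-to-point ICP in 2D with brute-force nearest neighbours. It is a sketch under strong simplifying assumptions (2D instead of 3D, a single segment, no outlier rejection), and the helper names `best_rigid_2d`, `transform` and `icp_2d` are hypothetical rather than part of the workflow described above.

```python
from math import atan2, cos, sin


def best_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping paired points src -> dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    s_dot = s_cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = atan2(s_cross, s_dot)  # closed-form optimal rotation in 2D
    c, s = cos(theta), sin(theta)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty


def transform(points, theta, tx, ty):
    """Apply a 2D rotation by theta followed by a translation."""
    c, s = cos(theta), sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_2d(src, dst, iterations=20):
    """Iteratively align src to dst: match nearest points, then solve for the pose."""
    current = list(src)
    for _ in range(iterations):
        matched = [
            min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in current
        ]
        theta, tx, ty = best_rigid_2d(current, matched)
        current = transform(current, theta, tx, ty)
    return current


reference = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 5)]
drifted = transform(reference, 0.05, 0.2, -0.1)  # small simulated drift
aligned = icp_2d(drifted, reference)
```

ICP only converges to the right pose when the initial misalignment is small relative to the point spacing, which is one plausible reason to refine short trajectory segments independently rather than the whole cloud at once.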

9:15am - 9:30am

Assessing positional accuracy of photogrammetric multi-camera systems for mapping underground utility pipelines

Luca Perfetti¹, Ahmad Elalailyi², Federica Marotta³, Francesco Fassi², Giorgio Paolo Maria Vassena¹

¹Università degli Studi di Brescia, dept. of Civil Eng., Architecture, Territory, Environment and Mathematics (DICATAM), Italy; ²Politecnico di Milano, dept. of Architecture, Built environment and Construction engineering (ABC), Italy; ³Consorzio di Bonifica di Piacenza, Italy

Underground utilities such as water pipelines and sewers are critical for urban systems, yet their management is challenging due to limited accessibility and uncertain positional data. Current inspection practices rely on robotic crawlers equipped with CCTV cameras or man-entry inspections, enabling visual documentation of structural conditions but lacking accurate georeferencing of internal points. Advanced solutions relying on panoramic imaging and IMUs offer partial 3D measurements and trajectory estimation, though accuracy remains limited by drift and environmental variability.

This study investigates the feasibility of multi-camera photogrammetry for mapping pipelines and confined underground environments and improving positional accuracy. Preliminary experiments were conducted using the Atom-Ant3D system on two test sets: (i) five pipelines of varying materials (concrete, PVC, fiberglass) and diameters (60–110 cm); and (ii) a 1.3 km water-distribution tunnel (~2 m diameter) prepared with 28 fixed targets measured via total station for accuracy evaluation. Data were acquired using robotic and handheld configurations and processed through two workflows: Structure-from-Motion (SfM) and multi-view V-SLAM.

Accuracy assessment focused on the tunnel test, comparing unconstrained and constrained trajectories against a reference solution. Results provide insights into the potential of photogrammetric approaches for precise pipeline reconstruction and georeferencing, supporting improved subsurface utility management and planning.

9:30am - 9:45am

Beyond Centers: Bounding-Box Voxel Projection for Multi-View 3D Detection and Tracking

Rasho Ali, Max Mehltretter, Christian Heipke

Leibniz University Hannover, Germany

3D multi-view, multi-object tracking (3D MV-MOT) makes use of multiple cameras to reduce the number of missed detections and to mitigate occlusions. Most current 3D MV-MOT methods suffer from information loss when associating 3D locations with 2D image features via a 3D-to-2D projection, as they use a discrete grid in 3D and sample image features only at the projected centers of each grid cell. Thus, all other feature information is lost. An additional information loss commonly arises during cross-view aggregation when applying max or average pooling: these methods either overemphasize a single view or treat conflicting views, that depict different entities, e.g., due to occlusions, equally.

In this work, we introduce two novel modules for 3D MV-MOT, applied to pedestrian tracking, that target these limitations:

(i) VoxROI aggregates all image features that fall within the bounding box around a voxel's projection into each respective image, instead of only sampling features at the projected voxel center.

(ii) SimFuse aggregates per-view voxel features into one coherent feature representation per voxel, using similarity weights computed from re-identification (Re-ID) features.

The Re-ID features are used to measure cross-view identity similarity. Views with higher Re-ID feature similarity receive larger weights, while inconsistent views are suppressed.

Experimental results on the WildTrack dataset confirm our method's effectiveness for multi-view pedestrian detection and tracking, matching the general state of the art and, in cross-view scenarios in particular, improving on it. The approach maintains strong performance across different camera configurations, demonstrating its generalization capability when training and testing on different camera setups.
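
The similarity-weighted fusion idea can be illustrated with a small sketch. The weighting rule used here (softmax over each view's mean cosine similarity to the other views, with a hypothetical `temperature` parameter) is an assumption chosen for illustration; the abstract does not specify how SimFuse computes its weights.

```python
from math import exp, sqrt


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def fuse_views(features, reid, temperature=0.1):
    """Fuse per-view voxel features using weights derived from Re-ID agreement.

    features : one feature vector per view (all the same length)
    reid     : one Re-ID embedding per view
    Returns (fused_feature, weights).
    """
    n = len(features)
    scores = []
    for i in range(n):
        others = [cosine(reid[i], reid[j]) for j in range(n) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    top = max(scores)
    raw = [exp((s - top) / temperature) for s in scores]  # stable softmax
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(features[0])
    fused = [sum(weights[i] * features[i][d] for i in range(n)) for d in range(dim)]
    return fused, weights


# Two views agree on the identity; the third sees a different (occluding) person.
fused, weights = fuse_views(
    features=[[1.0], [1.0], [5.0]],
    reid=[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
)
```

Compared with average pooling, which would give the conflicting view a weight of one third, this kind of weighting lets the two consistent views dominate the fused voxel feature.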

9:45am - 10:00am

Fine-Grained Urban Low-Altitude Airspace Gridding with Dynamic Event Response and Vertical Air-Route Corridors

Leyang Zhao, Hang Mei, Xing Lin, Weixi Wang, Xiaoming Li, Sichang Wang, Lei Han, Yang Zhao, Ding Ma

Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University,

With the rapid growth of urban low-altitude applications, traditional airspace management approaches based on simple altitude limits and static no-fly zones can no longer meet the demands of high-density and highly dynamic operations. To address this issue, this study proposes a fine-grained gridding method for urban low-altitude airspace with dynamic event response and vertical flight corridor constraints. First, a unified three-dimensional grid model is constructed on the basis of an urban 3D digital twin platform, and the grid scale and update cycle are determined by jointly considering clearance requirements and safety separation. Second, a method for injecting static and dynamic attributes is established to achieve the unified representation and continuous updating of terrain, buildings, no-fly and restricted zones, wind fields, temporary restrictions, as well as occupancy and release information within the grid. Third, fixed-geometry and dynamically open vertical flight corridors are designed to support controlled cross-layer flight transitions and reduce the risk of vertical conflict propagation. An experimental system is developed using a typical high-density urban area in Yuehai Subdistrict, Nanshan District, Shenzhen, as the case study. The results show that the proposed method can achieve stable spatial discretization, accurate attribute loading and updating, and clear organization of cross-layer flight. The proposed method provides a unified technical framework for low-altitude airspace representation, state management, and operational governance in complex urban environments.
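
A minimal sketch of the gridding and state-update idea, assuming an axis-aligned grid with fixed cell sizes. The class `AirspaceGrid`, its method names, and the cell dimensions are hypothetical, and the occupancy rule (one vehicle per cell) is a simplification introduced here rather than the paper's rule.

```python
from math import floor


def cell_index(point, origin, cell_size):
    """Map a 3D coordinate to integer grid indices (i, j, k)."""
    return tuple(floor((p - o) / s) for p, o, s in zip(point, origin, cell_size))


class AirspaceGrid:
    """Toy 3D airspace grid with blocked cells and per-cell occupancy."""

    def __init__(self, origin, cell_size):
        self.origin = origin
        self.cell_size = cell_size
        self.blocked = set()   # cells closed by buildings, no-fly zones, events
        self.occupied = {}     # cell -> id of the vehicle holding it

    def _cell(self, point):
        return cell_index(point, self.origin, self.cell_size)

    def block(self, point):
        self.blocked.add(self._cell(point))

    def unblock(self, point):
        self.blocked.discard(self._cell(point))

    def reserve(self, point, vehicle_id):
        """Try to occupy the cell containing point; return True on success."""
        cell = self._cell(point)
        if cell in self.blocked:
            return False
        holder = self.occupied.setdefault(cell, vehicle_id)
        return holder == vehicle_id

    def release(self, point, vehicle_id):
        cell = self._cell(point)
        if self.occupied.get(cell) == vehicle_id:
            del self.occupied[cell]


grid = AirspaceGrid(origin=(0.0, 0.0, 0.0), cell_size=(10.0, 10.0, 5.0))
grid.block((25.0, 3.0, 12.0))                      # a temporary restriction
granted = grid.reserve((31.0, 3.0, 12.0), "uav-1")  # neighbouring cell is free
```

Keeping static restrictions and dynamic occupancy in separate structures makes it cheap to lift a temporary event without touching the reservations of vehicles elsewhere in the grid.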

1:30pm - 3:00pm

WG II/4B: AI/ML for Geospatial Data
Location: 715B

1:30pm - 1:45pm

From Pixels to Polylines: Extracting City-scale Vectorized Roof Structures with Line Segment Detection Networks

Mehmet Büyükdemircioğlu¹, Fabio Remondino¹, Martin Kada², Sultan Kocaman³

¹3D Optical Metrology (3DOM) Unit, Bruno Kessler Foundation (FBK), Trento, Italy; ²Technische Universität Berlin, Institute of Geodesy and Geoinformation Science, Berlin, Germany; ³GeoPlato Engineering Inc., Bilkent Cyberpark, Ankara, Türkiye

Automatic extraction of vectorized roof structures above LOD2.0 remains challenging due to their geometric complexity and the presence of small and occluded elements over the roofs. Detecting fine-scale roof objects such as chimneys and dormer windows in very high resolution aerial imagery is still an active research topic. This study presents a workflow for automated detection and vectorization of roof structures at city scale using Line Segment Detection (LSD) networks. Compared to model-based building reconstruction approaches, LSD networks do not rely on pre-defined roof typologies and are able to extract complex roof structures and small objects over the building roofs. For this purpose, a dataset comprising approximately 139,000 buildings with LOD2.2 roof structures and more than 2.2 million roof segments is generated using 8 cm GSD aerial imagery. An automated end-to-end workflow is developed, trained and tested from the available data. Experimental results indicate that roof structures suitable for LOD2.2 3D roofs can be extracted and vectorized with high accuracy, achieving 58.4% msAP and 73.1% mAPJ with the ULSD network. Robustness is further assessed by visual inspection in areas affected by roof-blocking objects such as trees and cast shadows.

1:45pm - 2:00pm

Automatic Large-Scale Topographic Mapping from High-Resolution Aerial Imagery

Weiqin Jiao, George Vosselman, Claudio Persello

University of Twente, ITC Faculty Geo-Information Science and Earth Observation, The Netherlands

Topographic maps provide structured, polygonal representations of the Earth’s surface, delineating land-cover classes such as buildings, roads, water bodies, and vegetation.

They form the foundation of national geospatial data infrastructures and support a wide range of applications, including urban planning, environmental monitoring, and cadastral management. However, the production and maintenance of such large-scale topographic maps still rely heavily on manual photo-interpretation and vector editing.

While such human-in-the-loop workflows ensure geometric accuracy, they are labor-intensive, costly, and non-reproducible, limiting scalability and update frequency. Automated alternatives exist, but most existing polygonal outline extraction methods are restricted to a single class, which typically leads to overlaps, gaps, and inconsistent shared boundaries when extended to multi-class mapping. Moreover, few studies have demonstrated nationwide implementation or validation, leaving the scalability and generalization of current methods largely unexplored. To address these challenges, this study develops a fully automated framework for large-scale topographic mapping directly from high-resolution aerial imagery. The framework aims to produce seamless, multi-class topographic maps in a single run that remain topologically consistent across diverse urban and rural regions in the Netherlands and beyond.

2:00pm - 2:15pm

Todo Fir Crown Instance Segmentation in dense Plantation Forest using Polar-FFT and Treetop Queries

Yusuke Iuchi¹, Soki Nishiwaki¹, Fumio Takeuchi², Mika Takiya², Takanori Emaru³

¹Graduate School of Engineering, Hokkaido University; ²Forestry Research Institute, Hokkaido Research Organization; ³Faculty of Engineering, Hokkaido University

Instance segmentation of individual trees from UAV-derived orthomosaics and DSMs remains challenging in dense planted forests in Japan because SfM-derived DSMs often have blurred crown boundaries and unstable quality. We propose a Polar-FFT (PFFT)-based method that encodes the local DSM shape around treetop candidates and integrates it into Mask2Former to suppress unreliable candidates and improve crown separation.

Experiments on Abies sachalinensis plantation (Todo fir) data from two sites in Hokkaido showed that the method improved mAP75 from 52.18% to 55.47% and F1 at a confidence threshold of 0.5 from 89.86% to 92.08%, while reducing false positives by 41% without increasing false negatives. The results indicate that treetop-centered local shape cues are useful for instance segmentation in densely planted forests.

2:15pm - 2:30pm

An integrated yolo-seg and geometric analysis framework for construction zone detection and tubular marker damage assessment

Lee Jubin¹, Youn Junhee², Choi Kanghyeok³, Kim Changjae¹

¹Department of Civil and Environmental Engineering, College of Engineering, Myongji University; ²Department of Future & Smart Construction Research, Korea Institute of Civil and Building Technology; ³Department of Geoinformatic Engineering, Inha University

This study presents an integrated framework combining YOLOv9e-Seg and photogrammetric geometric analysis for detecting road-safety assets and assessing their condition using UAV imagery. Traffic cones and tubular markers, which define construction-zone boundaries, are difficult to detect due to their small size in high-resolution images. To address this, a crop-tiling strategy (512×512 pixels) was applied to enhance the representation of small objects. Polygon-based labeling was used to preserve fine object geometry, and YOLOv9e-Seg was trained to output instance masks and polygon coordinates. During testing, tiled predictions were restored to the global coordinate frame, and duplicate detections were removed by retaining only the highest-confidence results. Geometric analysis utilized segmentation-derived polygons to compute centroids and principal axes, distinguishing intact and damaged tubular markers through vector angle difference analysis. For traffic cones, convex hulls constructed from centroid positions accurately delineated construction-zone boundaries. The proposed approach achieved the highest F1 score at a 512-pixel tile size, improving detection and segmentation of small, slender objects. These results demonstrate that the framework goes beyond basic detection and segmentation by enabling quantitative geometric interpretation and reliable construction-zone reconstruction from UAV data.
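
The principal-axis comparison can be sketched as follows. This version estimates the axis from the covariance of the polygon vertices and compares it with a reference direction; the function names, the use of vertices rather than mask pixels, and any damage threshold applied to the result are assumptions for illustration, not details from the paper.

```python
from math import atan2, degrees


def principal_axis_deg(points):
    """Orientation of the dominant axis of a 2D point set, in degrees [0, 180)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Closed-form orientation of the first eigenvector of the 2x2 covariance.
    return degrees(0.5 * atan2(2.0 * sxy, sxx - syy)) % 180.0


def axis_difference_deg(a, b):
    """Smallest angle between two undirected axes, in degrees [0, 90]."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


upright_marker = [(0, 0), (1, 0), (1, 10), (0, 10)]   # tall, narrow polygon
fallen_marker = [(0, 0), (10, 0), (10, 1), (0, 1)]    # same shape lying flat
deviation = axis_difference_deg(
    principal_axis_deg(upright_marker), principal_axis_deg(fallen_marker)
)
```

Because an axis has no direction, angles are compared modulo 180°, so a marker at 175° and one at 5° differ by 10° rather than 170°.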

2:30pm - 2:45pm

From Aerial to Satellite: Can Super-Resolution Enable Label-Free Model Transfer?

Nina Merkle, Corentin Henry, Sandeep Kumar Jangir, Jens Hellekes, Felix Rauch, Pablo d'Angelo, Franz Kurz

German Aerospace Center (DLR), Germany

Satellite imagery enables large-scale remote sensing applications by providing frequent and large-scale coverage. However, its limited spatial resolution often restricts the use of satellite images in tasks that require detailed, fine-scale information. In contrast, aerial images offer a much higher spatial resolution, allowing the extraction of fine-grained features, but typically cover smaller, more localized areas. In this work, we investigate whether super-resolution (SR) methods can bridge the gap between aerial and high-resolution satellite imagery, enabling a label-free model transfer without additional manual annotations. The idea is to enhance the spatial resolution of high-resolution satellite images, allowing models trained on aerial data to be directly applied to satellite images. Towards this goal, a state-of-the-art SR algorithm is used to upscale three high-resolution satellite images, matching the resolution of the aerial training data. Then, a segmentation network trained on an aerial image dataset is applied to segment roads and parking areas in the super-resolved satellite images. The approach is evaluated on an annotated dataset and compared to the results in the original satellite images. Additionally, we investigate its performance on a low-resolution aerial image. Our results demonstrate that SR facilitates the utilization of models trained on aerial image datasets for large-scale satellite applications without requiring new labels.

2:45pm - 3:00pm

Beyond Vision: How Language Affects Visual Grounding in UAV Imagery

Jue Chen¹, Penghui Huang², Ran Ding², Zhentao Zou², Xue Yang², Xue Jiang², Yue Zhou¹, Hongxin Yang¹, Jonathan Li^1,3

¹Hinton STAI Institute and Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University, Shanghai 200241, China; ²Shanghai Jiao Tong University, Shanghai 200241, China; ³Department of Geography and Environmental Management, University of Waterloo, Waterloo, ON N2L 3G1, Canada

This study tackles multilingual and explicit-implicit gaps in Visual Grounding (VG) for UAV imagery, focusing on real-world UAV needs (e.g., disaster response) that require implicit reference understanding. It evaluates Qwen2.5-VL-7B’s cross-linguistic robustness via Acc@0.5% across nine languages (Chinese, English, Japanese, Russian, Korean, German, French, Spanish, Portuguese). Key results: Explicit VG (using visual attributes) outperforms implicit VG (needing context/common sense) universally. East Asian languages lead in both tasks; Indo-European languages (e.g., Portuguese, 48.63% implicit accuracy drop) lag. Attention analysis shows the model better aligns with East Asian linguistic structures. This work informs LVLM optimization for multilingual UAV applications, guiding future cross-model comparisons.

3:30pm - 5:15pm

WG III/6A: Remote Sensing of the Atmosphere
Location: 715B

3:30pm - 3:45pm

Deep Pretraining Unleashes the Potential of Aerosol Size Information Retrieval

Xing Yan

Beijing Normal University, China, People's Republic of

Aerosol size information, typically represented by fine- and coarse-mode aerosol optical depth (fAOD and cAOD), is crucial for understanding anthropogenic emissions and radiative effects. However, satellite-based retrievals suffer from limited labeled data and high uncertainty over land. To address these challenges, we developed a novel deep pretraining framework capable of mining latent representations from unlabeled satellite pixels, thereby enhancing the accuracy and generalization of aerosol size information retrieval. The framework leverages a self-supervised pretraining stage to capture intrinsic spatiotemporal correlations in multispectral satellite data and transfers these latent features to a supervised fine-tuning model. Using MODIS data combined with AERONET observations, our pretrained model achieved a 10% improvement in correlation and a 15% enhancement in regions without ground observations compared to conventional deep-learning models. The retrieved global fAOD from 2001–2020 reveals a significant decreasing trend (−1.39 × 10⁻³ yr⁻¹), with regional differences—most notably, a threefold stronger decline over China than the global average. These results demonstrate that deep pretraining can effectively exploit unlabeled satellite information, bridging the gap between sparse ground networks and dense global observations, and offering a transformative approach for large-scale aerosol characterization and climate studies.

3:45pm - 4:00pm

Retrieval of aerosol optical/microphysical parameters of FY-4A geostationary satellite based on Transformer

Siyu Liu¹, Lina Xu¹, Minghui Tao², Xincai Chang¹, Huang Zhang¹, Jianxin Ling¹, Dandi Liao¹

¹Hubei Subsurface Multi-scale Imaging Key Laboratory, School of Geophysics and Geomatics, China University of Geosciences, Wuhan, 430074, China; ²Hubei Key Laboratory of Regional Ecology and Environmental Change, School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430074, China

Atmospheric aerosols are a key factor influencing the Earth's radiation balance and climate change, and the accuracy of their retrieval is crucial for environmental monitoring and climate research. FY-4A AGRI, with its high-frequency observation capability, can provide aerosol data at high temporal resolution. Combined with deep learning technology, it enables efficient monitoring of dynamic aerosol variations. This study develops a retrieval algorithm for aerosol optical and microphysical parameters based on the Transformer deep learning model, specifically designed for the FY-4A geostationary satellite. The algorithm achieves multi-parameter collaborative retrieval of aerosol optical depth (AOD), fine/coarse-mode aerosol optical depth (FAOD/CAOD), and single scattering albedo (SSA). This research overcomes the reliance on prior assumptions inherent in traditional physical retrieval methods. By integrating multi-band spectral features, geometric observation parameters, and data from 104 AERONET sites, it significantly enhances retrieval accuracy under the complex surface conditions of East Asia. Experimental results demonstrate high accuracy in validation against AERONET sites, with correlation coefficients of R=0.915 for AOD, R=0.897 for FAOD, R=0.851 for CAOD, and R=0.536 for SSA. Comparative validation of various aerosol product spatial distributions highlights the advantages of the proposed algorithm in capturing aerosol diurnal variations (such as haze dissipation processes) and extreme events (e.g., dust storms and biomass burning). This study provides a new technical approach for regional air quality monitoring and climate effect assessment, advancing the application of China’s geostationary meteorological satellites in aerosol monitoring.

4:00pm - 4:15pm

Bioaerosol-driven heavy metal deposition and Biospheric response: A remote sensing-assisted Phytoremediation study in the Pin Valley National Park, North-Western Himalayas

Abhinav Galodha¹, Deepika Sharma²

¹School of Interdisciplinary Research (SIRe), Indian Institute of Technology Delhi, IIT Delhi, India; ²Department of Botany, Himachal Pradesh University (HPU), Shimla, Himachal Pradesh, India

Heavy metal pollution presents a formidable challenge to global ecosystems, threatening biodiversity, soil and water quality, and human health. The atmosphere serves as both a source and long-range conveyor of bioaerosols, complex particles that include bacteria, fungal spores, and dust-bound heavy metals, profoundly influencing biosphere health and ecosystem function. In this study, we investigate atmosphere-biosphere interactions in Pin Valley National Park, a cold desert ecosystem in the Western Himalayas, by analyzing how bioaerosol-mediated deposition of heavy metals shapes vegetation stress and phytoremediation dynamics. Integrating field spectroscopy, in-situ chemical analysis (ICP-MS), and multi-temporal satellite data, we mapped heavy metal hotspots (Pb, Cd, Ni, Cr) and linked them to shifts in vegetation health and thermal indices. We observed significant spatial overlap between elevated metal concentrations likely introduced via long-range atmospheric transport and suppressed vegetation indices. Phytoremediator species such as Brassica juncea and Populus exhibited strong metal uptake, revealing natural biospheric buffering capacity against airborne contaminants. Additionally, iron oxide and hydrothermal indices indicated that soil mineral conditions, modulated by deposition, may influence microbial and root zone dynamics. This multidisciplinary assessment underscores the role of the atmosphere not merely as a depositor but as a dynamic bioreactor influencing terrestrial microbiomes and plant stress responses. By offering a scalable, remote sensing–assisted framework for monitoring ecosystem health and contaminant transport, our work directly supports SDG 13 by identifying atmospheric pathways of pollutant stress under warming trends, contributes to SDG 15 by protecting fragile alpine ecosystems through phytoremediation, and aligns with SDG 17 as an interdisciplinary approach.

4:15pm - 4:30pm

Assessing cross-season, AOD-PM2.5 Relationships as a Function of Meteorological Parameters in Sherbrooke, Québec, Canada

Alireza Zhaleh Doost, Norman T. O'Neill, Yacine Bouroubi, Mickaël Germain

Université de Sherbrooke, Canada

The relationship between aerosol optical depth (AOD) and surface PM2.5 concentrations remains a significant difficulty in remote sensing-based air quality assessments due to meteorological conditions and aerosol vertical structure. This relationship is investigated using daily observations from 2021 to 2024 in Sherbrooke, Quebec, Canada.

Ground-based AERONET AOD500 and satellite-based MAIAC AOD at 550 nm are analyzed separately, together with surface PM2.5 measurements from a local PurpleAir sensor. Meteorological parameters such as relative humidity, boundary layer height, temperature, and wind speed are available from ERA5 reanalysis. Vertically resolved aerosol information from MPLNET lidar is used to identify elevated aerosol layers associated with transported wildfire smoke.

The approach combines Pearson and Spearman correlations, partial correlation analysis, multivariate regression, and Random Forest (RF) modeling to capture nonlinear interactions. Results indicate weak but statistically significant correlations between AOD and PM2.5 (r ≈ 0.26-0.30), with stronger monotonic relationships. A pronounced seasonal dependence is observed, with the strongest coupling in autumn and weak or insignificant relationships in winter. Partial correlation analysis suggests that a residual association between AOD and PM2.5 remains after accounting for meteorological influences.

RF models improve predictive performance (R² ≈ 0.39), although performance degrades in winter. Sensitivity analysis indicates that transported smoke plumes can influence the AOD-PM2.5 relationship, particularly when partial mixing into the boundary layer occurs.
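The abstract reports weak linear correlations alongside stronger monotonic relationships. A small sketch of why Pearson and Spearman coefficients can differ is given here; the AOD and PM2.5 values are made up for illustration and are not the Sherbrooke data.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """Average 1-based ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up values: PM2.5 rises monotonically with AOD, but not linearly.
aod = [0.05, 0.10, 0.20, 0.40, 0.80]
pm25 = [2.0, 3.0, 3.5, 4.0, 30.0]

r_lin = pearson(aod, pm25)
r_rank = spearman(aod, pm25)
```

In this toy series the rank correlation is perfect while the linear coefficient is lower, which is the general pattern the abstract describes, though at very different magnitudes.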

4:30pm - 4:45pm

First global XCO2 Observations from spaceborne Lidar

Hongyuan Zhang, Ge Han, Yiyang Huang, Wei Gong

Wuhan University, China, People's Republic of

Over the past decade, nearly ten satellites dedicated to atmospheric CO2 concentration monitoring have been launched, significantly advancing our understanding of the global carbon cycle. In 2022, China launched the DaQi-1 (DQ-1) satellite, which carries the Aerosol and Carbon Dioxide Lidar (ACDL)—the first spaceborne lidar sensor for CO2 monitoring. Relying on laser-based active sensing, ACDL can detect global XCO2 at nighttime, serving as an important complement to existing passive optical CO2 satellite missions. This study aims to introduce the scientific community to the XCO2 retrieval methodology of ACDL and its initial XCO2 product. The first version of ACDL XCO2 products scheduled for release is called “v1.0”. This paper presents a comparison between XCO2 at daytime and nighttime. Nonetheless, challenges remain, including reliance on meteorological reanalysis data and uncertainties in spectroscopic parameters. In future product versions, we plan to improve data quality through enhanced denoising techniques and signal processing methods for low signal-to-noise ratio (SNR) cases. We hope that this initial ACDL XCO2 product will spark broader interest and participation from the scientific community, thereby contributing fresh momentum to climate change research.

4:45pm - 5:00pm

Cross-city transfer learning for Sentinel-5P-driven NO2 prediction in data-sparse urban environments

Fjoralba Janku¹, Francesco Mauro¹, Luigi Russo², Babak Memar³, Alessandro Sebastianelli⁴, Paolo Gamba², Silvia Liberata Ullo¹

¹University of Sannio, Benevento, Italy; ²University of Pavia, Pavia, Italy; ³University La Sapienza, Rome, Italy; ⁴CMCC Foundation - Euro-Mediterranean Center on Climate Change, Caserta, Italy

Traditional forecasting methods of air pollutants show intrinsic limitations due to the complexity of atmospheric interactions. Recent research has moved toward the employment of artificial intelligence (AI)-based approaches and satellite data processing. The framework proposed in this study is a transfer learning (TL) model to estimate surface-level NO2 concentrations across multiple locations by using satellite and environmental data.

The approach integrates Sentinel-5P TROPOMI-derived tropospheric NO2 columns, meteorological variables (temperature, precipitation, etc.), spatial coordinates and temporal features.

A CatBoost regression model is implemented within a Leave-One-City-Out (LOCO) TL framework across five cities (Berlin, London, Madrid, Paris and Toronto). This enables model transfer from multiple source domains to a new target city with minimal ground-based data. The transfer models outperform city-specific baseline models, reducing the Root Mean Square Error (RMSE) by approximately 7% and raising the Coefficient of Determination (R2) by 2.7%. Toronto, which represents an environment with a low monitoring density, benefits most from TL, with R2 improving from 0.58 (baseline) to 0.66 (transfer) and RMSE dropping from 6.44 µg/m3 to 5.84 µg/m3.

A detailed Leave-One-Block-Out (LOBO) ablation study shows how each group of features contributes to the performance of the model. Spatial coordinates and meteorological features are the most influential predictors of NO2 concentration, while the satellite NO2 data increase model generalization. These results highlight the potential of cross-city TL and remote sensing synergy for scalable urban air pollution monitoring, especially in limited ground-based monitoring scenarios.
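The Leave-One-City-Out protocol mentioned above can be sketched as a simple data split. This sketch covers only the fold construction; it does not train CatBoost or any other model, and the record layout (`city`, `no2_column`) is a hypothetical stand-in for the study's feature table.

```python
def leave_one_city_out(records):
    """Yield (held_out_city, train_rows, test_rows) for each city in turn."""
    cities = sorted({row["city"] for row in records})
    for held_out in cities:
        # Train on every other city, test on the one held out.
        train = [r for r in records if r["city"] != held_out]
        test = [r for r in records if r["city"] == held_out]
        yield held_out, train, test

# Hypothetical records: one row per observation, tagged with its city.
records = [
    {"city": c, "no2_column": float(i)}
    for i, c in enumerate(
        ["Berlin", "London", "Madrid", "Paris", "Toronto", "Toronto"]
    )
]

folds = list(leave_one_city_out(records))
```

Each fold keeps all rows from the target city out of training, so the score on that fold reflects transfer to an unseen city rather than interpolation within a known one.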

5:00pm - 5:15pm

Enhanced Ozone Downscaling in Megacities Using a SHAP-Optimized U-Net Model

A.A. Kakroodi, Hossein Barekati, Hamid Kiavarz Moghaddam

University of Tehran, Iran, Islamic Republic of

High-resolution mapping of tropospheric ozone is essential for urban environmental assessment; however, satellite-derived ozone products are generally too coarse to capture neighborhood-scale variability in complex megacities such as Tehran. This study introduces an interpretable deep-learning framework that downscales coarse Sentinel-5P ozone observations to a 30-m spatial grid by integrating a U-Net convolutional architecture with SHapley Additive exPlanations (SHAP). A diverse suite of predictors—including land-surface indicators, meteorological parameters, terrain morphology, and chemical precursors—was harmonized and resampled to a unified spatial resolution. SHAP analysis was applied to quantify each predictor’s contribution, enabling the removal of redundant or low-impact variables before model training. Using spring 2020 as the evaluation period, the optimized U-Net successfully reconstructed fine-scale ozone gradients and reproduced Tehran’s characteristic north–south pattern driven by topography and emission density. Comparative analysis with preliminary outputs demonstrates that feature optimization enhances spatial coherence, reduces noise artifacts, and improves the representation of localized hotspots. Statistical evaluation further showed strong agreement between the downscaled ozone estimates and observational data at both station and district scales, demonstrating effective generalization across heterogeneous urban environments. Overall, the findings highlight the potential of combining deep learning with interpretability techniques to refine coarse satellite ozone observations and provide a scalable, high-resolution framework for urban air-quality monitoring and exposure assessment.

Date: Thursday, 09-July-2026

8:30am - 10:00am

WG II/3F: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

8:30am - 8:45am

Beyond Photorealism: Gaussian Splatting for the Precise Reconstruction of Complex Geometries In Underwater Photogrammetry

Andrea Cimatoribus¹, Davide Antonio Cucci¹, Fabio Menna², Erica Nocerino³, Christoph Strecha¹

¹PIX4D SA, Route de Renens 24 1008 Prilly, Switzerland; ²Department of Chemical, Physical, Mathematical and Natural Sciences, University of Sassari, Sassari, Italy; ³Department of Humanities and Social Sciences, University of Sassari, Sassari, Italy

This study examines PIX4D’s implementation of Gaussian Splatting for reconstructing complex geometries, with a focus on underwater photogrammetry for coral reef mapping. Unlike standard Gaussian Splatting pipelines that emphasize photorealistic rendering, our approach prioritizes high-precision geometric reconstruction, especially for thin structures and heavily occluded regions. We compare the method against conventional multi-view stereo techniques using both real underwater imagery collected in Moorea (French Polynesia) and synthetic datasets generated with the POSER underwater simulation framework.

8:45am - 9:00am

Merchantable Tree Stem Volume Estimation using Mobile Backpack LiDAR

Sangyoon Park¹, Hazem Hanafy¹, Chunxi Zhao¹, Jinyuan Shao², Songlin Fei², Ayman Habib¹

¹Lyles School of Civil and Construction Engineering, Purdue University, United States of America; ²Department of Forestry and Natural Resources, Purdue University, United States of America

Stand-level merchantable tree stem volume estimation in temperate forests is critical for data-driven forest management decision-making. Mobile laser scanning (MLS) has greatly improved data-collection efficiency for forest biometrics; however, automated analysis of massive, structurally complex MLS point clouds remains limited. This study presents an automated framework to estimate stand-level merchantable stem volume from backpack mobile Light Detection and Ranging (LiDAR) data. The framework comprises three stages: (1) point cloud reconstruction using the Integrated-Scan Simultaneous Trajectory Enhancement and Mapping (IS²-TEAM) method; (2) individual tree segmentation via a multistage geometric pipeline; and (3) merchantable stem volume estimation based on skeletonization-derived stem modeling. The proposed approach is evaluated on a forest-scale dataset collected in temperate natural forests in the United States. Results demonstrate operational feasibility at scale, with practical processing times and robust geometric consistency. Validation against destructively measured reference volumes shows that the proposed approach outperforms baseline quantitative structure modeling (QSM) methods, achieving a coefficient of determination (R²) of 0.97, a bias of −0.06 m³, and a root mean square error (RMSE) of 0.21 m³. The proposed framework enables reliable, automated estimation of merchantable stem volume from MLS data and supports deployment from individual-tree to forest scales with minimal manual intervention.

9:00am - 9:15am

TRACE: Instance-Level Open-Vocabulary Inventory Generation for 3D Forensic Evidence Reconstruction

Florian Johannes Eichinger^1,2, Michael Greza^1,2, Benjamin Busam^1,2

¹Technical University of Munich, Germany; ²Munich Center for Machine Learning, Germany

TRACE is a training-free framework for instance-level open-vocabulary inventory generation in 3D forensic evidence reconstruction.

Starting from multiview RGB imagery, prompt-based 2D object masks are extracted using SAM3 and associated across views via geometry-aware and appearance-aware multiview instancing. Based on COLMAP geometry and DINOv2/v3 descriptors, the proposed framework establishes globally consistent same-class object identities across the scene. The resulting global instances are then encoded with SigLIP2 to obtain language-aligned instance descriptors and subsequently lifted into a 3D Gaussian Splat representation by assigning instance-level semantics to geometrically supported Gaussian subsets. This yields an enriched 3D scene representation that jointly preserves spatial structure, object-level identity, and language-accessible semantics, thereby enabling instance-aware open-vocabulary querying in 3D.

9:15am - 9:30am

Surface Water 3-D Mapping With Point Cloud Data of Single Return Airborne LiDAR

Cihan Altuntas

Konya Technical University, Turkiye

The purpose of this study is to automatically classify water and land areas with LiDAR point clouds. After determining the average water level, the water and land surfaces were classified. Previous studies have focused on supervised classification based on land sampling or deep learning techniques using photographs. However, these classification techniques are expensive and require long calculation times. In this study, a method is proposed for the automatic classification of water and land areas without land surveys using the coordinate and reflection values of LiDAR point clouds. The bounding box method was used to detect water surface levels. The correlations between the min-box level, mean box height, and mean box reflection values of the LiDAR point data were used to determine the water surface level. The results show that the method is suitable for the fast classification of water surfaces from LiDAR point clouds. Thus, shoreline changes in large areas can be detected automatically without the need for land surveying. The proposed bounding box classification method can be applied independently of LiDAR point cloud density. The extended version of this method can also be used to detect vehicles and objects on a water surface.

9:30am - 9:45am

Enhancing underground environment rendering with lightweight 3D Gaussian splatting

Roberto de Lima-Hernandez, Zhiya Yang, Maarten Vergauwen

KU Leuven, Belgium

Underground environments such as sewer networks are critical infrastructure whose condition directly affects public health, environmental protection, and maintenance costs. Conventional inspection workflows largely rely on monocular CCTV systems and manual video review, providing limited 3D understanding and often missing subtle or spatially complex defects. At the same time, sewer environments are characterised by challenging imaging conditions, including low illumination, specular surfaces, water films and occlusions, which further complicate reliable assessment.

In this extended abstract, we present a real-time inspection concept that combines (i) stereo camera-based SLAM for geometric mapping and pose estimation, (ii) Vision Transformer (ViT) based anomaly detection trained on the public SewerML dataset, and (iii) lightweight Gaussian Splatting modules that create local high-resolution 3D reconstructions only in the vicinity of detected defects. The system is targeted at embedded hardware, specifically an NVIDIA Jetson Nano, and is designed for deployment and evaluation in real sewer environments. The overall goal is to provide inspectors and asset managers with spatially anchored 3D visualisations of anomalies that can be integrated into digital-twin workflows for decision support and long-term monitoring.

9:45am - 10:00am

Robust Cross-Modal Matching between LiDAR Point Clouds and Multi-Camera Images in Tunnel Environments via Surface Parameterization

Ying Jiang¹, Feng Liu², Han Hu^1,3, Yulin Ding^1,3, Chong Wang^3,4, Ping Wen^3,4, Qing Zhu^1,3

¹Faculty of Geosciences and Engineering, Southwest Jiaotong University; ²CRSC Communication & Information Group Co., Ltd.; ³Yunnan Engineering Research Center of 3D Real Scene; ⁴Kunming Engineering Corporation Limited

This paper proposes a robust cross-modal matching framework for tunnel inspection, specifically designed to address the unique challenges posed by low-texture environments often encountered in tunnel linings. Traditional image-based matching techniques struggle in these environments due to the lack of distinctive surface features and limited texture variation. To overcome these challenges, the proposed method leverages the global prior knowledge of tunnel geometry. By jointly projecting LiDAR point clouds and multi-camera images onto a shared parameterized cylindrical surface, the method constructs a unified geometric space that facilitates accurate 3D–2D correspondences. This dual-projection strategy significantly improves the alignment of structural features such as segment joints, line grooves, and equipment brackets, which are critical for defect detection in tunnel inspection. The enhanced matching ability allows for more reliable multi-sensor data fusion, thereby supporting the automated analysis of tunnel defects. This framework lays a solid foundation for intelligent tunnel inspection systems, offering a powerful solution for real-time monitoring and analysis of tunnel infrastructure.
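Projecting data onto a parameterized cylindrical surface can be illustrated with a minimal unwrapping sketch. The abstract does not specify the parameterization, so the following assumes an ideal circular tunnel whose axis coincides with the z axis; the function name and the example radius are hypothetical.

```python
import math

def unwrap_to_cylinder(point, radius):
    """Map a 3D point to (arc_length, along_axis, radial_offset).

    Assumes a circular cylinder of the given radius whose axis is the z axis.
    """
    x, y, z = point
    theta = math.atan2(y, x)   # angular position around the axis
    r = math.hypot(x, y)       # distance from the axis
    return radius * theta, z, r - radius

# A point on the lining of a hypothetical 3 m radius tunnel, 12 m along the axis.
u, v, offset = unwrap_to_cylinder((0.0, 3.0, 12.0), radius=3.0)
```

In such a scheme, LiDAR points and image pixels that land on the same (arc length, along-axis) coordinates refer to the same patch of lining, which is what makes a shared 2D space useful for 3D-2D matching.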

1:30pm - 3:00pm

WG IV/10: Applied Spatial Science for Public Health
Location: 715B

1:30pm - 1:45pm

Benchmarking and assessment of image-based methods for particulate matter estimation: The AQpictures project

Afshin Moazzam¹, Daniele Oxoli¹, Maria Antonia Brovelli¹, Songnian Li², Francesco Pirotti³, Shishuo Xu⁴

¹Politecnico di Milano, Italy; ²Toronto Metropolitan University; ³University of Padova; ⁴Beijing University of Civil Engineering and Architecture

The AQpictures project, conducted under the ISPRS Scientific Initiatives 2025, addresses the emerging field of image-based estimation of fine particulate matter (PM2.5) concentrations in urban areas. PM2.5 represents a major public health concern, yet existing ground-based monitoring networks offer limited spatial coverage and satellite-derived products struggle to capture surface-level variability. Recent studies have demonstrated that visual attributes in outdoor images, such as sky colour, haze, and visibility, can provide useful indicators of PM2.5 concentrations. Building upon this premise, AQpictures aims to develop an open, reproducible framework for benchmarking and validating image-based air quality estimation methods.

The project first conducts a comprehensive literature review to classify existing approaches into four methodological categories: physics-based, machine learning, deep learning, and hybrid models. Based on this synthesis, a benchmark experiment is implemented for the city of Milan, combining a ten-month dataset of webcam images with co-located PM2.5 ground measurements. The workflow involves image preprocessing, feature extraction, and model evaluation using standard statistical indicators (R², RMSE, MAE). Preliminary tests include physics-based visibility models, feature-based regressors, and convolutional deep learning architectures.

All codes, datasets, and documentation are consolidated in an open-access GitHub repository to ensure transparency, reproducibility, and adaptability of methods across different environmental contexts. Early results confirm the feasibility of PM2.5 estimation from RGB imagery, though further investigations on multi-city datasets are planned to evaluate model transferability and robustness under varying urban and climatic conditions.
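The statistical indicators named in the workflow (R², RMSE, MAE) can be computed as in the following sketch. The observed and predicted values are invented for illustration, and R² is taken here as the coefficient of determination (1 minus the ratio of residual to total sum of squares), which may differ from a squared Pearson correlation reported elsewhere.

```python
import math

def regression_metrics(observed, predicted):
    """Return R2 (coefficient of determination), RMSE and MAE."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "rmse": rmse, "mae": mae}

# Invented PM2.5 values in micrograms per cubic metre.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
m = regression_metrics(observed, predicted)
```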

1:45pm - 2:00pm

Interoperable Federated Access to Multi-Vendor Wearables for Postpartum Wellbeing Support: A Standards-Based Architecture for MAMAI

Faraneh Falah, Sepehr Honarparvar, Mahnoush Mohammadi Jahromi, Steve Liang, Sara Saeedi

University of Calgary

This paper presents MAMAI (Maternal Assistance and Monitoring through Artificial Intelligence), a standards-based framework designed to enable interoperable postpartum well-being monitoring using multi-vendor wearable devices. The proposed system addresses a key limitation in digital maternal health: the fragmentation of wearable ecosystems and the lack of integration with clinical infrastructures. MAMAI introduces a federated, edge–cloud architecture that allows wearable data to be processed locally while transmitting only summarized data to the cloud.

A core contribution of this work is the integration of two complementary interoperability standards: the OGC SensorThings API for structuring IoT-based sensor observations, and HL7 FHIR for representing well-being indicators in clinically compatible formats. Through this dual-standard approach, heterogeneous wearable data—such as sleep patterns, physical activity, and heart-rate variability—are harmonized into standardized, platform-independent representations.

The framework further introduces a composite well-being score derived from normalized physiological indicators, enabling continuous and interpretable assessment of maternal health. A prototype implementation demonstrates the feasibility of the architecture, supporting end-to-end data ingestion, transformation, interoperability mapping, and visualization. Experimental results show efficient system performance with low end-to-end latency.

Overall, MAMAI provides a scalable and interoperable solution for integrating consumer wearable data into healthcare ecosystems, offering a foundation for next-generation maternal digital health systems and continuous postpartum monitoring.
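A composite score derived from normalized indicators could take many forms; the abstract does not give the formula. The sketch below assumes min-max normalization followed by a weighted mean, and the indicator names, ranges, and weights are all hypothetical.

```python
def min_max(value, low, high):
    """Scale value into [0, 1], clamping anything outside [low, high]."""
    if high <= low:
        raise ValueError("high must exceed low")
    return min(1.0, max(0.0, (value - low) / (high - low)))

def wellbeing_score(indicators, ranges, weights):
    """Weighted mean of min-max normalized indicators (assumed formulation)."""
    total_weight = sum(weights[k] for k in indicators)
    weighted = sum(
        weights[k] * min_max(indicators[k], *ranges[k]) for k in indicators
    )
    return weighted / total_weight

# Hypothetical reference ranges and weights, not taken from the paper.
ranges = {"sleep_hours": (4.0, 8.0), "steps": (0.0, 10000.0), "hrv_ms": (20.0, 80.0)}
weights = {"sleep_hours": 0.4, "steps": 0.3, "hrv_ms": 0.3}

score = wellbeing_score(
    {"sleep_hours": 6.0, "steps": 5000.0, "hrv_ms": 50.0}, ranges, weights
)
```

Dividing by the sum of the weights actually used keeps the score in [0, 1] even when one device does not report a given indicator.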

2:00pm - 2:15pm

Seeing vertical greenery: Global differences in residents’ green exposure and inequality

Xiaozhen Ren¹, Xuefeng Guan¹, Liqun Sun², Yifan Teng¹, Qingyang Xu¹, Chang Liu¹, Zhangyan Xu¹, Xu Li¹

¹State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China; ²Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Achieving the United Nations Sustainable Development Goal (SDG) 11.7.1—“providing universal access to safe, inclusive, accessible, and green public spaces by 2030”—underscores the critical role of urban green space in advancing global sustainability. Although extensive research has examined urban greenery from a traditional planar perspective, green spaces inherently possess vertical structure. Currently, systematic quantitative assessments of urban vertical greenery, residents’ actual exposure to vertical green space, and the associated inequalities remain limited. To address these gaps, this study integrates global population data with vegetation height information to construct an exposure-based analytical framework. We quantify spatial patterns of vertical greenery, residents’ green exposure, and exposure inequality across global urban areas, and further examine the drivers of inequality. Our findings reveal pronounced spatial disparities in urban greenery worldwide. On average, cities in the Global North exhibit approximately three times greater vertical greenery and nearly four times higher green exposure than cities in the Global South. African urban areas possess only one-sixth of the average vertical greenery and one-seventh of the exposure level observed in North America, while displaying roughly twice the inequality in green exposure, indicating much more uneven access to green resources. We also find that cities with higher average vertical greenery tend to experience lower exposure inequality, suggesting that increasing overall greenery can help promote more equitable access. These results provide new theoretical insights and policy-relevant evidence for advancing sustainable and equitable urban green development, supporting global progress toward sustainable development goals.

2:15pm - 2:30pm

Modeling Dynamic Walkability to Support Time-Based Route Planning for Older Adults

Febrian Fitryanik Susanta^1,2, Pei Fen Kuo¹, I Gede Brawiswa Putra¹

¹Department of Geomatics, National Cheng Kung University, No. 1 Dasyue Road, East District, Tainan City 701, Taiwan; ²Department of Geodetic Engineering, Universitas Gadjah Mada, Jl. Grafika No. 2, Yogyakarta 55281, Indonesia

Walkability assessments for elderly pedestrians are often based on static representations of the built environment, overlooking temporal variations that influence walking conditions throughout the day. This study develops a network-based dynamic walkability framework that integrates static infrastructural characteristics with time-dependent environmental factors to capture spatiotemporal variability in pedestrian suitability. The approach combines sidewalk and arcade-based pedestrian networks with dynamic variables, including traffic, air quality index (AQI), temperature, humidity, shade, and lighting, evaluated at two time periods (12:00 and 17:00) across weekdays and weekends in three urban contexts in Tainan, Taiwan: a hospital area, a university campus, and a residential neighborhood. Results indicate clear spatial differences, with hospital and campus areas showing higher baseline walkability than residential areas. Dynamic analysis reveals temporal variation, with improvements ranging from approximately 3–8% in institutional environments to over 10% in residential areas. Segment-level results further show that temporal factors can alter pedestrian suitability, particularly in areas with limited infrastructure. Route-based validation demonstrates that the model generates alternative paths that prioritize safety and environmental comfort over the shortest distance. Compared to Google Maps routes, the proposed approach achieves higher average walkability, with improvements ranging from approximately 5% to over 15%, particularly in residential areas. These findings highlight the limitations of static and shortest-path approaches and emphasize the importance of incorporating temporal dynamics. The proposed framework supports time-sensitive routing and age-friendly urban planning strategies.
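The contrast between shortest-distance routing and walkability-aware routing can be sketched with a small Dijkstra search. The cost function `length / walkability`, the toy network, and the walkability scores are assumptions for illustration; the abstract does not state how the study weights its segments.

```python
import heapq

def best_path(graph, start, goal, cost):
    """Dijkstra search over graph[u] = [(v, length_m, walkability), ...]."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, length, walk in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(
                    queue, (total + cost(length, walk), nxt, path + [nxt])
                )
    return float("inf"), []

# Hypothetical noon network: walkability in (0, 1], higher is better.
graph_noon = {
    "A": [("B", 100.0, 0.3), ("C", 80.0, 0.9)],
    "B": [("D", 100.0, 0.3)],
    "C": [("D", 150.0, 0.9)],
}

shortest = best_path(graph_noon, "A", "D", lambda length, walk: length)
comfortable = best_path(graph_noon, "A", "D", lambda length, walk: length / walk)
```

Swapping in a different graph for another time of day (for example 17:00 scores) changes the preferred route without touching the search itself, which is one simple way to make routing time-dependent.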

2:30pm - 2:45pm

An Environment-Aware Indoor-Outdoor Integrated Digital Twin for Healthy Mobility

Yan Li, Chenming Ye, Wenxuan Shi, Wenqing Zhang, Yuyang Zhang, Teng Hu, Zhizhong Kang

China University of Geosciences (Beijing), China, People's Republic of

Existing building digital twins treat indoor environments as static geometric containers, ignoring the dynamic coupling between ventilation structure states and indoor environmental quality. Furthermore, managing indoor and outdoor spaces as separate data silos prevents the continuous assessment of occupant exposure across building boundaries. This paper proposes an environment-aware, indoor-outdoor integrated digital twin framework coupling geometric entity states with physical environmental fields for healthy mobility assessment. The framework utilizes a three-layer architecture. First, the Geometric-Semantic Layer provides a seamless LOD4 model with topologically stitched spaces, modeling ventilation facilities as first-class entities with mutable state attributes (Full Closed, Half Open, Full Open). Second, the Physical Field Layer maps mobile sensing data (PM2.5, CO2) onto semantic entities using a semantic-constrained method, treating walls and closed windows as aggregation barriers. Finally, the Behavioral Response Layer combines entity-level pollution values with pedestrian counts to compute a cumulative Crowd Exposure Index (CEI). Implemented on a Cesium platform, the framework was validated through a week-long university building experiment. Results show indoor PM2.5 in a fully enclosed study room averaged 61.2 μg/m³—1.6 times the outdoor level and 4.1 times the WHO guideline. This resulted in a CEI 12 times higher than in outdoor transit areas. Semantic correlation confirms the "Full Closed" window state primarily drives pollutant accumulation. This validates the framework's core geometry-physics coupling, demonstrating its potential to guide intelligent ventilation interventions and healthy building management.
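As a rough illustration of how a cumulative exposure index of this kind might be computed, the sketch below multiplies a per-interval pollutant concentration by the pedestrian count and the interval length, then sums over time. The abstract does not give the CEI formula, so the function name, the weighting and all sample values are assumptions for illustration only.

```python
def crowd_exposure_index(concentrations, pedestrian_counts, interval_hours=1.0):
    """Hypothetical CEI: sum of concentration x people x duration.

    concentrations    -- pollutant level per interval (e.g. PM2.5 in ug/m3)
    pedestrian_counts -- number of people present in each interval
    interval_hours    -- length of each interval in hours
    """
    if len(concentrations) != len(pedestrian_counts):
        raise ValueError("inputs must have the same length")
    return sum(c * n * interval_hours
               for c, n in zip(concentrations, pedestrian_counts))


# Made-up values, not taken from the experiment described above.
study_room = crowd_exposure_index([60.0, 62.0, 61.0], [20, 25, 22])
corridor = crowd_exposure_index([38.0, 37.0, 39.0], [3, 2, 4])
print(study_room, corridor)
```

Under this assumed definition, a space can score high either because it is polluted or because many people stay in it, which is why a crowded enclosed room can dominate a lightly used transit area.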

2:45pm - 3:00pm

Integrating Multi-Source Remote Sensing and GIS for Urban Air Quality Mapping in Emerging City: Insights from Nashik City, India

Chetankumar Patel, Shishir Dadhich

SVNIT, Surat

Rapid industrialization and unplanned urbanization have increased air pollution levels across Indian cities, posing serious environmental and health challenges. This research presents a geospatial assessment of air pollutant behaviour across Nashik city by integrating multi-source remote sensing datasets and real observation datasets from Sentinel-5P, NASA POWER, and CPCB ground observations within a GIS-based analytical framework. Using ward-level mapping and spatial overlays, the study examines the distribution of key pollutants—PM2.5, PM10, NO2, SO2, and CO—and their relationship with environmental and anthropogenic parameters, including land use, road networks, wind direction, temperature, and vegetation density. The results consistently reveal high concentrations of PM2.5, ranging from a minimum of 52.4 µg/m³ to a maximum of 73 µg/m³, and PM10, a minimum of 87.3 µg/m³ and a maximum of 121.5 µg/m³, particularly along high-traffic corridors and industrial zones, which exceed the WHO standards. Correlations with meteorological and vegetative factors further highlight the influence of urban form and climatic conditions on pollutant dispersion. This integrated approach demonstrates how multi-source remote sensing and GIS tools can be effectively employed to identify emission hotspots, support evidence-based policy formulation, and strengthen urban environmental management strategies for sustainable development.

3:00pm - 3:15pm

Long-Term Monitoring of NO₂ Pollution in the Mining and Industrial Region of Korba in Chhattisgarh Using Sentinel-5P and NDPI

Aman Srivastava, Aditya Kumar Thakur, Rahul Dev Garg, Pradeep Kumar Garg

Indian Institute of Technology Roorkee, India

Air pollution is a critical environmental challenge, with nitrogen dioxide (NO₂) from vehicles and industries posing serious health and atmospheric risks. Traditional monitoring is limited, making satellite-based methods essential for large-scale assessment. Korba, Chhattisgarh, an industrial hub of coal mining and thermal power plants, is a major pollution contributor. This study investigates the spatiotemporal dynamics, statistical behavior, and long-term trends of NO₂ concentrations over the Korba region from 2019 to 2024, utilizing Sentinel-5P TROPOMI-derived NO₂ column density and the Normalized Difference Pollution Index (NDPI). Year-wise NDPI patterns revealed a consistent pollution hotspot in the central-southern region, with the annual mean NDPI gradually increasing from 0.175 in 2019 to 0.191 in 2023. The monthly NDPI peaked in December 2024 at 0.525, indicating severe winter pollution. Statistical analysis showed moderate variability and a near-symmetric NDPI distribution with occasional spikes near industrial zones. Trend analysis identified a marginal but steady increase in pollution. Autocorrelation analysis revealed strong short-term persistence (lag-1 = 0.594), while spectral analysis identified a dominant annual frequency (0.083 cycles/month) with a peak power of 0.107, confirming the presence of strong seasonal variation and short-term persistence in NO₂ concentration. These results underscore the cyclic yet escalating nature of NO₂ pollution, with notable winter intensification. The findings emphasize the need for targeted emission control strategies and policy-level interventions to manage regional air quality. Future work should integrate ground-based validation and explore meteorological influences to improve predictive accuracy and guide sustainable environmental management.
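To make the two time-series diagnostics mentioned above concrete, here is a minimal sketch of a lag-1 autocorrelation and a periodogram peak search on a monthly series. The series is synthetic (a 12-month cosine plus a small trend), and the function names are illustrative; nothing here reproduces the study's data or its exact estimators.

```python
import math


def lag1_autocorrelation(series):
    """Sample autocorrelation of the series at lag 1."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / denom


def dominant_frequency(series):
    """Frequency (cycles per sample) with the largest periodogram power."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        # Plain discrete Fourier transform at frequency k / n.
        re = sum(d * math.cos(2 * math.pi * k * t / n) for t, d in enumerate(dev))
        im = sum(d * math.sin(2 * math.pi * k * t / n) for t, d in enumerate(dev))
        power = (re * re + im * im) / n
        if power > best_power:
            best_k, best_power = k, power
    return best_k / n


# Synthetic 72-month series (six years): annual cycle plus a slight upward trend.
series = [0.18 + 0.0002 * t + 0.1 * math.cos(2 * math.pi * t / 12) for t in range(72)]
freq = dominant_frequency(series)
r1 = lag1_autocorrelation(series)
print(round(freq, 3), round(r1, 2))
```

An annual cycle in monthly data corresponds to 1/12 ≈ 0.083 cycles/month, which is the dominant frequency reported in the abstract.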

3:30pm - 5:15pm

WG II/4C: AI/ML for Geospatial Data
Location: 715B

3:30pm - 3:45pm

DeepChoice: Learning View Weighting for Image-Guided 3D Semantic Segmentation

Antoine Carreaud¹,², Digre Frinde¹, Shanci Li¹, Jan Skaloud², Adrien Gressin¹

¹University of Applied Sciences Western Switzerland (HES-SO / HEIG-VD); ²ESO lab, EPFL, Switzerland

Multi-view image-to-point label transfer is an effective strategy for 3D semantic segmentation, but its performance largely depends on how predictions from multiple image observations are fused for each 3D point. Most existing pipelines rely on hard voting or handcrafted weighting rules, which do not explicitly learn the reliability of each view under varying geometric and image-quality conditions. In this paper, we introduce DeepChoice, a lightweight view-weighting module for image-guided 3D semantic segmentation. For each visible observation of a 3D point, DeepChoice exploits a compact set of visibility cues, including incidence angle, range, contrast, sharpness, signal-to-noise ratio, and saturation, to predict normalized per-view weights used to aggregate 2D semantic class probabilities into final 3D point-wise predictions. The method is sensor-agnostic, requires no meshing, and can be integrated as a replacement for standard multi-view fusion rules. Experiments on the full GridNet-HD benchmark show that DeepChoice improves over hard voting by 3.85 mIoU points and over mean-probability fusion by 1.26 points, while reducing the gap with the AnyView oracle upper bound. The largest gains are observed on thin and difficult classes such as conductors, pylons, and insulators. Furthermore, a complementary evaluation on the Images PointClouds Cultural Heritage dataset shows that the proposed weighting strategy remains beneficial under a very different acquisition context and scene structure, yielding a 1.55 mIoU point improvement over hard voting. These results show that learning how to weight views is a simple yet effective way to strengthen image-guided 3D semantic segmentation pipelines. Code is publicly available at: https://huggingface.co/heig-vd-geo/DeepChoice.
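The difference between hard voting and weighted probability fusion can be sketched as follows. This is not the DeepChoice code: the per-view scores are hand-picked stand-ins for what a learned module would predict, and the softmax normalization is an assumption about how weights might be normalized.

```python
import math


def softmax(scores):
    """Turn arbitrary per-view scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_probs, view_scores):
    """Weighted average of per-view class probabilities for one 3D point."""
    weights = softmax(view_scores)
    n_classes = len(view_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, view_probs))
            for c in range(n_classes)]


def hard_vote(view_probs):
    """Baseline: each view votes for its most probable class."""
    votes = [p.index(max(p)) for p in view_probs]
    return max(set(votes), key=votes.count)


# One 3D point seen in three views, two classes (made-up numbers).
# Two low-quality views lean weakly towards class 1, one sharp view is
# confident about class 0.
view_probs = [[0.45, 0.55], [0.45, 0.55], [0.95, 0.05]]
view_scores = [0.0, 0.0, 2.0]  # hypothetical reliability scores

fused = fuse_views(view_probs, view_scores)
fused_label = fused.index(max(fused))
voted_label = hard_vote(view_probs)
print(fused_label, voted_label)
```

In this toy case the two fusion rules disagree: the majority vote follows the two uncertain views, while the weighted fusion follows the single reliable one.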

3:45pm - 4:00pm

Semantic Segmentation of Textured Non-manifold 3D Meshes using Transformers

Mohammadreza Heidarianbaei, Max Mehltretter, Franz Rottensteiner

Leibniz University Hannover, Germany

Textured 3D meshes jointly encode geometry, topology, and appearance, yet their irregular structure poses significant challenges for deep-learning-based semantic segmentation. While a few recent methods operate directly on meshes without imposing geometric constraints, they typically overlook the rich textural information also provided by such meshes. We introduce a texture-aware transformer that learns directly from raw pixels associated with each mesh face, coupled with a new hierarchical learning scheme for multi-scale feature aggregation. A texture branch summarizes all face-level pixels into a learnable token, which is fused with geometrical descriptors and processed by a stack of Two-Stage Transformer Blocks (TSTB), which allow for both a local and a global information flow. We evaluate our model on the Semantic Urban Meshes (SUM) benchmark and a newly curated cultural-heritage dataset comprising textured roof tiles with triangle-level annotations of damage types. Our method achieves 81.9% mF1 and 94.3% OA on SUM, and 49.7% mF1 and 72.8% OA on the new dataset, substantially outperforming existing approaches.

4:00pm - 4:15pm

Pothole Classification using Point Cloud Data: a Comparison between Machine Learning and Deep Learning

Kristin Eggen, Hongchao Fan

Norwegian University of Science and Technology, Norway

Automatic pothole detection is important for improving road maintenance and transportation safety. While image-based pothole detection often struggles under poor lighting and weather conditions, point cloud data provides a robust alternative by capturing detailed surface geometry. Machine learning has demonstrated strong performance in point cloud classification. While traditional machine learning is simpler and relies on handcrafted features, deep learning models are more powerful, as they learn complex, high-dimensional patterns directly from the input data. While most existing work relies on deep learning models, which are time-consuming to train and require extensive labelled datasets, potholes can be well described by geometric features, making pothole detection well-suited for feature engineering. This paper compares traditional machine learning and deep learning approaches for pothole classification using point cloud data, to evaluate whether the added complexity and data demands of deep learning models are justified, or if traditional machine learning techniques are sufficient for accurate classification. A dataset with labelled pothole instances is created to train both models. The machine learning approach uses manually engineered geometric features as input to an ensemble classifier, while the deep learning model is trained on sampled data. Experimental results show that the machine learning approach outperformed the deep learning model. These results suggest that for this particular task, where informative domain-specific features can be manually engineered, the machine learning approach offers a more practical and efficient solution for real-world deployment, where labelled data may be limited.

4:15pm - 4:30pm

From Canopy to Crown: High-Fidelity Tree Facade Synthesis from Nadir LiDAR data

Raghav Sharma¹, Frank Zhang¹, Jane Liu², Baoxin Hu³

¹University of Fraser Valley; ²University of Toronto; ³York University

Synthesizing realistic façade views of individual trees from nadir-view remote sensing data would transform large-scale forest analysis, yet remains unsolved due to data scarcity and task ambiguity. We present the first conditional diffusion model to generate structurally plausible façade views of individual tree crowns from single nadir-view LiDAR rasters, leveraging the FOR-species20K benchmark dataset. Our approach integrates nadir projections with tree species and height within a U-Net-based denoising diffusion framework. Experiments demonstrate that nadir imagery alone is insufficient, but conditioning on species and height enables synthesis of visually realistic, species-specific façade views. The fully conditioned model achieves substantial gains in perceptual (LPIPS: 0.184) and structural (SSIM: 0.576) similarity, outperforming nadir-only baselines by more than twofold. Our results establish that ancillary attributes critically constrain the solution space, enabling diffusion models to infer plausible structures from ambiguous nadir input. This work demonstrates a scalable path to enriching nadir-based forest inventories with synthesized structural detail, reducing the need for resource-intensive ground surveys.

4:30pm - 4:45pm

Evaluation of Metric Monocular Depth Estimation Models Under Adverse Weather Conditions in Driving Scenarios

Nour Khalefa, Roberto Souza, Naser Elsheimy

University of Calgary, Canada

Metric monocular depth estimation has become increasingly important and is often used as a redundancy mechanism in autonomous driving, where accurate scene understanding is essential for safe decision-making. In this work, we evaluate three recently proposed models that represent the state-of-the-art (Depth Anything, PackNet-SfM, and UniDepth) using zero-shot testing on the DrivingStereo dataset across diverse weather conditions, and benchmark their performance. Our analysis considers not only metric depth accuracy metrics but also each model’s ability to generalize under challenging environmental variations. While UniDepth achieves notable improvements over Depth Anything and PackNet-SfM, our results show that substantial progress is still needed for robust real-world deployment. To further assess its practical suitability for autonomous driving applications, we conduct a detailed examination of UniDepth’s strengths, limitations, and failure modes.
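The abstract does not list which accuracy metrics were used. As an assumption, the sketch below computes three measures that are widely used for metric depth evaluation (absolute relative error, RMSE, and the share of pixels with a ratio error below 1.25) on a handful of made-up depth values in meters.

```python
import math


def depth_metrics(pred, gt):
    """AbsRel, RMSE and delta<1.25 over pixels where both depths are positive."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # A pixel counts as accurate when prediction and truth differ by < 25 %.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return abs_rel, rmse, delta1


# Flattened toy depth maps; a ground-truth value of 0 marks an invalid pixel.
pred = [10.0, 22.0, 30.0, 5.0]
gt = [10.0, 20.0, 40.0, 0.0]
abs_rel, rmse, delta1 = depth_metrics(pred, gt)
print(round(abs_rel, 3), round(rmse, 3), round(delta1, 3))
```

Because no per-image scale alignment is applied, these numbers penalize errors in absolute scale, which is what distinguishes a metric evaluation from a relative-depth one.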

4:45pm - 5:00pm

Out-of-Distribution Detection for Real-World Honey Bee Monitoring Using Simulated Permanent Laser Scanning

William Albert¹,², Ronald Tabernig¹,², Jannik S. Meyer¹, Bernhard Höfle¹,²

¹3DGeo Research Group, Institute of Geography, Heidelberg University; ²Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University

We present the first Open-Set Recognition (OSR) workflow for environmental monitoring for Permanent Laser Scanning (PLS) setups, using a Deep Neural Network (DNN) solely trained on simulated data. Such monitoring systems were previously only trained with real-world data and under the closed-set assumption, because they are commonly designed to observe a specific and predefined phenomenon (e.g., beach erosion, rockfall activity, vegetation change, animal behavior). The use of real-world data requires manual labeling, which is tedious given the great amount of point clouds. For this reason, we use Virtual Laser Scanning of Dynamic Scenes (VLS-4D) in a PLS setup to investigate how knowledge from synthetic data can be applied to real-world PLS monitoring systems in open-set settings. We introduce a novel framework that enables Open-Set Recognition (OSR) for animal monitoring (e.g. honey bees) using PLS data. The DNN is fine-tuned exclusively on a simulated LiDAR point cloud time series of flying honey bees, and integrates OSR to handle unknown classes during real-world deployment (e.g., butterflies, leaves, wren, and hare). By leveraging deviations in feature embeddings of the DNN, our method reliably distinguishes the known honey bee class from previously unseen classes, supporting robust monitoring under persistent distribution shifts. This approach reduces the dependence on extensive manual annotation of real-world point clouds, while maintaining reliable classification performance. It also highlights the potential of synthetic training data and OSR for environmental monitoring with PLS systems.
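One simple way to reject unknown classes from feature embeddings is to threshold the distance to the known-class centroid. The sketch below illustrates that idea only; the abstract does not say which deviation measure the authors use, and the 2D embeddings, the margin and the function names are invented for the example.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_threshold(known_embeddings, margin=1.5):
    """Known-class centroid and a cut-off of margin x the largest training distance."""
    c = centroid(known_embeddings)
    radius = max(distance(e, c) for e in known_embeddings)
    return c, margin * radius


def classify(embedding, c, threshold):
    """Label a sample as the known class or reject it as unknown."""
    return "honey bee" if distance(embedding, c) <= threshold else "unknown"


# Toy 2D embeddings standing in for network features of simulated bees.
known = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]]
c, thr = fit_threshold(known)
near = classify([1.05, 1.0], c, thr)
far = classify([4.0, -2.0], c, thr)
print(near, far)
```

The margin trades missed bees against false acceptances of other objects and would need tuning on held-out data.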

Date: Friday, 10-July-2026

8:30am - 10:00am

WG II/3G: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

8:30am - 8:45am

ZeD-MAP: Bundle Adjustment Guided Zero-Shot Depth Maps for Real-Time Aerial Imaging

Selim Ahmet Iz¹, Francesco Nex², Norman Kerle², Henry Meissner¹, Ralf Berger¹

¹German Aerospace Center, Germany; ²University of Twente, The Netherlands

Real-time depth reconstruction from ultra-high-resolution UAV imagery is essential for time-critical geospatial tasks such as disaster response, yet remains challenging due to wide-baseline parallax, large image sizes, low-texture or specular surfaces, occlusions, and strict computational constraints. Recent zero-shot diffusion models offer fast per-image dense predictions without task-specific retraining, and require fewer labelled datasets than transformer-based predictors while avoiding the rigid capture geometry requirement of classical multi-view stereo. However, their probabilistic inference prevents reliable metric accuracy and temporal consistency across sequential frames and overlapping tiles. We present ZeD-MAP, a cluster-level framework that converts a test-time diffusion depth model into a metrically consistent, SLAM-like mapping pipeline by integrating incremental cluster-based bundle adjustment (BA). Streamed UAV frames are grouped into overlapping clusters; periodic BA produces metrically consistent poses and sparse 3D tie-points, which are reprojected into selected frames and used as metric guidance for diffusion-based depth estimation. Validation on ground-marker flights captured at approximately 50 m altitude (GSD ≈ 0.85 cm/px, ~2,650 m² ground coverage per frame) with the DLR Modular Aerial Camera System (MACS) shows that our method achieves sub-meter accuracy, with approximately 0.87 m error in the horizontal (XY) plane and 0.12 m in the vertical (Z) direction, while maintaining per-image runtimes between 1.47 and 4.91 seconds. Results are subject to minor noise from manual point-cloud annotation. These findings show that BA-based metric guidance provides consistency comparable to classical photogrammetric methods while significantly accelerating processing, enabling real-time 3D map generation.

8:45am - 9:00am

Bundle-Adjusted Initialization for efficient Earth Observation Gaussian Splatting

Jiyong Kim¹, Shuang Song¹,³, Rongjun Qin¹,²

¹Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, USA; ²Department of Electrical and Computer Engineering, The Ohio State University, USA; ³Translational Data Analytics Institute, The Ohio State University, Columbus, USA

Satellite-based 3D reconstruction has gained prominence with the advancement of Earth Observation techniques. Recent work on Earth Observation Gaussian Splatting (EOGS) demonstrated the potential of adapting 3D Gaussian Splatting to satellite imagery, enabling rapid Digital Surface Model (DSM) generation from multiple images using Rational Polynomial Coefficients (RPCs) as camera models. However, EOGS suffers from critical inefficiencies: it randomly initializes a large number of Gaussians in volumetric space and relies on opacity-based pruning, resulting in unstable memory footprints and premature loss of fine details—particularly problematic for low-resolution satellite data.

This work presents an improved Gaussian Splatting framework for satellite imagery that addresses these limitations through two key contributions. First, we introduce bundle-adjusted initialization, which leverages geometrically precise points from the bundle adjustment process as initialization seeds rather than random placement. This approach ensures Gaussians are anchored to accurate geometric positions from the outset, significantly improving convergence stability. Second, we propose densification-included optimization, which strategically adds Gaussians in regions requiring detailed reconstruction while maintaining computational efficiency. This selective densification preserves fine-scale features without the memory overhead of EOGS's initial over-allocation strategy.

Our method achieves faster processing times and maintains more consistent memory usage while producing higher-quality DSMs, particularly in challenging low-resolution scenarios. By combining geometric priors from bundle adjustment with adaptive densification, we enable more practical and efficient satellite-based 3D reconstruction suitable for large-scale Earth observation applications.

9:00am - 9:15am

Evaluating Classical and Deep Keypoint Detectors For SfM Reconstruction in Arctic UAV Imagery

Nicholas Sansoterra¹, Charles Toth¹, Alper Yilmaz¹, Mendoza Lenzano², John Anderson³, William Shuart³

¹The Ohio State University, United States of America; ²Resp. Lab. Geomatica Andino (LAGEAN); ³USACE ERDC GRL Corbin field Station, USA

This contribution presents a comparative evaluation of classical and deep learning–based keypoint detectors for Structure-from-Motion (SfM) reconstruction in challenging Arctic UAV imagery. Snow-covered environments pose difficulties for standard feature matching due to low texture, repetitive patterns, and specular surfaces. While deep keypoint pipelines have shown strong performance on indoor and urban benchmarks, their effectiveness in winter aerial domains remains largely unexplored.

Using multi-view UAV datasets collected across several Alaskan sites, we benchmark three feature-extraction front-ends within a uniform pycolmap-based SfM pipeline: (i) classical SIFT with nearest-neighbor matching; (ii) SuperPoint, a self-supervised convolutional detector–descriptor; and (iii) DISK, a reinforcement-learning–based feature extractor. A simple hybrid approach combining SuperPoint and DISK matches is also tested. All methods share identical geometric verification and bundle-adjustment settings to ensure consistency.

Results show that SIFT remains highly robust on moderately textured Arctic scenes, registering all images and producing the most complete point clouds. SuperPoint and DISK achieve similar reprojection accuracy but struggle with image registration and keypoint coverage on some sequences. Conversely, on extremely low-texture scenes where SIFT fails almost entirely, both deep methods still enable partial reconstructions. Persistent failure cases for all techniques include dense canopy and homogeneous snow.

The study highlights a domain gap between existing deep keypoint models and Arctic aerial imagery, suggesting that domain-specific training and improved spatial keypoint diversity could substantially enhance deep SfM performance in polar regions.

9:15am - 9:30am

Occlusion-Robust SfM in Construction Sites via Geometry-Guided Foreground Segmentation

Changjiang Yin¹, Shaoming Zhang¹, Qin Ye¹,², Junqi Luo¹

¹College of Surveying and Geo-Informatics, Tongji University, 200092, Shanghai, China; ²Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, 518000, Shenzhen, China

Accurate 3D reconstruction is a key enabler for construction progress monitoring and digital-twin maintenance. However, in tower-crane imagery, persistent dynamic occluders such as hooks and slings violate the static-scene assumption of conventional Structure-from-Motion (SfM), leading to feature mismatches and degraded reconstruction consistency. In this paper, we present a geometry-guided occlusion-handling pipeline for crane-mounted construction-site SfM. Our approach leverages geometric cues from reprojection errors and depth inconsistencies to identify outlier observations, clusters them into spatially coherent prompts, and uses these to guide a foundation segmentation model (SAM2). The resulting per-frame masks are integrated into mask-constrained SfM optimization, ensuring that only static background contributes to reconstruction. Experiments on three real-world crane-mounted sequences (30m, 45m, and 120m) show consistent reductions in mean reprojection error relative to the unmasked baseline. In the most challenging case, the error decreases from 0.962 to 0.872 pixels (9.4%). Compared with a fixed rectangular masking strategy, the proposed masks yield similar reprojection errors while better preserving valid observations and sparse-point completeness. These results indicate that the proposed framework provides a practical geometry-guided strategy for improving internal reconstruction consistency in crane-mounted construction environments.

9:30am - 9:45am

Geometry-aided Video Panoptic Segmentation

Tuan Nguyen, Max Mehltretter, Franz Rottensteiner

Institute of Photogrammetry and Geoinformation, Leibniz Hannover University, Germany

Video panoptic segmentation (VPS) unifies panoptic segmentation and object tracking by assigning each pixel a semantic class label, or for thing classes, an instance identifier that is consistent across frames. Addressing this task, we propose a novel online VPS method for processing stereoscopic image sequences, which is based on depth-aware kernel-based panoptic segmentation. Specifically, we introduce a geometrical constraint based on predicted bounding boxes into the segmentation of thing instances to overcome the fundamental limitation of kernel-based panoptic segmentation that only appearance information is considered in this step; this regularly leads to panoptic segmentation results in which distinct instances are erroneously merged into one mask. To link detected instances across frames, we propose to extend the commonly employed appearance-based association with a motion-related constraint based on optical flow; this resolves ambiguities in case of instances of similar appearance and, thus, reduces the number of incorrect associations. We experimentally evaluate our method on the publicly available Cityscapes-VPS dataset and compare our results to those of several related methods from the literature. The results demonstrate that our method improves the panoptic quality for a single frame and enhances the instance association across frames, leading to an overall improvement of 3.5% in Video Panoptic Quality on thing classes compared to the employed baseline.

9:45am - 10:00am

Quantifying altimetric and volumetric changes of the Belvedere glacier (2009–2023) using Pleiades and Pleiades Neo data

Francesco Ioli¹, Luca Cerina², Alberto Cina³, Livio Pinto²

¹IRPI - Italian National Research Council, Turin, Italy; ²DICA - Politecnico di Milano, Italy; ³DIATI - Politecnico di Torino, Italy

This study addresses the morphological evolution of the Belvedere Glacier (Monte Rosa, Macugnaga – Italy) over the period 2009–2023, using a photogrammetric methodology based on Pleiades (2017) and Pleiades Neo (2023) Very-High Resolution (VHR) satellite imagery, integrated with historical aerial data from 2009. The main objective was to quantify altimetric and volumetric variations of the glacier, assess the intensity of ice mass loss, and analyze the geomorphological effects of the flood event that occurred on August 27, 2023, which generated a major debris flow.

Raster differencing between Digital Elevation Models (DEMs) revealed a significant lowering of the glacier surface. Between 2009 and 2017, the glacier lost approximately 19.3 × 10⁶ m³ of ice (about 2.4 × 10⁶ m³/year), while in the following period (2017–2023) the loss reached 16.9 × 10⁶ m³, with an increased average annual rate of 2.8 × 10⁶ m³/year. These values confirm an acceleration in the ablation process, consistent with other studies (De Gaetani 2021; Ioli 2023) and with the general retreat trend observed in Alpine glaciers due to climate warming.
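Raster differencing of this kind reduces to summing per-cell elevation changes and multiplying by the cell area. The following sketch shows that computation on a tiny made-up grid and then checks the annual rates quoted above from the reported volumes; the grid values, cell size and nodata value are assumptions, not project data.

```python
def volume_change(dem_old, dem_new, cell_size_m, nodata=None):
    """Sum of per-cell elevation differences times cell area, in cubic meters."""
    cell_area = cell_size_m * cell_size_m
    total = 0.0
    for row_old, row_new in zip(dem_old, dem_new):
        for z_old, z_new in zip(row_old, row_new):
            # Skip cells that are missing in either epoch.
            if nodata is not None and (z_old == nodata or z_new == nodata):
                continue
            total += (z_new - z_old) * cell_area
    return total


# Toy 2 x 2 DEMs (elevations in meters) on a 2 m grid; -9999 marks nodata.
dem_2017 = [[2000.0, 2001.0], [1999.0, -9999.0]]
dem_2023 = [[1996.0, 1998.0], [1994.0, 1990.0]]
dv = volume_change(dem_2017, dem_2023, cell_size_m=2.0, nodata=-9999.0)
annual = dv / (2023 - 2017)

# Consistency check of the rates quoted in the abstract.
rate_2009_2017 = 19.3e6 / (2017 - 2009)  # about 2.4e6 m3/year
rate_2017_2023 = 16.9e6 / (2023 - 2017)  # about 2.8e6 m3/year
print(dv, annual, rate_2009_2017, rate_2017_2023)
```

A negative result indicates surface lowering. In practice the two DEMs must be co-registered and resampled to the same grid before differencing.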

1:30pm - 3:00pm

SpS4B: Remote Sensing of Atmospheric Components for Climate Change and Air Quality: Bridging ISPRS and AERSS
Location: 715B

1:30pm - 1:45pm

PhysNorm-Net: A physics-guided adapted normalization network for reconstructing gapless, hourly tropospheric NO2 VCDs over Asia (2019–2024)

Hongrui Gao, Kai Qin

School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China

Tropospheric nitrogen dioxide (NO2) is a crucial trace gas for air quality assessment, yet satellite observations often suffer from spatial gaps (e.g., cloud cover) and temporal limitations. While the geostationary satellite GEMS provides hourly data over Asia, its short historical record and missing data restrict long-term studies. Therefore, a physics-guided adapted normalization network (PhysNorm-Net) is designed to reconstruct a gapless, hourly, and high-resolution (0.05°) tropospheric NO2 dataset over Asia from 2019 to 2024.

The model features an asymmetric U-Net architecture. It handles irregular data gaps using Partial Convolution with a dynamic mask and extracts spatiotemporal representations from meteorological and chemical priors. A novel Physics-Aware Normalization (PhysNorm) module bridges the modality gap by dynamically modulating satellite feature maps using physical backgrounds, ensuring adherence to atmospheric diffusion laws.

Extensive evaluations show that PhysNorm-Net achieves high prediction accuracy (R² = 0.886). It robustly recovers spatial morphologies and pollution plumes even under extreme missing data scenarios. The generated 2019–2024 dataset accurately captures complex diurnal variations and localized hotspots, providing valuable insights into human activities and pollution policies in Asia.
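For readers unfamiliar with partial convolution, the sketch below shows the commonly used formulation: only observed pixels contribute to each window, the result is rescaled by the fraction of valid pixels, and the mask is updated so that any window with at least one valid pixel becomes valid. It is a plain-Python illustration with an assumed 3 x 3 averaging kernel, not the PhysNorm-Net layer.

```python
def partial_conv2d(image, mask, kernel):
    """Valid-mode partial convolution on 2D lists.

    image, mask -- equal-shape grids; mask is 1 for observed, 0 for missing
    kernel      -- 2D list of weights
    Returns (output, new_mask).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    window_size = kh * kw
    output = [[0.0] * out_w for _ in range(out_h)]
    new_mask = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            valid = 0
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    m = mask[i + di][j + dj]
                    valid += m
                    acc += kernel[di][dj] * image[i + di][j + dj] * m
            if valid > 0:
                # Rescale to compensate for the missing pixels in the window.
                output[i][j] = acc * window_size / valid
                new_mask[i][j] = 1
    return output, new_mask


# A constant field of 1.0 with a gap in the center (stored as 0, masked out).
image = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
mask = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
out, new_mask = partial_conv2d(image, mask, kernel)
print(out, new_mask)
```

With the rescaling, the averaged output recovers the value of the surrounding field instead of being pulled down by the gap.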

1:45pm - 2:00pm

Physics-Informed Neural Networks for Efficient Spatiotemporal Inversion of NOx Emissions from TROPOMI

Qin He¹, Kai Qin¹, Hongrui Gao¹, Man Sing Wong², Jason Blake Cohen¹

¹China University of Mining and Technology, Xuzhou, 221116, China; ²The Hong Kong Polytechnic University, Kowloon, 999077, Hong Kong

Accurate estimation of nitrogen oxide (NOx) emissions is essential for understanding their role in atmospheric chemistry and managing air pollution. This study presents a novel approach using Physics-Informed Neural Networks (PINNs) to invert NOx emissions from TROPOspheric Monitoring Instrument (TROPOMI) satellite data. By coupling the physical laws of atmospheric processes, the approach effectively bridges traditional data assimilation techniques with the computational efficiency of deep learning. Unlike purely data-driven models, it directly integrates physical constraints from the atmospheric mass continuity equation into the model training process, eliminating the need for inputs or outputs from computationally intensive chemical transport models. Application to the Yangtze River Delta region of China (2018–2023) revealed detailed spatiotemporal NOx emission trends, including the impacts of the COVID-19 pandemic and subsequent recovery. Uncertainty quantification through Monte Carlo dropout provides robust error estimates. This physics-informed approach demonstrates strong potential for efficient NOx emission inversion and offers a versatile foundation for broader quantitative remote sensing applications.

2:00pm - 2:15pm

Fast Cloud Property Retrieval from TROPOMI O₂-A Band Observations Using a DISAMAR-Based Neural Network Framework

Tao Xie¹, Xiaoyun Zhang²,³, Wenmei Li¹, Yixiang Chen¹, Ping Wang², Piet Stammes², Gijsbert Tilstra², Olaf Tuinder², Maarten Sneep², Feng Lu⁴

¹School of Internet of Things, Nanjing University of Posts and Telecommunications, China; ²R&D Satellite Observations (RDSW), Royal Netherlands Meteorological Institute (KNMI), NL; ³Nanjing University of Information Science and Technology (NUIST), China; ⁴Key Laboratory of Radiometric Calibration and Validation for Environmental Satellites, National Satellite Meteorological Center (National Center for Space Weather), Innovation Center for Feng Yun Meteorological Satellite (FYSIC), China Meteorological Administrations, Beijing 100049, China

With improvements in the spatial resolution of satellite spectrometers such as TROPOMI, Sentinel-4 and Sentinel-5, more homogeneous cloudy scenes can be resolved at the pixel scale. Therefore, it is worthwhile to use a scattering cloud model in cloud retrieval algorithms. DISAMAR (Determining Instrument Specifications and Analysing Methods for Atmospheric Retrieval) is a computer model developed to simulate the retrieval of atmospheric trace gases, aerosols, clouds, and land-surface properties from passive remote-sensing observations in the 270–2400 nm wavelength range. As a line-by-line radiative transfer model, DISAMAR provides accurate simulations but is computationally expensive. Machine learning techniques can improve the speed of cloud retrieval, because a neural network trained with detailed radiative transfer calculations for scattering clouds can replace the most time-consuming part of the retrieval algorithm.

In this study, we plan to build a cloud retrieval algorithm based on DISAMAR and accelerate it using neural network methods. The algorithm uses TROPOMI observations in the O₂-A band and supports the joint retrieval of cloud optical thickness (COT) and cloud-top pressure (CTP). The neural network models are trained offline using a large, high-resolution spectral data set in the O₂-A band generated by the DISAMAR forward model. All neural networks share the same set of input features but predict different targets, including reflectance and the derivatives of reflectance with respect to cloud pressure and cloud optical thickness. These predictions are then used within an optimal estimation framework to retrieve the cloud parameters.
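
The retrieval step described above, in which predicted reflectances and their derivatives feed an optimal estimation scheme, can be illustrated with a minimal Gauss-Newton sketch. This is not the authors' implementation: the function names, the toy linear forward model `toy_forward`, and all numerical values are assumptions chosen for illustration. In the abstract's setting, the forward values and Jacobian would come from the trained neural networks.

```python
import numpy as np

def optimal_estimation(y, forward, x_a, S_a, S_e, n_iter=10):
    """Gauss-Newton optimal estimation for a small state vector.

    forward(x) must return (F, K): the simulated measurement and its
    Jacobian dF/dx. Here `forward` is a hypothetical stand-in for the
    neural-network emulators mentioned in the abstract.
    """
    S_a_inv = np.linalg.inv(S_a)
    S_e_inv = np.linalg.inv(S_e)
    x = x_a.copy()
    for _ in range(n_iter):
        F, K = forward(x)
        # Posterior covariance of the linearised problem
        S_hat = np.linalg.inv(K.T @ S_e_inv @ K + S_a_inv)
        # Gauss-Newton update expressed relative to the prior state x_a
        x = x_a + S_hat @ K.T @ S_e_inv @ (y - F + K @ (x - x_a))
    return x, S_hat

# Toy linear forward model (assumption): 5 measurements, 2 state variables
K0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])

def toy_forward(x):
    return K0 @ x, K0

x_true = np.array([20.0, 600.0])   # illustrative "COT, CTP" values
x_a = np.array([10.0, 500.0])      # prior state
x_hat, S_hat = optimal_estimation(
    K0 @ x_true, toy_forward, x_a,
    S_a=1e6 * np.eye(2),           # weak prior
    S_e=1e-4 * np.eye(5),          # small measurement noise
)
```

With a weak prior and noise-free synthetic measurements, the estimate should land very close to `x_true`; with a real nonlinear forward model, convergence would need to be checked explicitly.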

2:15pm - 2:30pm

Generation of Nighttime Visible Bands for the Advanced Himawari Imager based on Deep Learning technologies

Jiaqi Jin^1,2, Jing Li^1,2, Man Sing Wong^1,2,3,4, PW Chan⁵

¹State Key Laboratory of Climate Resilience for Coastal Cities, The Hong Kong Polytechnic University, Hong Kong, China; ²Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China; ³Research Institute for Sustainable Urban Development, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China; ⁴Research Institute of Land and Space, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China; ⁵The Hong Kong Observatory, Hong Kong, China

This study combines remote sensing and artificial intelligence technologies. It proposes a deep learning-based algorithm to generate nighttime visible bands for the Advanced Himawari Imager aboard the Himawari geostationary satellites.

2:30pm - 2:45pm

A radiative transfer model-guided deep learning framework for aerosol optical thickness retrieval from satellite observations

Man Sing Wong^1,2,3,4, Jing Li¹, Kai Qin⁵

¹Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China; ²Research Institute for Sustainable Urban Development, The Hong Kong Polytechnic University, Hong Kong SAR, China; ³Research Institute of Land and Space, The Hong Kong Polytechnic University, Hong Kong SAR, China; ⁴Otto Poon Research Institute for Climate-Resilient Infrastructure, The Hong Kong Polytechnic University, Hong Kong SAR, China; ⁵School of Environment and Spatial Informatics, China University of Mining and Technology, China

Atmospheric aerosols play a vital role in regulating air quality, ecosystems, and climate. Owing to their short atmospheric lifetime, aerosols exhibit strong spatial and temporal variability. Accurate global and regional monitoring of aerosol properties is essential for understanding ecological processes and radiative forcing. Satellite remote sensing has become a key tool for monitoring aerosol optical thickness (AOT) because of its broad spatial coverage. Traditional physical approaches rely on radiative transfer models (RTMs) to simulate top-of-atmosphere radiances. However, RTMs simplify the real atmosphere, and their accuracy depends strongly on assumed aerosol optical properties and surface reflectance, leading to major uncertainties and inter-algorithm discrepancies. In recent years, data-driven methods have rapidly advanced, driven by developments in machine learning and the increasing availability of collocated satellite and ground-based AOT datasets. Data-driven methods rely exclusively on data pairs of satellite observations and ground-measured aerosol properties. They learn empirical relationships between the two and can more flexibly incorporate diverse information. However, the AERONET ground stations commonly used for training are unevenly distributed and concentrated in urban regions, leaving other surface types such as forests and barren lands underrepresented. In addition, extreme pollution events (e.g., dust storms) are often misclassified as clouds and masked out in AERONET records, introducing bias into training datasets. To mitigate these limitations, this study proposes integrating simulated RTM data into the inversion framework to enhance the robustness and generalization of data-driven AOT retrieval models.

2:45pm - 3:00pm

Evaluating the generalization and uncertainty of data-driven air quality remote sensing models using an idealized testbed

Xinqi Wei¹, Qin He², Kai Qin²

¹Nanjing University of Posts and Telecommunications; ²China University of Mining and Technology

Reliable satellite-based estimation of near-surface air pollutants increasingly relies on data-driven models, yet their credibility is hindered by biased generalization assessment and unverified uncertainty estimates. Spatially sparse and unevenly distributed monitoring networks together with strong spatial autocorrelation cause conventional cross-validation approaches to substantially overestimate predictive skill, especially in regions lacking in situ observations. At the same time, although many models produce pixel-level uncertainty estimates, the degree to which these uncertainties reflect true prediction error remains largely unexplored.

This study introduces a controlled, model-agnostic evaluation framework to rigorously examine both spatial generalization and uncertainty reliability in air-quality remote sensing models. A chemical transport model provides a continuous, full-coverage nitrogen dioxide field that serves as an idealized truth. Sampling this field at actual monitoring locations reproduces real observational sparsity while preserving an unbiased reference for domain-wide evaluation. Multiple machine learning models are assessed using sample-based, site-based, and spatially optimized cross-validation to quantify evaluation bias and its dependence on spatial structure.
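
The difference between sample-based and site-based cross-validation mentioned above comes down to how samples are assigned to folds. The following is a minimal sketch of a site-based split, in which all samples from one monitoring station are held out together. The function name `site_based_folds` and the synthetic station layout are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def site_based_folds(site_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) so that each site falls entirely in one fold."""
    site_ids = np.asarray(site_ids)
    sites = np.unique(site_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(sites)  # randomise which sites are grouped together
    for held_out in np.array_split(sites, n_folds):
        test_mask = np.isin(site_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical example: 6 stations with 4 samples each
site_ids = np.repeat(np.arange(6), 4)
folds = list(site_based_folds(site_ids, n_folds=3))
```

A sample-based split would instead shuffle individual rows, so that samples from one station can appear in both training and test sets, which is one way spatial autocorrelation can inflate apparent skill.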

A dual-path uncertainty strategy is implemented to separately characterize aleatoric and epistemic components, complemented by diagnostic metrics assessing calibration, interval coverage, and sharpness. The framework provides a rigorous pathway for diagnosing reliability in data-driven atmospheric estimation models and supports the development of robust, trustworthy applications in quantitative remote sensing.
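
Two of the diagnostics named above, interval coverage and sharpness, can be computed in a few lines. This sketch assumes Gaussian predictive distributions described by a mean and a standard deviation per pixel, which the abstract does not state; the function name and the toy numbers are likewise illustrative.

```python
import numpy as np

def interval_diagnostics(y_true, mu, sigma, z=1.96):
    """Coverage and sharpness of prediction intervals mu +/- z * sigma."""
    y_true, mu, sigma = map(np.asarray, (y_true, mu, sigma))
    lower, upper = mu - z * sigma, mu + z * sigma
    # Fraction of reference values that fall inside their interval
    coverage = np.mean((y_true >= lower) & (y_true <= upper))
    # Mean interval width: narrower is sharper
    sharpness = np.mean(upper - lower)
    return coverage, sharpness

# Toy data (assumption): the last prediction misses its reference value badly
coverage, sharpness = interval_diagnostics(
    y_true=[0.0, 1.0, 2.0, 10.0],
    mu=[0.0, 1.0, 2.0, 3.0],
    sigma=[1.0, 1.0, 1.0, 1.0],
)
```

Under the Gaussian assumption, a well-calibrated model would show coverage near 95% for `z=1.96`; coverage well below that suggests overconfident uncertainty estimates.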

3:30pm - 5:15pm

WG II/1B: Image Orientation and Fusion
Location: 715B

3:30pm - 3:45pm

ATOM-ANT3D in Action: 3D Surveying from Confined Spaces to Urban Environments

Ahmad El-Alailyi^1,2, Luca Perfetti³, Fabio Remondino², Francesco Fassi¹

¹3D Survey Group, ABC Department, Politecnico di Milano, Milano, Italy; ²3D Optical Metrology (3DOM) Unit, Bruno Kessler Foundation (FBK), Trento, Italy; ³Department of Civil, Architectural, Environmental Engineering and Mathematics (DICATAM), Università degli Studi di Brescia, Brescia, Italy

This work presents a multi-camera mobile mapping solution designed to deliver accurate and efficient 3D reconstructions across a wide variety of challenging environments, ranging from confined indoor spaces to complex urban outdoor settings. Traditional photogrammetric and terrestrial laser scanning approaches, while capable of high accuracy, often suffer from limitations related to acquisition speed, logistical complexity, and significant post-processing effort—especially in large, occluded, or hard-to-access sites. Mobile Mapping Systems (MMS) based on Visual SLAM (V-SLAM) offer a compelling alternative, thanks to their ability to acquire high-frequency imagery in continuous motion and estimate sensor trajectories in real-time. However, MMS outputs frequently face issues such as reduced geometric accuracy, scale drift in monocular sequences, and the need for extensive optimisation to reach survey-grade results.

To address these limitations, the study extends an existing multi-camera V-SLAM pipeline by tightly integrating monocular estimates with multi-stereo trajectories within the ATOM-ANT3D fisheye multi-camera system. A novel monocular scale-recovery strategy is introduced, based on path-length ratios derived from concurrently recorded stereo tracks. This metrized monocular trajectory is then fused with stereo estimates through a robust pose graph optimisation, followed by a multi-view, feature-based refinement leveraging pre-calibrated camera geometry.
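
The path-length-ratio idea for monocular scale recovery can be sketched as follows. This is a simplified illustration rather than the pipeline described above: it assumes both tracks cover the same time span and follow the same path, and it ignores the lever arm between cameras. The function names and the synthetic trajectories are hypothetical.

```python
import numpy as np

def path_length(positions):
    """Total length of a trajectory given as an (N, 3) array of positions."""
    return np.linalg.norm(np.diff(positions, axis=0), axis=1).sum()

def recover_scale(mono_positions, stereo_positions):
    """Scale factor mapping an up-to-scale monocular track to metric units."""
    return path_length(stereo_positions) / path_length(mono_positions)

# Hypothetical data: a metric stereo track and the same path at 1/4 scale
stereo = np.array([[0, 0, 0], [1, 0, 0], [1, 2, 0], [1, 2, 2]], dtype=float)
mono = 0.25 * stereo
scale = recover_scale(mono, stereo)
mono_metric = scale * mono  # monocular track expressed in metric units
```

Because the ratio uses accumulated path length rather than individual pose pairs, it is insensitive to the choice of reference frame of either track.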

The proposed method is evaluated across four real-world scenarios—spiral tower staircases, dark underground caves, narrow urban corridors, and constrained industrial pipelines. Accuracy is assessed against reference 3D point clouds, while efficiency is compared to a standard multi-view stereo photogrammetric pipeline. Results demonstrate that the integrated approach significantly improves reconstruction consistency, robustness, and end-to-end throughput.

3:45pm - 4:00pm

Shape2Match: A Shape-to-Matching Framework for Infrared and Visible Image Matching

Maoyu Wang, Xulei Shi, Zhuolu Hou, Xinbo Zhao, Xin Huang, Yifan Liao, Yansong Duan, Pengjie Tao

School of Remote Sensing and Information Engineering, Wuhan University, China, People's Republic of

Traditional image matching methods rely heavily on gradient or intensity information. However, the severe nonlinear radiometric distortion (NRD) between infrared and visible images hinders the extraction of repeatable feature points, leading to poor matching performance. To address this, we propose Shape2Match, a novel framework that replaces point features with more consistent, modality-invariant shape features. Specifically, the method utilizes EfficientSAM to extract shape contours and employs elliptic Fourier descriptors (EFD) to parameterize and normalize them, creating a shape descriptor that is invariant to translation, rotation, and scale. Shape2Match adopts a coarse-to-fine hierarchical strategy: it first performs robust global shape matching using a weighted EFD distance, followed by precise keypoint matching, using Shape Context, within the coarsely aligned shape pairs. We validated Shape2Match on 153 image pairs from 6 datasets, comparing it against methods like SIFT, RIFT, and MS-HLMO. Experimental results demonstrate that Shape2Match achieves a 100% success rate (SR) across all datasets and significantly outperforms other methods in the number of correct matches (NCM), proving its effectiveness and robustness against NRD, rotation, and scale variations.

4:00pm - 4:15pm

Historical images for surface topography reconstruction intercomparison experiment (Historix)

Amaury Dehecq¹, Friedrich Knuth^2,3, Joaquin M.C. Belart⁴, Livia Piermattei⁵, Camillo Ressl⁶, Robert McNabb⁷, Adina Racoviteanu¹, David Shean⁸, Luc Godin¹

¹University Grenoble Alpes, IRD, CNRS, Grenoble INP, IGE, Grenoble, France; ²Laboratory of Hydraulics, Hydrology and Glaciology (VAW), ETH Zurich, Zurich, Switzerland; ³Swiss Federal Institute for Forest, Snow and Landscape Research (WSL), Birmensdorf, Switzerland; ⁴Natural Science Institute of Iceland, Akranes, Iceland; ⁵Department of Geography, University of Zurich, 8057 Zurich, Switzerland; ⁶TU Wien, Department of Geodesy and Geoinformation, Vienna, 1040, Austria; ⁷School of Geography and Environmental Sciences, Ulster University, BT52 1SA Coleraine, UK; ⁸Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, USA

Historical film-based images, acquired by aerial sensors since the 1930s and by satellite platforms since the 1960s, provide a unique opportunity to document changes in the Earth's surface over the 20th century. Yet, they present significant and specific challenges, including complex distortion in the scanned image pixel grid and poorly known camera exterior and interior orientation. In recent years, semi- or fully-automated approaches, based on photogrammetric and computer vision methods, have emerged, but the performance and limitations of these methods have yet to be directly compared.

The objectives of the Historical Images for Surface Topography Reconstruction Intercomparison eXperiment (Historix) project are to compare existing methods for processing stereoscopic historical images and harmonize processing tools. Here we present the study site and dataset selected for this comparison, the design of the intercomparison and evaluation metrics, as well as preliminary results. Full evaluation will be presented at the conference.

4:15pm - 4:30pm

Geolocation enhancement of space borne cameras: the SAR-Optic approach

Philippe Nonin¹, Alexis Barot¹, Pascal Favé², Laurent Gabet¹, Mathilde Jaussaud², Wolfgang Koppe³, Alain Orsoni²

¹Airbus, France; ²IGN, France; ³Airbus, Germany

The location accuracy of an image acquired with a space borne camera relies on the knowledge of the orbit of the spacecraft and the orientation of the camera. The a posteriori estimation of a satellite orbit has long been a well-mastered technique, and sub-meter accuracy is achievable with reasonable effort. The geolocation, with a similar accuracy, of the line of sight of an optical instrument flying at 500 km or above is a much more challenging task.

On the other hand, the geolocation of a synthetic aperture radar (SAR) image depends only on the orbit of the spacecraft. It is, therefore, easy to acquire space borne SAR images with a sub-metric native geolocation. The Airbus SAR constellation (TerraSAR-X, TanDEM-X and PAZ) provides, on a commercial basis, images with a geolocation accuracy better than 0.2 m.

The ability to find, through image matching, homologous points in SAR and optical images would transfer the native accuracy of SAR to optical observations, using classical photogrammetric bundle adjustment.

This paper describes an operational method for SAR/optical image matching and validates the absolute location accuracy achieved.

4:30pm - 4:45pm

Comparative analysis of mainstream image matching methods for georeferencing Tianwen‑1 HIRIC imagery without ground control points

Zhuolu Hou, Aomei Zhang, Yunfei Hu, Xulei Shi, Pei Mi, Pengjie Tao, Tao Ke

School of Remote Sensing and Information Engineering, Wuhan University, China, People's Republic of

High-precision mapping of planetary surfaces, such as Mars, relies on matched control points derived from existing georeferenced data, as ground control points (GCPs) cannot be obtained through field measurement. However, handcrafted image matchers such as SIFT limit the robustness of this approach, particularly on texture-scarce and self-similar Martian terrain. While deep learning-based matchers offer a new paradigm, their performance gain for bundle adjustment remains inadequately quantified. This paper systematically evaluates four matchers (handcrafted SIFT and deep learning-based DOG+HardNet+LightGlue, DISK+LightGlue, and LoFTR), assessing their impact on georeferencing tasks using Tianwen-1 high-resolution imagery. Deep learning methods, such as LoFTR, generate more correspondence points with a more uniform spatial distribution, halving the outlier rate and improving bundle adjustment accuracy by 10%. Our study provides a benchmark for planetary mapping and shows that powerful, learning-based image matchers are pivotal for next-generation automated mapping systems.

4:45pm - 5:00pm

Transforming National Air Photo Archives into Analysis-Ready Geospatial Products

Mozhdeh Shahbazi, Evangelos Bousias Alexakis, Mikhail Sokolov, Ella Mahoro, Pierre Gravel

Canada Centre for Mapping and Earth Observation, Natural Resources Canada, Canada

This work investigates the solutions developed at Natural Resources Canada to produce analysis-ready mapping products from Canada's national air photo library including two main workflows: 1) The photogrammetric processing of historical photos with an emphasis on the more challenging automated georeferencing component; 2) Enhancing interpretability through generative artificial intelligence models for super-resolution and deep colorization.

5:00pm - 5:15pm

The Project evalAT for Investigating the Accuracy of Aerotriangulations in Map Projections

Camillo Ressl¹, Norbert Pfeifer¹, Christine Ressl², Andreas Bayr²

¹TU Wien, Austria; ²BEV – Bundesamt für Eich und Vermessungswesen, Abteilung G2 – Fernerkundung, Wien, Austria

The accuracy of the aerial triangulation (AT) performed in the map projection for a GNSS-INS-supported image block consisting of 4342 vertical images, GSD 20 cm, with 22 main strips and 5 cross strips is investigated. Using 169 check points, the obtained results are compared with the accuracy achieved by running the AT in an undistorted tangential system. It turns out that the same accuracies can be achieved in both systems, with RMSE in (X, Y, Z) of (7, 10, 11) cm, if Earth curvature and scale distortion are correctly modelled in the map projection. If the scale distortion is not considered, then the RMSE in Z increases by 100% to 300% (depending on the height distribution of the GCPs). In AT software packages that do not consider the scale distortion, a partial compensation is possible by adapting either the height of the projection centres or the principal distance, leading to RMSE of around (10, 11, 15) cm.

Date: Saturday, 11-July-2026

8:30am - 10:00am

WG V/1: Education and Training through Curricula Development and Enhanced Learning Practices
Location: 715B

8:30am - 8:45am

Earth Sensing in the Dolomites: A Summer School for Capacity Building and Collaboration on Geomatics for Environmental Applications

Larissa Maria Granja^1,2, Erico Kutchartt^1,3, Enrico Magazzino^1,2, Thomas Zieher⁴, Rasoul Eskandari⁵, Michele Crosetto⁶, Caterina Balletti⁷, Andrea Martino⁷, Mattia Balestra^8,9, Roberto Pierdicca⁹, Lindo Nepi¹⁰, Carlos Cabo¹¹, Anna Iglseder¹², Mauricio Acuna¹³, Borja García Pascual^13,14, Francesco Pirotti^1,2

¹Department of Land, Environment, Agriculture and Forestry (TESAF), University of Padova, Viale dell’Università 16, Legnaro, PD 35020, Italy; ²Interdepartmental Research Center of Geomatics (CIRGEO), University of Padova, Corte Benedettina, Via Roma 34, Legnaro, PD 35020, Italy; ³Forest Science and Technology Centre of Catalonia (CTFC), Carretera de Sant Llorenç de Morunys, Km 2, 25280 Solsona, Spain; ⁴Department of Natural Hazards, Austrian Research Centre for Forests (BFW), Rennweg 1, 6020 Innsbruck, Austria; ⁵Department of Architecture, Built Environment and Construction Engineering, Politecnico di Milano, via Ponzio 31, 20133 Milano, Italy; ⁶Centre Tecnològic de Telecomunicacions de Catalunya (CTTC/CERCA), Geomatics Division, Av. Gauss, 7, E-08860 Castelldefels (Barcelona), Spain; ⁷Università Iuav di Venezia, Santa Croce, 191, Venezia, VE 30135, Italy; ⁸Department of Agricultural, Food and Environmental Sciences (D3A), Università Politecnica delle Marche, Ancona, 60131, Italy; ⁹Department of Civil, Building and Architectural Engineering (DICEA), Università Politecnica delle Marche, Ancona, 60131, Italy; ¹⁰Department of Information Engineering (DII), Università Politecnica delle Marche, Ancona, Italy; ¹¹Department of Mining Exploitation and Prospecting, University of Oviedo, Campus de Mieres, 33600 Mieres, Oviedo, Spain; ¹²Department of Geodesy and Geoinformation, TU Wien, Wiedner Hauptstraße 8-10, 1040 Vienna, Austria; ¹³Forest technology and Wood Material Solutions, Natural Resources Institute Finland (Luke), 80100 Joensuu, Finland; ¹⁴School of Forest Sciences, University of Eastern Finland, 80101 Joensuu, Finland

Capacity building is a key element in promoting and training in spatial technologies and in fostering a network of early-stage researchers for future collaborations. The first edition of the Earth Sensing Summer School, organised by the University of Padova with support from ISPRS Education and Capacity Building (ECBI) funds, was held from 7 to 13 September 2025 in the Dolomite area of the Alpine region. This article illustrates specific aspects of the organisation and discusses the return on investment in terms of training and networking. It highlights the methodology used for selecting participants and conducting the training, which included a balanced combination of seminars, fieldwork, data analysis, dissemination to peers, and a final defence of results. We discuss the outcomes and feedback from the almost 40 participants and provide ideas for future improvements, aiming to offer insights for fellow researchers who might want to replicate a capacity-building activity of this kind.

8:45am - 9:00am

Implementing Team-based Learning in Geomatics Education: enhancing hard and soft Skills in multicultural academic Contexts

Andrea Ajmar¹, Fabio Giulio Tonolo²

¹Interuniversity Department of Regional and Urban Studies and Planning, Politecnico and Università di Torino, Italy; ²Department of Architecture and Design, Politecnico di Torino, Torino, Italy

This practice paper presents the design, implementation, and first evaluation of Team-Based Learning (TBL) activities in university-level Geomatics courses taught in English to multicultural and international student groups. The study documents a structured pathway for adapting TBL to technically demanding subjects, including GIS suitability analysis, network analysis, remote sensing classification, and heat-risk assessment. Its main contribution lies in showing how a pedagogical model widely discussed in general higher education can be translated into software-based Geomatics teaching while supporting both disciplinary learning and intercultural collaboration. The paper also identifies the main organizational conditions for successful adoption, including team formation, workload calibration, and suitable classroom settings. Results from 12 TBL implementations involving 187 students and 470 total participations show clear benefits of teamwork: average team test performance was markedly higher than individual performance, repeated participation was associated with improved results, and student satisfaction increased after the introduction of TBL. Qualitative evidence further indicates gains in communication, teamwork, and intercultural interaction. Although the first implementation required substantial preparation effort, the approach proved replicable and scalable in subsequent editions, making TBL an effective instructional model for Geomatics education.

9:00am - 9:15am

ISPRS SC Summer School: A Global Initiative on Capacity Building and Education Outreach in the Field of Photogrammetry, Remote Sensing and GIS

Laxmi Thapa¹, Nicolas Pucino², Miguel Luis Rivera Lagahit³, Chukwuma John Okolie⁴

¹Aston Business School, Birmingham, United Kingdom; ²Climate Friendly, Sydney, New South Wales, Australia; ³Dynamic Map Platform Co., Ltd., Tokyo, Japan; ⁴African Centre for Cities, School of Architecture Planning and Geomatics, University of Cape Town, South Africa

The ISPRS Student Consortium (SC) Summer Schools are one of the fundamental initiatives that ISPRS SC jointly organises with interested institutions to advance education outreach and capacity building in photogrammetry, remote sensing, and geospatial information sciences. Since their start in 2004, these programs have provided students and young researchers with immersive learning opportunities, combining technical lectures, hands-on sessions, and cultural experiences. Grounded in Experiential Learning Theory, the Summer Schools emphasise real-world application, reflective observation, and collaboration. This paper explores their evolution, global outreach, and educational impacts. Drawing on recent ISPRS SC Summer Schools, including the BUCEA Summer School 2024 on Smart Cities and the Summer School 2024 on AI for Geospatial Applications, the analysis highlights their integration of theory and practice, networking benefits, and transformative cultural exchange. Challenges such as financial barriers and technological gaps are discussed together with recommendations for sustaining and enhancing these initiatives. This study underscores the critical role of ISPRS SC Summer Schools in fostering a global community of geospatial practitioners to address real-world challenges.

9:15am - 9:30am

Legal aspects in photogrammetric curricula: navigating property rights and airspace boundaries

Salvatore Marsico¹, Dimitrios Bolkas¹, Henrique Candido de Oliveira², Matthew Sharr¹

¹Penn State University, United States of America; ²Universidade Estadual de Campinas (Unicamp)

This paper discusses the importance of integrating legal aspects into UAS mapping courses and related curricula, providing a framework for integration in an introductory photogrammetric course with sample questions and assignments. Curricular focus is placed on two interrelated issues: first, the extent to which property owners maintain a reasonable expectation of privacy from UAS intrusion within the “immediate reaches” of their airspace; and second, the potential for UAS-mounted sensors to inadvertently capture imagery or point clouds of neighboring properties while operating in compliance with Federal Aviation Administration (FAA) regulations. The discussion concludes by identifying strategies to mitigate these legal and operational challenges, giving students the knowledge and tools to address similar situations in real-life scenarios, and ensuring that aerial surveying practices respect both regulatory compliance and property rights.

9:30am - 9:45am

Building Capacity in Satellite-Based Earth Observation and HQP Training: Canada as a Use Case

Ashraf Elshorbagy, Scott Mitchell, Koreen Millard

Carleton University, Canada

Remote sensing (RS) and especially earth observation (EO) have been used extensively for decades in environmental monitoring, infrastructure asset management, urban planning, emergency response, mapping and many other fields. The pace of technology advancements in big data, cloud computing, Geospatial AI (GeoAI) and Geospatial Foundation Models (GeoFM) is causing a paradigm shift in how, and by whom, the potential of remote sensing technology can be maximized. This paradigm shift challenges traditional geomatics education and pedagogical methods. Additionally, the gap between geomatics graduates’ skills and market needs is widening. The pace of disruptive technology advances like GeoAI and GeoFM often outpaces developments in geomatics education content or suitable pedagogical methods and formats. To address these skills gaps in geomatics courses and courseware, an initiative has been developed between the Canadian Space Agency and Carleton University, involving more than a dozen different partners spanning industry, government, academia and NGOs. We have been gathering information through qualitative and quantitative techniques to obtain insights about the soft and hard skills that are valuable and/or lacking in contemporary geomatics graduates, to forecast trends and future needs, and plan how to optimize the introduction of new technology and techniques into the educational content. Based on the mapped feedback, university-level geomatics courses are being redeveloped and updated, and novel course modules, mini-courses and micro credential programs are being developed and tested.

9:45am - 10:00am

Advancing Earth Observation in Africa: Achievements of the WG Africa Copernicus Training of trainers program in three languages

Linda Tomasini¹, Ali Arslan Nadir², Agnès Begue³, Nico Bonora⁴, Catarina Duarte⁵, Maria Daraio⁶, Jean-François Faure⁷, Philippe Gisquet⁸, Carlos Gonzales Inca⁹, Benjamin Palmaerts¹⁰, Michal Krupinski¹¹, Cécilia Leduc¹², Marc Leroy¹³, Marietta Papakonstantinou¹⁴, Cristina Lira¹⁵, Carolina Sa¹⁶, Dimitra Tsoutsou¹⁷

¹CNES, France; ²FMI, Finland; ³CIRAD, France; ⁴ISPRA, Italy; ⁵Air Centre, Portugal; ⁶ASI, Italy; ⁷IRD, France; ⁸Visioterra, France; ⁹University of Turku, Finland; ¹⁰ISSEP, Belgium; ¹¹CBK PAN, Poland; ¹²IDGEO, France; ¹³Space4Dev, France; ¹⁴NOA, Greece; ¹⁵IPMA, Portugal; ¹⁶PT Space, Portugal; ¹⁷PRAXI network, Greece

The WG Africa project is a collaborative initiative bringing together 12 national institutions from 8 European countries. Its objective is to support and strengthen the use of Copernicus data and services in Africa through a training-of-trainers program funded by the European Commission under the Framework Partnership Agreement on Copernicus User Uptake (FPCUP) and implemented in French, Portuguese, and English. To broadly support the uptake of Copernicus products, the primary goal is to collaborate with African academic and private-sector trainers by integrating Copernicus-based modules into their training programs or curricula. This initiative complements other capacity-building efforts in space-based Earth Observation in Africa, such as GMES & Africa and the Global Gateway European initiative.

10:00am - 10:15am

Geospatial UK Higher Education – status, challenges, and outreach initiatives

Henny Mills, Craig Robson, Alex Linighan, Stuart Edwards, Chris Larkin

Newcastle University, United Kingdom

Geospatial education in the UK is facing a critical decline, despite the increasing relevance of 3D reality capture and spatial technologies across sectors. While industry recognises the value of geospatial skills, the absence of coordinated national policy or incentives has led to the closure of key undergraduate programmes. Notably, Newcastle University closed its geospatial UG programme in 2023. The University of East London remains the only UK institution offering a dedicated undergraduate surveying degree, supplemented by an industry-linked apprenticeship. To address the skills gap, several universities now offer postgraduate conversion courses in geospatial science, primarily within geography or environmental science departments.

Outreach has emerged as a vital strategy to raise awareness and inspire future talent. GeospatialUK.org, developed at Newcastle University with industry support, provides accessible resources on careers, study pathways, and classroom activities aligned with UK education curricula at high school level. Its exercises, ranging from mapping hazards, wildfires, and census data to GNSS-based calculations, bridge advanced research with school-level learning. It also offers insight into geospatial-related careers and links to possible job opportunities. The platform has gained international traction and continues to attract users.

This paper highlights the urgent need for national coordination in geospatial education and showcases GeospatialUK.org as a scalable model for outreach. Without intervention, the UK risks a shortage of skilled geospatial professionals, undermining its capacity to address pressing societal challenges.

10:30am - 12:00pm

WG II/3H: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

10:30am - 10:45am

Accurate Point Measurement in 3DGS - A New Alternative to Traditional Stereoscopic-View Based Measurements

Deyan Deng¹, Rongjun Qin^1,2

¹Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, USA; ²Department of Electrical and Computer Engineering, The Ohio State University, Columbus, USA

3D Gaussian Splatting (3DGS) has revolutionized real-time rendering with state-of-the-art novel view synthesis, but its applicability to accurate geometric measurement remains limited. Compared with multi-view stereo (MVS)-based point clouds or mesh models, 3DGS provides superior visual quality and completeness, while existing measurement approaches still rely on stereoscopic workstations or direct measurements on incomplete and inaccurate reconstructed geometry.

As a novel view synthesizer, 3DGS reproduces source views and smoothly interpolates intermediate viewpoints, enabling users to intuitively identify congruent points across multiple views. By triangulating these correspondences, accurate 3D point measurements can be obtained. Inspired by traditional stereoscopic measurement, the proposed approach removes the need for stereo workstations and biological stereoscopic capability, while naturally supporting multi-view measurements for improved accuracy.
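
The triangulation of user-identified correspondences described above can be illustrated with a standard linear (DLT) multi-view triangulation. This is a generic textbook sketch, not the code from the authors' tool; the camera matrices, the helper `project`, and the test point are assumptions for illustration.

```python
import numpy as np

def triangulate(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera matrices P = K [R | t]
    pixels:      list of (u, v) observations of the same point
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # Solution: right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with camera matrix P (hypothetical helper)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Three synthetic cameras with identity intrinsics and shifted centres
I3 = np.eye(3)
cams = [
    np.hstack([I3, np.array([[0.0], [0.0], [0.0]])]),
    np.hstack([I3, np.array([[-1.0], [0.0], [0.0]])]),
    np.hstack([I3, np.array([[0.0], [-1.0], [0.0]])]),
]
X_true = np.array([0.5, 0.2, 4.0])
X_est = triangulate(cams, [project(P, X_true) for P in cams])
```

Adding more views simply appends rows to the linear system, which is one reason multi-view measurement tends to be more accurate than a single stereo pair when the picks are noisy.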

We implement a web-based application to demonstrate this proof of concept using UAV-based aerial datasets. Experimental results show that the proposed method achieves measurement accuracy comparable to or better than traditional stereoscopic measurement approaches while operating entirely on non-stereo workstations. In particular, the proposed method consistently outperforms direct mesh-based measurements, achieving RMSEs of 1–2 cm on well-defined points. On challenging thin structures, the proposed method reduces RMSE from 0.062 m to 0.037 m, and successfully measures sharp corners where mesh-based methods fail entirely.

The source code and documentation are open-source and available at: https://github.com/GDAOSU/3dgs_measurement_tool.

10:45am - 11:00am

Gaussian Texturing: Surface-Anchored 3D Gaussian Splatting for Metric-Accurate Heritage Preservation

Ming Huang, Lei Wang

Beijing University of Civil Engineering and Architecture

Traditional 3D Gaussian Splatting (3DGS) methods initialize Gaussian primitives from Structure-from-Motion point clouds, resulting in loosely distributed representations that lack geometric constraints and metric accuracy. This limitation severely restricts their application in architectural heritage preservation, where millimeter-level precision and practical editability are essential requirements.

This paper introduces Gaussian Texturing, a novel framework that fundamentally transforms how Gaussians relate to geometry by directly binding 3D Gaussian primitives to precisely measured mesh surfaces—essentially "texturing" surfaces with Gaussians. Our approach comprises three key innovations: (1) a constrained optimization framework that maintains tight Gaussian-surface coupling throughout training, preventing geometric drift while preserving photorealistic rendering quality; (2) engineering-oriented editing tools enabling geometry-based material replacement, region editing, and mesh-driven deformation; and (3) seamless integration with professional heritage preservation workflows.
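One common way to bind a primitive to a mesh surface is to store its centre as barycentric weights on a triangle, so that the centre follows the triangle when the mesh deforms. The sketch below illustrates only that general idea; the abstract does not state how the binding is parameterised, so this is an assumption rather than the paper's method, and the names `barycentric_weights` and `anchored_position` are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(p: Sequence[float], q: Sequence[float]) -> Vec3:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p: Sequence[float], q: Sequence[float]) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def barycentric_weights(p: Vec3, a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Weights (u, v, w) for vertices a, b, c of the projection of p onto the triangle plane."""
    v0, v1, v2 = _sub(b, a), _sub(c, a), _sub(p, a)
    d00, d01, d11 = _dot(v0, v0), _dot(v0, v1), _dot(v1, v1)
    d20, d21 = _dot(v2, v0), _dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    if denom == 0.0:
        raise ValueError("degenerate triangle")
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return (1.0 - v - w, v, w)


def anchored_position(weights: Vec3, a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Recompute the bound centre from the (possibly deformed) triangle vertices."""
    u, v, w = weights
    return (u * a[0] + v * b[0] + w * c[0],
            u * a[1] + v * b[1] + w * c[1],
            u * a[2] + v * b[2] + w * c[2])


# Bind a centre to a triangle, then lift the triangle by 2 units along z.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
weights = barycentric_weights((0.25, 0.25, 0.0), *tri)
lifted = tuple((x, y, z + 2.0) for x, y, z in tri)
moved = anchored_position(weights, *lifted)
```

Because only the weights are stored, editing or deforming the mesh moves the bound centre without retraining, which is the kind of mesh-driven behaviour the abstract describes.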

Experimental validation on MipNeRF360 benchmarks and custom architectural datasets demonstrates that our method achieves millimeter-level geometric precision while maintaining competitive rendering metrics. Unlike traditional "bind-after-training" approaches, our direct surface binding paradigm eliminates intermediate reconstruction steps, ensuring accuracy from source data. Real-world applications in heritage documentation and architectural design confirm the method's practical value, successfully bridging the gap between photorealistic visualization and engineering-grade geometric accuracy for professional applications.

11:00am - 11:15am

Structured-Li-GS: Structured 3D Gaussians Splatting with LiDAR Incorporation and Spatial Constraints

Huaiyuan Weng, Huibin Li, Chul Min Yeum

University of Waterloo, Canada

In this study, we develop a Structured framework for Gaussian Splatting (3DGS) with LiDAR integration (Structured-Li-GS). It is a lightweight Gaussian Splatting pipeline that leverages LiDAR–inertial–visual SLAM. Structured-Li-GS achieves high-quality 3D reconstructions with fewer Gaussians by training on accurate, dense, colorized point clouds. Gaussian primitives are anchored using sub-sampled point clouds, and their ellipsoidal parameters are initialized from local surface geometry. Our training strategy integrates a comprehensive set of loss terms (photometric, flattening, offset, depth, and normal losses) guided by the dense point cloud, enabling accurate reconstruction without Gaussian densification. This approach produces up-to-scale, high-fidelity results with a moderate model size. For experimental validation, we develop a custom hardware-synchronized LiDAR–camera handheld scanner. Experiments on both benchmark datasets and our real-world in-house dataset demonstrate that Structured-Li-GS surpasses state-of-the-art methods while using fewer Gaussians.

11:15am - 11:30am

Evaluating 3DGS for True Orthophoto Generation: Comparative Study with Photogrammetric Processes

Jangwoo Cheon¹, Impyeong Lee²

¹Innopam, Korea, Republic of (South Korea); ²University of Seoul, Korea, Republic of (South Korea)

True Digital Orthophoto Maps (TDOMs) are essential for urban analysis and map updating, traditionally generated through photogrammetric workflows involving aerial triangulation, DSM construction, and orthorectification. Recently, 3D Gaussian Splatting (3DGS) has emerged as an alternative approach using differentiable volumetric rendering. While both methods depend on acquisition geometry, they follow fundamentally different reconstruction processes, potentially producing distinct representational characteristics. Systematic comparisons under controlled conditions remain limited.

This study generates photogrammetric and 3DGS-based TDOMs from four UAV datasets acquired over the same area with varying resolution (2.51–5.8 cm GSD), image count, and oblique view proportion (0–75%). All datasets were preprocessed through common SfM to obtain identical inputs. We evaluate differences through inter-method agreement (PSNR, SSIM, LPIPS), detail preservation (gradient magnitude, high-frequency energy), and spatial distribution patterns (boundary–interior separation).
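Two of the measures named above, PSNR and a gradient-magnitude ratio, can be sketched as follows. This is an illustration under stated assumptions, not the study's evaluation code: images are taken to be equally sized grayscale arrays given as nested lists, gradients use simple forward differences, and the ratio is assumed to be the mean gradient magnitude of the 3DGS product divided by that of the photogrammetric product. All function names are hypothetical.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]


def psnr(a: Image, b: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    n = 0
    sq_err = 0.0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            sq_err += (pa - pb) ** 2
            n += 1
    mse = sq_err / n
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_gradient_magnitude(img: Image) -> float:
    """Mean forward-difference gradient magnitude over pixels that have right and lower neighbours."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            total += math.hypot(gx, gy)
    return total / ((h - 1) * (w - 1))


def gradient_ratio(test: Image, reference: Image) -> float:
    """Values below 1 indicate that `test` is smoother than `reference`."""
    ref = mean_gradient_magnitude(reference)
    if ref == 0.0:
        raise ValueError("reference image has no gradient")
    return mean_gradient_magnitude(test) / ref


# Toy 2x2 patches: a sharp edge and a softened version of it.
sharp = [[0.0, 10.0], [0.0, 10.0]]
soft = [[0.0, 5.0], [0.0, 5.0]]
ratio = gradient_ratio(soft, sharp)
flat_psnr = psnr([[0.0, 0.0], [0.0, 0.0]], [[10.0, 10.0], [10.0, 10.0]])
```

A ratio below 1 in this toy case reflects the softened edge, the same kind of smoothing effect that a detail-preservation measure is meant to expose.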

Results show 3DGS systematically smooths fine-scale texture with gradient ratios of 0.58–0.89 and high-frequency energy reductions of 2.5–55× relative to photogrammetry. Oblique view proportion emerges as the dominant divergence factor: oblique-dominant datasets show lowest agreement (PSNR 15.15) despite larger image counts, while nadir-only datasets achieve higher similarity (PSNR 26.73). Difference maps reveal 2–3 times higher discrepancies along boundaries than interiors. Visually cleaner 3DGS boundaries are byproducts of overall smoothing rather than superior reconstruction. These findings establish that the two methods are complementary—photogrammetry preserving texture fidelity and 3DGS providing structural regularity—with acquisition geometry critically influencing performance characteristics.

11:30am - 11:45am

Supercharging Thermal Gaussian Splatting with depth estimation

Manoj Biswanath¹, Chenxin Cai², Hannah Schieber³, Daniel Roth³, Benjamin Busam¹

¹Photogrammetry and Remote Sensing, Munich Center for Machine Learning (MCML), Technical University of Munich, Munich, Germany; ²Technical University of Munich, Munich, Germany; ³Human-Centered Computing and Extended Reality Lab, TUM University Hospital, Clinic for Orthopedics and Sports Orthopedics, Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Munich, Germany

Efficient and robust 3D scene representation is crucial in fields such as robotics, autonomous driving, and augmented reality. While RGB images provide valuable content for 3D reconstruction, other modalities such as thermal or depth can provide additional information on the 3D environment. Lately, Novel View Synthesis (NVS) methods like Gaussian Splatting (GS) have started using multiple modalities to further boost their performance. But fusing or combining such multi-modal data can slow the process down and bring additional challenges. Therefore, our project aims to use a single modality, the thermal infrared domain, removing the reliance on visible light as far as possible. We propose a method, Thermal-to-Depth Gaussian (TDg), that uses only thermal images and depth estimation in its architecture to derive the radiance fields. Mainstream methods that rely heavily on RGB images perform poorly in visually degraded environments, such as low-light conditions, fog, smoke, or extreme weather. In contrast, infrared cameras detect objects’ inherent thermal radiation and provide robust perception regardless of lighting and weather conditions. Despite their promise, thermal images are inherently characterized by low contrast, sparse texture, and non-uniform brightness distribution. As a result, current approaches still rely heavily on paired RGB images for supervision or joint optimization, failing to establish a truly independent, purely thermal-based Gaussian representation system. The core innovation of our work is therefore a self-contained Thermal GS framework that uses only thermal image inputs. We design a thermal-guided depth estimation module, Thermal-to-Depth (TDg), providing explicit and reliable constraints for geometric optimization.