If you have any problems, please contact the organizers at isprs2026@icsevents.com.

Conference Agenda

Overview and details of the sessions of this conference.

Daily Overview

Location: 715B
Capacity: 125 (theatre seating)

Date: Sunday, 05-July-2026

8:30am - 12:00pm

TuT15: Getting Started with CNES Open-Source 3D Tools in Python
Location: 715B

12:00pm - 1:15pm

WG IV/8A: Digital Twins for Mobility and Navigation
Location: 715B

12:00pm - 12:15pm

Vision-Language Models for Urban Digital Twins

Amirhossein Nourbakhshrezaei, Saeed Abbasi, Mojgan Jadidi

Civil Engineering Department, Lassonde School of Engineering, York University, Canada

Urban digital twins are virtual city replicas that can greatly support urban planning by simulating infrastructure and mobility scenarios. However, keeping a digital twin up to date with fine-grained, real-world urban conditions is challenging. This paper proposes a novel system that leverages multi-modal AI models to bridge the gap between physical urban data collection and a 3D city digital twin. In our approach, ordinary smartphones carried in vehicles act as mobile sensors, continuously capturing multi-modal data (road images, GPS coordinates, and speed). Vision-language models then analyze these data to automatically extract traffic infrastructure information and detect road anomalies. The extracted information, such as the locations of traffic signs, traffic signals, road surface cracks, and potential blind spots at intersections, is geo-tagged and passed to vision-language models, which interpret it and stream human-readable insights into the digital twin. The case study is the digital twin of the city of Toronto. By aggregating data from many drivers and analyzing it in post-processing for high accuracy, the digital twin evolves into a living model of the urban environment. This enriched, dynamic twin provides urban planners with up-to-date insights into traffic signage, road conditions, and other road infrastructure elements, enabling proactive maintenance and informed city-planning decisions.
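The abstract does not say how detections are geo-tagged. One common approach, shown here as a minimal sketch under that assumption, is to linearly interpolate the smartphone's GPS fixes at each image's capture timestamp. The function names and the `(timestamp_s, lat, lon)` tuple layout are hypothetical, and linear interpolation in latitude/longitude is only reasonable across the short gaps between consecutive fixes.

```python
from bisect import bisect_left


def interpolate_fix(track, t):
    """Linearly interpolate (lat, lon) at time t from a GPS track.

    track: list of (timestamp_s, lat, lon) tuples sorted by timestamp (hypothetical layout).
    Returns None when t lies outside the recorded time span.
    """
    times = [fix[0] for fix in track]
    if not track or t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)          # first fix with timestamp >= t
    if times[i] == t:
        return track[i][1], track[i][2]
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    w = (t - t0) / (t1 - t0)           # fraction of the way from fix i-1 to fix i
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)


def geotag(detections, track):
    """Attach an interpolated position to each (label, timestamp_s) detection."""
    tagged = []
    for label, t in detections:
        fix = interpolate_fix(track, t)
        if fix is not None:            # drop detections without GPS coverage
            tagged.append({"label": label, "t": t, "lat": fix[0], "lon": fix[1]})
    return tagged


# Example: a stop sign seen halfway between two fixes; the crack falls outside the track.
track = [(0.0, 43.6500, -79.3800), (2.0, 43.6502, -79.3804)]
print(geotag([("stop_sign", 1.0), ("crack", 5.0)], track))
```

Detections outside the GPS time span are dropped rather than extrapolated, which keeps bad positions out of the twin at the cost of losing a few frames.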

12:15pm - 12:30pm

GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis

Antonio Ruiz¹,², Tao Wu¹, Andrew Melnik², Qing Cheng¹, Xuqin Wang¹, Lu Liu¹, Yongliang Wang¹, Yanfeng Zhang¹, Helge Ritter²

¹Huawei Technologies; ²Center for Cognitive Interaction Technology (CITEC), Bielefeld University

Methods that synthesize indoor 3D scenes from text prompts have wide-ranging applications in film production, interior design, video games, virtual reality, and synthetic data generation for training embodied agents. Existing approaches typically either train generative models from scratch or leverage vision-language models (VLMs). While VLMs achieve strong performance, particularly for complex or open-ended prompts, smaller task-specific models remain necessary for deployment on resource-constrained devices such as extended reality (XR) glasses or mobile phones. However, many generative approaches that train from scratch overlook the inherent graph structure of indoor scenes, which can limit scene coherence and realism. Conversely, methods that incorporate scene graphs either demand a user-provided semantic graph, which is generally inconvenient and restrictive, or rely on ground-truth relationship annotations, limiting their capacity to capture more varied object interactions. To address these challenges, we introduce GeoSceneGraph, a method that synthesizes 3D scenes from text prompts by leveraging the graph structure and geometric symmetries of 3D scenes, without relying on predefined relationship classes. Despite not using ground-truth relationships, GeoSceneGraph achieves performance comparable to methods that do. Our model is built on Equivariant Graph Neural Networks (EGNNs), but existing EGNN approaches are typically limited to low-dimensional conditioning and are not designed to handle complex modalities such as text. We propose a simple and effective strategy for conditioning EGNNs on text features, and we validate our design through ablation studies.
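For readers unfamiliar with EGNNs, the sketch below shows one E(n)-equivariant layer following the original EGNN update rules, with a global text embedding `c` concatenated to the edge and node MLP inputs. This is a hypothetical illustration of one simple conditioning option, not the strategy proposed in the paper (which the abstract does not detail). The class name and dimensions are assumptions, and the random MLP weights stand in for trained ones. Because `c` and the squared distances are rotation-invariant scalars, node features stay invariant and coordinates stay equivariant.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(in_dim, out_dim, hidden=32):
    """Two-layer MLP with SiLU activation; random weights stand in for trained ones."""
    w1 = rng.normal(scale=0.1, size=(in_dim, hidden))
    w2 = rng.normal(scale=0.1, size=(hidden, out_dim))

    def f(z):
        a = z @ w1
        return (a / (1.0 + np.exp(-a))) @ w2   # SiLU(a) = a * sigmoid(a)
    return f


class EGNNLayer:
    """EGNN layer with a global conditioning vector c (hypothetical text conditioning)."""

    def __init__(self, h_dim, c_dim, m_dim=16):
        self.phi_e = mlp(2 * h_dim + 1 + c_dim, m_dim)   # edge messages
        self.phi_x = mlp(m_dim, 1)                        # coordinate weights
        self.phi_h = mlp(h_dim + m_dim + c_dim, h_dim)    # node update

    def __call__(self, h, x, c):
        n = h.shape[0]
        diff = x[:, None, :] - x[None, :, :]              # (n, n, 3): x_i - x_j
        dist2 = (diff ** 2).sum(-1, keepdims=True)        # invariant squared distances
        hi = np.broadcast_to(h[:, None, :], (n, n, h.shape[1]))
        hj = np.broadcast_to(h[None, :, :], (n, n, h.shape[1]))
        c_pair = np.broadcast_to(c, (n, n, c.shape[-1]))
        m = self.phi_e(np.concatenate([hi, hj, dist2, c_pair], -1))
        m = m * (1.0 - np.eye(n))[:, :, None]             # no self-messages
        x_new = x + (diff * self.phi_x(m)).sum(1) / (n - 1)
        c_node = np.broadcast_to(c, (n, c.shape[-1]))
        h_new = h + self.phi_h(np.concatenate([h, m.sum(1), c_node], -1))
        return h_new, x_new


# Example: 5 objects with 8-D features, 3-D positions and a 4-D text embedding.
layer = EGNNLayer(h_dim=8, c_dim=4)
data = np.random.default_rng(1)
h, x, c = data.normal(size=(5, 8)), data.normal(size=(5, 3)), data.normal(size=4)
h_out, x_out = layer(h, x, c)
```

A quick sanity check is that rotating and translating the input coordinates rotates and translates the output coordinates identically while leaving `h_out` unchanged, whereas changing `c` does change the output.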

12:30pm - 12:45pm

A Multi-Dimensional Digital Twin Framework for the Low-Altitude Economy

Yan Li, Chenming Ye, Wenxuan Shi, Wenqing Zhang, Yuyang Zhang, Teng Hu, Zhizhong Kang

China University of Geosciences (Beijing), China, People's Republic of

The Low-Altitude Economy (LAE), driven by the widespread deployment of UAVs and eVTOL aircraft, demands a high-fidelity Digital Twin that extends far beyond static geographic representation. This study draws on a critical review of 39 peer-reviewed papers to propose a three-layer mapping framework (Geospatial Infrastructure Layer, Environmental Sensing Layer, and Interaction Layer) and to evaluate the Technology Readiness Level (TRL) of each sub-domain. The Geospatial Infrastructure Layer encompasses terrain models, ground facilities, airspace structures, and semantic navigation landmarks. The Environmental Sensing Layer covers electromagnetic modeling, target sensing and countermeasures, and micro-meteorological mapping. The Interaction Layer addresses network trust, data security, swarm coordination, and platform reliability. Our TRL assessment reveals that Environmental Sensing is the most mature layer (mean TRL 4.4, 8 field-validated papers), while cross-layer integration remains the weakest link (mean TRL 3.5, no field-validated demonstrations). We identify four priority research directions: standardization of low-altitude spatial data products, AI-enabled predictive mapping, crowdsourced Digital Twin updating, and closed-loop cross-layer integration.

12:45pm - 1:00pm

Lightweight Indoor Pedestrian Localization via Multi-Step State-Extended Fusion of Wi-Fi Weighted Fingerprinting and PDR

Renjie Yuan, Zhiyong Wang, Kunlin Yu, Lang Hu, Yonghan Liao, Yilang Lin, Junjie Lu

South China University of Technology, China, People's Republic of

Indoor pedestrian localization on smartphones requires a lightweight yet robust fusion framework capable of handling unstable wireless signals and drift-prone inertial motion. In this work, we propose an efficient positioning system that integrates weighted Wi-Fi fingerprinting with Pedestrian Dead Reckoning (PDR) through a Multi-Step Extended Kalman Filter (EKF). Unlike conventional single-step filtering, the EKF employed here is formulated from the perspective of factor-graph–based probabilistic estimation, where Wi-Fi observations and PDR increments naturally act as complementary measurement and motion factors. Building on this unified view, the proposed Multi-Step EKF retains several consecutive states within its estimation window, effectively approximating short-horizon smoothing while maintaining the computational footprint required for real-time execution on consumer smartphones.
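The idea of retaining several consecutive states in the estimation window can be illustrated with a linear fixed-lag smoother. The state stacks the last `window` 2-D positions, each PDR step shifts the window, and each Wi-Fi fix updates the newest position, with the stored cross-covariances propagating the correction back to earlier positions. Because this toy motion model is linear, the EKF reduces to an ordinary Kalman filter. The class name, the noise values `q` and `r`, and the choice to measure only the newest state are assumptions for illustration; the abstract does not give the paper's factor formulation or noise model.

```python
import numpy as np


class SlidingWindowKF:
    """Hypothetical fixed-lag smoother holding the last `window` 2-D positions."""

    def __init__(self, p0, window=4, q=0.05, r=4.0, p_var=1.0):
        self.K = window
        self.x = np.tile(np.asarray(p0, float), window)   # [p_k, p_{k-1}, ...]
        self.P = np.eye(2 * window) * p_var
        self.Q = np.eye(2) * q    # assumed PDR step noise (m^2)
        self.R = np.eye(2) * r    # assumed Wi-Fi fix noise (m^2)

    def predict(self, step):
        """Shift the window back one slot and add a PDR increment to the newest position."""
        n = 2 * self.K
        F = np.zeros((n, n))
        F[0:2, 0:2] = np.eye(2)        # new p_{k+1} starts from p_k
        F[2:, :-2] = np.eye(n - 2)     # older positions move one slot back
        self.x = F @ self.x
        self.x[0:2] += np.asarray(step, float)
        self.P = F @ self.P @ F.T
        self.P[0:2, 0:2] += self.Q

    def update(self, z):
        """Fuse a Wi-Fi position fix that observes only the newest position."""
        n = 2 * self.K
        H = np.zeros((2, n))
        H[:, 0:2] = np.eye(2)
        y = np.asarray(z, float) - H @ self.x             # innovation
        S = H @ self.P @ H.T + self.R
        gain = self.P @ H.T @ np.linalg.inv(S)
        self.x = self.x + gain @ y                        # also corrects lagged states
        self.P = (np.eye(n) - gain @ H) @ self.P

    @property
    def position(self):
        return self.x[0:2].copy()


# Example: one 1 m PDR step east, then a Wi-Fi fix 1 m further east.
kf = SlidingWindowKF([0.0, 0.0], window=3)
kf.predict([1.0, 0.0])
kf.update([2.0, 0.0])
print(kf.position)
```

Keeping the window short bounds the state size at 2 × `window`, which is what keeps this kind of smoothing cheap enough for a phone.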

To enhance observation stability, an inverse-distance weighted fingerprinting module mitigates RSSI fluctuations and gracefully handles missing values. Meanwhile, PDR inputs are refined through diagnostic analysis and incorporated as motion constraints within the fusion process. Global optimization of noise parameters is performed via dual annealing, further improving the reliability of state updates.
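A minimal sketch of inverse-distance weighted fingerprint matching (weighted k-nearest neighbors in RSSI space). The -100 dBm floor used to fill missing access points, the default `k=3`, the Euclidean RSSI distance and the function name are all assumptions; the paper's exact weighting and missing-value handling may differ.

```python
import numpy as np

MISSING_DBM = -100.0  # assumed fill value for access points absent from a scan


def wknn_locate(scan, fp_rssi, fp_xy, k=3, eps=1e-6):
    """Estimate (x, y) as the inverse-distance weighted mean of the k reference
    points whose RSSI fingerprints are closest to the scan. NaN marks a missing AP."""
    scan = np.where(np.isnan(scan), MISSING_DBM, scan)
    fp = np.where(np.isnan(fp_rssi), MISSING_DBM, fp_rssi)
    d = np.linalg.norm(fp - scan, axis=1)     # RSSI-space distance to each reference point
    idx = np.argsort(d)[:k]                    # k nearest reference points
    w = 1.0 / (d[idx] + eps)                   # inverse-distance weights (eps avoids 1/0)
    return (w[:, None] * fp_xy[idx]).sum(0) / w.sum()


# Example: two reference points equally far from the scan in RSSI space.
fp_rssi = np.array([[-50.0, -50.0], [-70.0, -70.0]])
fp_xy = np.array([[0.0, 0.0], [10.0, 0.0]])
print(wknn_locate(np.array([-60.0, -60.0]), fp_rssi, fp_xy, k=2))
```

With two equally distant reference points the estimate falls at their midpoint; a scan that exactly matches one fingerprint collapses onto that reference point.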

Experiments conducted on an open indoor dataset demonstrate that the proposed method achieves a reduction of approximately 31% in mean localization error compared with a standard single-step EKF baseline. The results confirm that enforcing short-term temporal consistency through a multi-step state representation significantly suppresses drift accumulation and enhances robustness under dynamic indoor environments. Overall, the proposed framework offers a theoretically grounded, computationally efficient, and practically deployable solution for smartphone-based indoor positioning.

1:30pm - 2:45pm

WG IV/8B: Digital Twins for Mobility and Navigation
Location: 715B

1:30pm - 1:45pm

Topological Analysis of OpenDRIVE Models for Advanced Autonomous Vehicle Simulations

Janos Mate Logo, Viktor Gyozo Horvath, Vivien Poto, Arpad Barsi

Budapest University of Technology and Economics, Department of Photogrammetry and Geoinformatics, Hungary

The increasing demand for safe and efficient autonomous vehicle (AV) operations has intensified the need for realistic, high-fidelity digital road representations that enable robust virtual testing environments. Simulation-based validation has become a cornerstone of the AV development process, allowing for the reproducible assessment of perception, localization, and decision-making modules under controlled conditions. Within this context, the ASAM OpenDRIVE specification provides a standardized, XML-based description of static road networks, encapsulating geometric, semantic, and structural elements such as roads, lanes, junctions, and roadside objects. While previous research has primarily focused on the geometric accuracy and semantic richness of High Definition (HD) maps, comprehensive topological analyses—especially those addressing consistency, connectivity, and completeness of OpenDRIVE models—remain largely unexplored. This study aims to fill that gap by introducing a formal topological framework for evaluating OpenDRIVE-based road models through both synthetic and real-world test cases.
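As a concrete example of the kind of topological checks involved, the sketch below reads road and junction links from an OpenDRIVE document with Python's standard `xml.etree.ElementTree`. It flags links to non-existent roads or junctions and road-to-road links between ordinary roads (`junction="-1"`) that are not mutual, then reports connected components of the road graph. It is a hypothetical illustration, not the framework proposed in the paper. The sample network is invented and omits required elements such as `<header>`, `<planView>` and `<lanes>`.

```python
import xml.etree.ElementTree as ET


def check_topology(xodr_text):
    """Return (issues, components) for the road/junction links of an OpenDRIVE string."""
    root = ET.fromstring(xodr_text)
    roads = {r.get("id"): r for r in root.findall("road")}
    junctions = {j.get("id") for j in root.findall("junction")}
    issues, road_links = [], set()
    adj = {rid: set() for rid in roads}

    def connect(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for rid, road in roads.items():
        link = road.find("link")
        if link is None:
            continue
        for kind in ("predecessor", "successor"):
            el = link.find(kind)
            if el is None:
                continue
            etype, eid = el.get("elementType"), el.get("elementId")
            if etype == "road":
                if eid in roads:
                    road_links.add((rid, eid))
                    connect(rid, eid)
                else:
                    issues.append(f"road {rid}: {kind} road {eid} not found")
            elif etype == "junction" and eid not in junctions:
                issues.append(f"road {rid}: {kind} junction {eid} not found")

    for j in root.findall("junction"):
        for conn in j.findall("connection"):
            a, b = conn.get("incomingRoad"), conn.get("connectingRoad")
            if a in roads and b in roads:
                connect(a, b)
            else:
                issues.append(f"junction {j.get('id')}: connection {conn.get('id')} references a missing road")

    # Links between ordinary (non-junction) roads are expected to be mutual.
    for a, b in sorted(road_links):
        plain = roads[a].get("junction") == "-1" and roads[b].get("junction") == "-1"
        if plain and (b, a) not in road_links:
            issues.append(f"road {a} -> road {b} is not reciprocated")

    # Connected components of the undirected road graph (iterative DFS).
    seen, components = set(), []
    for start in roads:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        components.append(comp)
    return issues, components


SAMPLE = """<OpenDRIVE>
  <road id="1" junction="-1"><link><successor elementType="junction" elementId="10"/></link></road>
  <road id="2" junction="-1"><link><predecessor elementType="junction" elementId="10"/></link></road>
  <road id="3" junction="10"><link>
    <predecessor elementType="road" elementId="1" contactPoint="end"/>
    <successor elementType="road" elementId="2" contactPoint="start"/>
  </link></road>
  <road id="4" junction="-1"><link><successor elementType="road" elementId="9" contactPoint="start"/></link></road>
  <junction id="10"><connection id="0" incomingRoad="1" connectingRoad="3" contactPoint="start"/></junction>
</OpenDRIVE>"""

print(check_topology(SAMPLE))
```

In the sample, road 4 points at a missing road 9 and forms its own disconnected component, while roads 1, 2 and 3 are connected through junction 10.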

1:45pm - 2:00pm

A Comprehensive Toolkit for Semi-Automated HD Maps Production: Integrating AI-Driven Feature Extraction with 3D Interactive Validation and Editing

Yi-Feng Chang, Kai-Wei Chiang, Meng-Lun Tsai, Pei-Ling Lee, Syun Tsai, Chi-Hsin Huang, Ting-Chun Wu, Yi-Han Jen, Jiun-Yo Lin, Chang-Le Lee, Ching-Hsiang Lin, Han-Che Huang

National Cheng Kung University, Chinese Taipei

This paper presents a comprehensive toolkit for semi-automated High-Definition Maps (HD Maps) production that integrates Artificial Intelligence (AI)-driven feature extraction with 3D human-in-the-loop validation. High-definition maps provide centimeter-level road geometry and traffic asset information, but large-scale production remains costly due to dense mobile mapping data and manual digitization. The proposed workflow consists of two self-developed components: a Semi-automated HD Maps Production Tool for batch extraction and a 3D HD Maps Validation and Editing Tool for structured review. The project-based pipeline ingests georeferenced mobile laser scanning point clouds, Inertial Navigation System / Global Navigation Satellite System (INS/GNSS) trajectories, and camera imagery, and applies configurable chains of ground filtering, road-marking extraction, voxel down-sampling, clustering, oriented bounding box analysis, and AI-based traffic asset detection. Candidate features with confidence indicators and basic attributes are stored in a project database and edited in a tightly coupled 3D environment that supports snapping, constrained adjustments, and semantic reclassification while logging all user edits. The toolkit is evaluated on a closed proving ground (CARLab, Shalun) and a freeway section of Taiwan National Highway No. 3. At CARLab, semi-automated extraction achieves F1-scores of 0.85–0.95 for key layers. For a one-kilometer highway section, operator time is reduced from 90–120 minutes in a purely manual Geographic Information System (GIS) workflow to about 45 minutes with the proposed approach, while maintaining comparable geometric accuracy. These results demonstrate a practical path towards scalable, traceable HD Maps production for autonomous driving applications.
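Voxel down-sampling, one of the listed pipeline stages, is commonly implemented as a voxel-grid centroid filter. The following NumPy sketch assumes that variant. The function name and the 0.1 m default voxel size are hypothetical, since the abstract does not describe the toolkit's own implementation or parameters.

```python
import numpy as np


def voxel_downsample(points, voxel=0.1):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    pts = np.asarray(points, float)
    keys = np.floor(pts / voxel).astype(np.int64)       # integer voxel index per point
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)                       # guard against version-dependent shapes
    sums = np.zeros((counts.size, pts.shape[1]))
    np.add.at(sums, inverse, pts)                       # accumulate coordinates per voxel
    return sums / counts[:, None]


# Example: two nearby points merge into one centroid; the distant point is kept.
pts = np.array([[0.01, 0.01, 0.01], [0.03, 0.05, 0.02], [0.5, 0.5, 0.5]])
print(voxel_downsample(pts, voxel=0.1))
```

Using centroids rather than keeping one original point per voxel slightly smooths the cloud, which tends to help later plane fitting and clustering at the cost of not preserving original samples.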

2:00pm - 2:15pm

A Low-Altitude Data Space Framework Based on China's National 3D Mapping Program

Yin Gao¹,², Jun Chen¹,², Dehu Yang¹,³, Chaoquan Zhang¹, Fengyu Han¹

¹Moganshan Geospatial Information Laboratory, Huzhou, 313200, China; ²National Geomatics Center of China, Beijing 100830, China; ³Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 610031, China

Funding: National Key Research and Development Program of China (2025YFB3910300).

2:15pm - 2:30pm

Geometrically accurate 3D Gaussian Reconstruction using high-density UAV LiDAR point clouds and open-vocabulary semantic optimization

Banghui Yang¹,², Kai Qin¹,², Yucheng Li¹,², Jing Li¹,³

¹Aerospace Information Research Institute, Chinese Academy of Sciences, China, People's Republic of; ²University of Chinese Academy of Sciences, Beijing; ³International Research Center of Big Data for Sustainable Development Goals, China

3D scene reconstruction lies at the core of computer vision, photogrammetry, geospatial science, and spatial intelligence, aiming at accurate, photorealistic, and efficient digital twin representations of the real world. 3D Gaussian Splatting (3DGS) has enabled real-time rendering and geometrically precise reconstruction, yet existing methods struggle in large-scale outdoor scenes, where they suffer from weak textures, low geometric accuracy, dynamic objects, and a lack of semantic information. Geometrically accurate 3DGS with enhanced semantic understanding would therefore greatly facilitate the realization of digital twins for mobility and navigation. This work proposes a novel 3DGS framework that incorporates dense UAV LiDAR point clouds, multi-view images, and open-set semantics in a single joint optimization process. The key objective is to investigate how geometric constraints derived from dense UAV LiDAR point clouds and cognitive supervision from SAM (Segment Anything Model) semantics can jointly participate in the optimization of Gaussian primitives, thereby improving geometric accuracy, visual realism, and semantic consistency in large-scale UAV 3D reconstructions for environmental digital twins.
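One plausible form for a LiDAR-derived geometric constraint is a loss that penalizes Gaussian centers lying far from their nearest LiDAR point, added to the photometric loss with a weight. The sketch below assumes that form. The function names, the L1 photometric term and `lam_geo` are hypothetical, and the semantic (SAM) term is omitted. A real 3DGS pipeline would evaluate such terms in an autodiff framework with a spatial index rather than by brute force.

```python
import numpy as np


def lidar_geometry_loss(centers, lidar, chunk=4096):
    """Mean Euclidean distance from each Gaussian center to its nearest LiDAR point
    (brute force, processed in chunks of centers to bound memory)."""
    dmin = []
    for i in range(0, len(centers), chunk):
        c = centers[i:i + chunk]
        d2 = ((c[:, None, :] - lidar[None, :, :]) ** 2).sum(-1)   # (chunk, M) squared distances
        dmin.append(np.sqrt(d2.min(1)))
    return float(np.concatenate(dmin).mean())


def total_loss(rendered, target, centers, lidar, lam_geo=0.1):
    """Hypothetical objective: L1 photometric error plus weighted LiDAR geometry term."""
    photometric = float(np.abs(rendered - target).mean())
    return photometric + lam_geo * lidar_geometry_loss(centers, lidar)


# Example: Gaussian centers hovering 0.5 m above two LiDAR points, perfect rendering.
lidar = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
centers = np.array([[0.0, 0.0, 0.5], [1.0, 0.0, 0.5]])
img = np.ones((4, 4, 3))
print(total_loss(img, img, centers, lidar, lam_geo=0.2))
```

The weight `lam_geo` trades visual fidelity against adherence to the LiDAR surface; too large a value can pull Gaussians onto noisy LiDAR returns.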