Conference Agenda
Overview and details of the sessions of this conference.
|
Daily Overview | |
|
Location: 715B 125 theatre |
| Date: Sunday, 05-July-2026 | |
| 8:30am - 12:00pm | TuT15: Getting Started with CNES Open-Source 3D Tools in Python Location: 715B |
| 12:00pm - 1:15pm | WG IV/8A: Digital Twins for Mobility and Navigation Location: 715B |
|
|
12:00pm - 12:15pm
Vision-Language Models for Urban Digital Twins Civil Engineering Department, Lassonde School of Engineering, York University, Canada Urban digital twins are virtual city replicas that can greatly support urban planning by simulating infrastructure and mobility scenarios. However, keeping a digital twin up to date with fine-grained, real-world urban conditions is challenging. This paper proposes a novel system that leverages multi-modal AI models to bridge the gap between physical urban data collection and a 3D city digital twin. In our approach, ordinary smartphones carried in vehicles act as mobile sensors, continuously capturing multi-modal data (road images, GPS coordinates, and speed). Advanced vision-language models then analyze the data to automatically extract information about the traffic infrastructure and detect road anomalies. The extracted information, such as the locations of traffic signs, traffic signals, road surface cracks, and potential blind spots at intersections, is geo-tagged and passed to vision-language models, which interpret the data and stream human-readable insights into the digital twin model. The case study is the digital twin of the city of Toronto. By aggregating data from many drivers and analyzing it (in post-processing for high accuracy), the digital twin evolves into a living model of the urban environment. This enriched and dynamic twin provides urban planners with up-to-date insights on traffic signage, road conditions, and other relevant road infrastructure elements, enabling proactive maintenance and informed decision-making for city planning. 12:15pm - 12:30pm
GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis 1Huawei Technologies; 2Center for Cognitive Interaction Technology (CITEC), Bielefeld University Methods that synthesize indoor 3D scenes from text prompts have wide-ranging applications in film production, interior design, video games, virtual reality, and synthetic data generation for training embodied agents. Existing approaches typically either train generative models from scratch or leverage vision-language models (VLMs). While VLMs achieve strong performance, particularly for complex or open-ended prompts, smaller task-specific models remain necessary for deployment on resource-constrained devices such as extended reality (XR) glasses or mobile phones. However, many generative approaches that train from scratch overlook the inherent graph structure of indoor scenes, which can limit scene coherence and realism. Conversely, methods that incorporate scene graphs either demand a user-provided semantic graph, which is generally inconvenient and restrictive, or rely on ground-truth relationship annotations, limiting their capacity to capture more varied object interactions. To address these challenges, we introduce GeoSceneGraph, a method that synthesizes 3D scenes from text prompts by leveraging the graph structure and geometric symmetries of 3D scenes, without relying on predefined relationship classes. Despite not using ground-truth relationships, GeoSceneGraph achieves performance comparable to methods that do. Our model is built on Equivariant Graph Neural Networks (EGNNs), but existing EGNN approaches are typically limited to low-dimensional conditioning and are not designed to handle complex modalities such as text. We propose a simple and effective strategy for conditioning EGNNs on text features, and we validate our design through ablation studies. 12:30pm - 12:45pm
A Multi-Dimensional Digital Twin Framework for the Low-Altitude Economy China University of Geosciences (Beijing), China, People's Republic of The Low-Altitude Economy (LAE), driven by the widespread deployment of UAVs and eVTOL aircraft, demands a high-fidelity Digital Twin that extends far beyond static geographic representation. This study presents a critical review of 39 peer-reviewed papers to propose a three-layer mapping framework — Geospatial Infrastructure Layer, Environmental Sensing Layer, and Interaction Layer — and evaluates the Technology Readiness Level (TRL) of each sub-domain. The Geospatial Infrastructure Layer encompasses terrain models, ground facilities, airspace structures, and semantic navigation landmarks. The Environmental Sensing Layer covers electromagnetic modeling, target sensing and countermeasure, and micro-meteorological mapping. The Interaction Layer addresses network trust, data security, swarm coordination, and platform reliability. Our TRL assessment reveals that Environmental Sensing is the most mature layer (mean TRL 4.4, 8 field-validated papers), while cross-layer integration remains the weakest link (mean TRL 3.5, zero field-validated demonstrations). We identify standardization of low-altitude spatial data products, AI-enabled predictive mapping, crowdsourced Digital Twin updating, and closed-loop cross-layer integration as the four priority research directions. 12:45pm - 1:00pm
Lightweight indoor Pedestrian Localization via multi-step State-extended Fusion of Wi-Fi weighted Fingerprinting and PDR South China University of Technology, China, People's Republic of Indoor pedestrian localization on smartphones requires a lightweight yet robust fusion framework capable of handling unstable wireless signals and drift-prone inertial motion. In this work, we propose an efficient positioning system that integrates weighted Wi-Fi fingerprinting with Pedestrian Dead Reckoning (PDR) through a Multi-Step Extended Kalman Filter (EKF). Unlike conventional single-step filtering, the EKF employed here is formulated from the perspective of factor-graph–based probabilistic estimation, where Wi-Fi observations and PDR increments naturally act as complementary measurement and motion factors. Building on this unified view, the proposed Multi-Step EKF retains several consecutive states within its estimation window, effectively approximating short-horizon smoothing while maintaining the computational footprint required for real-time execution on consumer smartphones. To enhance observation stability, an inverse-distance weighted fingerprinting module mitigates RSSI fluctuations and gracefully handles missing values. Meanwhile, PDR inputs are refined through diagnostic analysis and incorporated as motion constraints within the fusion process. Global optimization of noise parameters is performed via dual annealing, further improving the reliability of state updates. Experiments conducted on an open indoor dataset demonstrate that the proposed method achieves a reduction of approximately 31% in mean localization error compared with a standard single-step EKF baseline. The results confirm that enforcing short-term temporal consistency through a multi-step state representation significantly suppresses drift accumulation and enhances robustness under dynamic indoor environments. 
Overall, the proposed framework offers a theoretically grounded, computationally efficient, and practically deployable solution for smartphone-based indoor positioning. |
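The inverse-distance weighted fingerprinting step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the authors' formulation, so everything here is an assumption made for illustration: the function names `signal_distance` and `idw_position`, the `MISSING` marker, the choice of `k`, and the toy radio map are all hypothetical. The sketch shows a generic weighted k-nearest-neighbour estimate in which access points missing from either scan are skipped rather than filled with a default value.

```python
import math

MISSING = None  # hypothetical marker for an access point that was not heard


def signal_distance(observed, reference):
    """RMS difference in RSSI over access points present in both scans."""
    shared = [(o, r) for o, r in zip(observed, reference)
              if o is not MISSING and r is not MISSING]
    if not shared:
        return math.inf
    # Normalise by the number of shared access points so that scans with
    # different amounts of missing data stay comparable.
    return math.sqrt(sum((o - r) ** 2 for o, r in shared) / len(shared))


def idw_position(observed, fingerprints, k=3, eps=1e-6):
    """Estimate (x, y) as the inverse-distance weighted mean of the
    k fingerprints closest to the observed scan in signal space."""
    scored = sorted(
        ((signal_distance(observed, rssi), pos) for pos, rssi in fingerprints),
        key=lambda item: item[0],
    )
    nearest = [(d, pos) for d, pos in scored[:k] if math.isfinite(d)]
    if not nearest:
        raise ValueError("no fingerprint shares an access point with the scan")
    # eps avoids division by zero when a scan matches a fingerprint exactly.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Toy radio map: ((x, y) in metres, RSSI in dBm for three access points).
radio_map = [
    ((0.0, 0.0), [-40, -70, -80]),
    ((10.0, 0.0), [-70, -40, -80]),
    ((0.0, 10.0), [-70, -80, -40]),
]

# The third access point is missing from this scan and is simply ignored.
estimate = idw_position([-40, -70, MISSING], radio_map, k=2)
midpoint = idw_position([-55, -55, MISSING], radio_map[:2], k=2)
```

In a fusion pipeline of the kind the abstract describes, an estimate like this would presumably serve as the Wi-Fi measurement fed to the filter, while PDR step increments drive the motion model. That coupling is not shown here.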
| 1:30pm - 2:45pm | WG IV/8B: Digital Twins for Mobility and Navigation Location: 715B |
|
|
1:30pm - 1:45pm
Topological Analysis of OpenDRIVE Models for Advanced Autonomous Vehicle Simulations Budapest University of Technology and Economics, Department of Photogrammetry and Geoinformatics, Hungary The increasing demand for safe and efficient autonomous vehicle (AV) operations has intensified the need for realistic, high-fidelity digital road representations that enable robust virtual testing environments. Simulation-based validation has become a cornerstone of the AV development process, allowing for the reproducible assessment of perception, localization, and decision-making modules under controlled conditions. Within this context, the ASAM OpenDRIVE specification provides a standardized, XML-based description of static road networks, encapsulating geometric, semantic, and structural elements such as roads, lanes, junctions, and roadside objects. While previous research has primarily focused on the geometric accuracy and semantic richness of High Definition (HD) maps, comprehensive topological analyses—especially those addressing consistency, connectivity, and completeness of OpenDRIVE models—remain largely unexplored. This study aims to fill that gap by introducing a formal topological framework for evaluating OpenDRIVE-based road models through both synthetic and real-world test cases. 1:45pm - 2:00pm
A Comprehensive Toolkit for Semi-Automated HD Maps Production: Integrating AI-Driven Feature Extraction with 3D Interactive Validation and Editing National Cheng Kung University, Chinese Taipei This paper presents a comprehensive toolkit for semi-automated High-Definition Maps (HD Maps) production that integrates Artificial Intelligence (AI)-driven feature extraction with 3D human-in-the-loop validation. High-definition maps provide centimeter-level road geometry and traffic asset information, but large-scale production remains costly due to dense mobile mapping data and manual digitization. The proposed workflow consists of two self-developed components: a Semi-automated HD Maps Production Tool for batch extraction and a 3D HD Maps Validation and Editing Tool for structured review. The project-based pipeline ingests georeferenced mobile laser scanning point clouds, Inertial Navigation System / Global Navigation Satellite System (INS/GNSS) trajectories, and camera imagery, and applies configurable chains of ground filtering, road-marking extraction, voxel down-sampling, clustering, oriented bounding box analysis, and AI-based traffic asset detection. Candidate features with confidence indicators and basic attributes are stored in a project database and edited in a tightly coupled 3D environment that supports snapping, constrained adjustments, and semantic reclassification while logging all user edits. The toolkit is evaluated on a closed proving ground (CARLab, Shalun) and a freeway section of Taiwan National Highway No. 3. At CARLab, semi-automated extraction achieves F1-scores of 0.85–0.95 for key layers. For a one-kilometer highway section, operator time is reduced from 90–120 minutes in a purely manual Geographic Information System (GIS) workflow to about 45 minutes with the proposed approach, while maintaining comparable geometric accuracy. 
These results demonstrate a practical path towards scalable, traceable HD Maps production for autonomous driving applications. 2:00pm - 2:15pm
A Low-Altitude Data Space Framework Based on China's National 3D Mapping Program 1Moganshan Geospatial Information Laboratory, Huzhou, 313200, China; 2National Geomatics Center of China, Beijing 100830, China; 3Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 610031, China National Key Research and Development Program of China (2025YFB3910300) 2:15pm - 2:30pm
Geometrically accurate 3D Gaussian Reconstruction using high-density UAV LiDAR point clouds and open-vocabulary semantic optimization 1Aerospace Information Research Institute, Chinese Academy of Sciences, China, People's Republic of; 2University of Chinese Academy of Sciences, Beijing; 3International Research Center of Big Data for Sustainable Development Goals, China 3D scene reconstruction lies at the core of computer vision, photogrammetry, geospatial science, and spatial intelligence, aiming for accurate, photorealistic, and efficient digital twin representations of the real world. The emergence of 3D Gaussian Splatting (3DGS) enables real-time rendering and geometrically precise reconstruction, yet existing methods struggle in large-scale outdoor scenes with weak textures, low geometric accuracy, dynamic objects, and a lack of semantic information. Geometrically accurate 3DGS with enhanced semantic understanding therefore greatly facilitates the realization of digital twins for mobility and navigation. This work proposes a novel 3DGS framework that incorporates dense UAV LiDAR point clouds, multi-view images, and open-set semantics in a single optimization process. The key objective is to investigate how geometric constraints derived from dense UAV LiDAR point clouds and semantic supervision from SAM (Segment Anything Model) can jointly participate in the optimization of Gaussian primitives, thereby improving geometric accuracy, visual realism, and semantic consistency in large-scale UAV 3D reconstructions for creating digital twins of the environment. |

