Conference Agenda
Overview and details of the sessions of this conference.
WG IV/8A: Digital Twins for Mobility and Navigation
Session Topics: Digital Twins for Mobility and Navigation (WG IV/8)
External Resource: http://www.commission4.isprs.org/wg8
Presentations
12:00pm - 12:15pm
Vision-Language Models for Urban Digital Twins
Civil Engineering Department, Lassonde School of Engineering, York University, Canada

Urban digital twins are virtual city replicas that can greatly support urban planning by simulating infrastructure and mobility scenarios. However, keeping a digital twin up to date with fine-grained, real-world urban conditions is challenging. This paper proposes a novel system that leverages multi-modal AI models to bridge the gap between physical urban data collection and a 3D city digital twin. In our approach, ordinary smartphones carried in vehicles act as mobile sensors, continuously capturing multi-modal data (road images, GPS coordinates, and speed). Advanced vision-language models then analyze the data to automatically extract information about the traffic infrastructure and detect road anomalies. The extracted information, such as the locations of traffic signs, traffic signals, road surface cracks, and potential blind spots at intersections, is geo-tagged and passed to vision-language models, which interpret it and stream human-readable insights into the digital twin model. The case study is the digital twin of the city of Toronto. By aggregating data from many drivers and analyzing it (in post-processing for high accuracy), the digital twin evolves into a living model of the urban environment. This enriched and dynamic twin provides urban planners with up-to-date insights on traffic signage, road conditions, and other relevant road infrastructure elements, enabling proactive maintenance and informed decision-making for city planning.

12:15pm - 12:30pm
GeoSceneGraph: Geometric Scene Graph Diffusion Model for Text-guided 3D Indoor Scene Synthesis
Huawei Technologies; Center for Cognitive Interaction Technology (CITEC), Bielefeld University

Methods that synthesize indoor 3D scenes from text prompts have wide-ranging applications in film production, interior design, video games, virtual reality, and synthetic data generation for training embodied agents. Existing approaches typically either train generative models from scratch or leverage vision-language models (VLMs). While VLMs achieve strong performance, particularly for complex or open-ended prompts, smaller task-specific models remain necessary for deployment on resource-constrained devices such as extended reality (XR) glasses or mobile phones. However, many generative approaches that train from scratch overlook the inherent graph structure of indoor scenes, which can limit scene coherence and realism. Conversely, methods that incorporate scene graphs either demand a user-provided semantic graph, which is generally inconvenient and restrictive, or rely on ground-truth relationship annotations, limiting their capacity to capture more varied object interactions. To address these challenges, we introduce GeoSceneGraph, a method that synthesizes 3D scenes from text prompts by leveraging the graph structure and geometric symmetries of 3D scenes, without relying on predefined relationship classes. Despite not using ground-truth relationships, GeoSceneGraph achieves performance comparable to methods that do. Our model is built on Equivariant Graph Neural Networks (EGNNs), but existing EGNN approaches are typically limited to low-dimensional conditioning and are not designed to handle complex modalities such as text. We propose a simple and effective strategy for conditioning EGNNs on text features, and we validate our design through ablation studies.

12:30pm - 12:45pm
A Multi-Dimensional Digital Twin Framework for the Low-Altitude Economy
China University of Geosciences (Beijing), People's Republic of China

The Low-Altitude Economy (LAE), driven by the widespread deployment of UAVs and eVTOL aircraft, demands a high-fidelity Digital Twin that extends far beyond static geographic representation. This study presents a critical review of 39 peer-reviewed papers, proposes a three-layer mapping framework (Geospatial Infrastructure Layer, Environmental Sensing Layer, and Interaction Layer), and evaluates the Technology Readiness Level (TRL) of each sub-domain. The Geospatial Infrastructure Layer encompasses terrain models, ground facilities, airspace structures, and semantic navigation landmarks. The Environmental Sensing Layer covers electromagnetic modeling, target sensing and countermeasures, and micro-meteorological mapping. The Interaction Layer addresses network trust, data security, swarm coordination, and platform reliability. Our TRL assessment reveals that Environmental Sensing is the most mature layer (mean TRL 4.4, 8 field-validated papers), while cross-layer integration remains the weakest link (mean TRL 3.5, zero field-validated demonstrations). We identify standardization of low-altitude spatial data products, AI-enabled predictive mapping, crowdsourced Digital Twin updating, and closed-loop cross-layer integration as the four priority research directions.

12:45pm - 1:00pm
Lightweight indoor Pedestrian Localization via multi-step State-extended Fusion of Wi-Fi weighted Fingerprinting and PDR
South China University of Technology, People's Republic of China

Indoor pedestrian localization on smartphones requires a lightweight yet robust fusion framework capable of handling unstable wireless signals and drift-prone inertial motion. In this work, we propose an efficient positioning system that integrates weighted Wi-Fi fingerprinting with Pedestrian Dead Reckoning (PDR) through a Multi-Step Extended Kalman Filter (EKF). Unlike conventional single-step filtering, the EKF employed here is formulated from the perspective of factor-graph-based probabilistic estimation, where Wi-Fi observations and PDR increments naturally act as complementary measurement and motion factors. Building on this unified view, the proposed Multi-Step EKF retains several consecutive states within its estimation window, effectively approximating short-horizon smoothing while maintaining the computational footprint required for real-time execution on consumer smartphones. To enhance observation stability, an inverse-distance weighted fingerprinting module mitigates RSSI fluctuations and gracefully handles missing values. Meanwhile, PDR inputs are refined through diagnostic analysis and incorporated as motion constraints within the fusion process. Global optimization of noise parameters is performed via dual annealing, further improving the reliability of state updates. Experiments conducted on an open indoor dataset demonstrate that the proposed method reduces mean localization error by approximately 31% compared with a standard single-step EKF baseline. The results confirm that enforcing short-term temporal consistency through a multi-step state representation significantly suppresses drift accumulation and enhances robustness in dynamic indoor environments.
Overall, the proposed framework offers a theoretically grounded, computationally efficient, and practically deployable solution for smartphone-based indoor positioning.
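
As a rough illustration of the inverse-distance weighted fingerprinting idea mentioned in the abstract above, the following sketch estimates a position from the k nearest reference fingerprints in RSSI space. It is not the authors' implementation: the function names, the radio-map layout, the choice of k, and the strategy of substituting a floor value (`floor_dbm`) for access points that were not heard are all assumptions made for this example.

```python
import math


def signal_distance(scan, fingerprint, floor_dbm=-100.0):
    """Euclidean distance between two RSSI dictionaries {ap_id: rssi_dbm}.

    Assumption: an access point missing from either side is treated as if
    it had been received at floor_dbm (a very weak signal).
    """
    total = 0.0
    for ap in set(scan) | set(fingerprint):
        a = scan.get(ap, floor_dbm)
        b = fingerprint.get(ap, floor_dbm)
        total += (a - b) ** 2
    return math.sqrt(total)


def locate(scan, radio_map, k=3, eps=1e-6):
    """Inverse-distance weighted k-nearest-neighbour position estimate.

    radio_map is a list of ((x, y), {ap_id: rssi_dbm}) reference points.
    eps avoids division by zero when the scan matches a fingerprint exactly.
    """
    ranked = sorted(
        ((signal_distance(scan, fp), pos) for pos, fp in radio_map),
        key=lambda item: item[0],
    )[:k]
    weights = [1.0 / (dist + eps) for dist, _ in ranked]
    weight_sum = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, ranked)) / weight_sum
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, ranked)) / weight_sum
    return x, y
```

In a fusion pipeline of the kind the abstract describes, an estimate like the one returned by `locate` would serve as the Wi-Fi measurement, with PDR step increments supplying the motion model; that filtering step is not shown here.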

