Conference Agenda


Session

ThS3: Spatial Intelligence in the Wild

Time:

Friday, 10-July-2026:

1:30pm - 3:00pm

Location: 714B

175 theatre

Session Topics:

Spatial Intelligence in the Wild (ThS3)

Presentations

1:30pm - 1:45pm

Proactive cognitive map for embodied spatial reasoning

Wenbin Wang, Zhifeng Gu, Guangchi Fang, Bing Wang

The Hong Kong Polytechnic University

This work addresses the emerging challenge of achieving proactive spatial cognition for embodied and spatial AI systems operating in dynamic real-world environments. Conventional mapping and reasoning approaches are largely passive and task-dependent, limiting their ability to build persistent understanding beyond immediate goals. We introduce the Proactive Cognitive Map (PCM), a unified framework that enables agents to autonomously construct, verify, and refine their spatial knowledge through continual perception, self-questioning, and mental simulation. PCM integrates a grid-based perceptual map with a semantic, object-centric memory, forming an explicit and interpretable representation of the environment. A self-questioning module identifies uncertain or ambiguous regions and generates targeted queries, while a simulation module emulates human imagination to perform counterfactual reasoning and lightweight geometric self-verification across time and viewpoints.

We evaluate PCM on episodic-memory embodied QA tasks and on the long-horizon, multi-task benchmark GOAT-Bench, covering episodic reasoning, continual understanding, and cross-task generalization. Results show that PCM’s self-driven graph construction and proactive refinement outperform goal-specific exploration methods.

By transforming mapping from static perception into a continual cognitive process of questioning, imagining, and verifying, this study provides a step toward lifelong, interpretable, and self-improving spatial intelligence.
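As a rough illustration of how a self-questioning step might pick out uncertain regions of a grid map, the sketch below ranks cells by the binary entropy of their occupancy probability. This is an assumption made for illustration only: the abstract does not say how PCM scores uncertainty, and the names `cell_entropy` and `most_uncertain_cells` are hypothetical.

```python
import math


def cell_entropy(p):
    """Binary entropy (in bits) of an occupancy probability p in [0, 1]."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully known cell carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_uncertain_cells(grid, k=3):
    """Return (row, col) indices of the k highest-entropy cells (hypothetical helper)."""
    scored = [
        (cell_entropy(p), (r, c))
        for r, row in enumerate(grid)
        for c, p in enumerate(row)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [cell for _, cell in scored[:k]]


# Toy occupancy grid: values near 0.5 are the most ambiguous.
grid = [
    [0.05, 0.50, 0.95],
    [0.10, 0.45, 0.90],
]
queries = most_uncertain_cells(grid, k=2)
print(queries)
```

In a system of this kind, the selected cells would presumably seed targeted questions or new viewpoints; that link is not shown here.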

1:45pm - 2:00pm

Automatic Update and 3D Gaussian Reconstruction of Building Facade using Multi-Sensor Unmanned Aerial and Ground Vehicles: An Air-Ground Fusion Approach

Jing Li¹,², Shicheng Xu¹, Banghui Yang¹,³, Shuhao Sun⁴

¹Aerospace Information Research Institute, Chinese Academy of Sciences, Macau S.A.R. (China); ²International Research Center of Big Data for Sustainable Development Goals, China; ³University of Chinese Academy of Sciences, Beijing 101408, China; ⁴Tianjin Chengjian University, Tianjin, China

As a spatial digital foundation for digital twins and smart cities, realistic 3D models must be both timely and accurate. Intelligent, automated data acquisition and update workflows form the core infrastructure that sustains this digital foundation. Current modeling techniques that rely on a single data source face inherent limitations: unmanned aerial vehicle (UAV)-based oblique photogrammetry struggles to capture lower facade details, often leading to geometric distortions and blurred textures, while conventional terrestrial surveying methods suffer from low efficiency and limited automation and intelligence. Moreover, the substantial viewpoint differences between aerial and ground data hinder effective fusion. However, recent advances in 3D Gaussian Splatting (3DGS), large vision models, multi-sensor SLAM (simultaneous localization and mapping), and robotic systems open up new opportunities to significantly improve the fidelity, efficiency, completeness, and automation of 3D reconstruction through the cooperation of unmanned ground vehicles (UGVs) and UAVs.

To address these challenges in 3D reconstruction, this study proposes a novel framework that integrates autonomous unmanned systems, state-of-the-art large vision models, multi-sensor SLAM, and 3D Gaussian rendering technology. The framework realizes an integrated workflow for automatically updating building facades and for high-fidelity 3DGS rendering, using air-ground fusion algorithms with autonomous systems. The primary focus is to advance the automation and intelligence of building 3D reconstruction, thereby enabling efficient updates of urban 3D models.

2:00pm - 2:15pm

Monocular 3D Reconstruction for Martian Terrain Based on Diffusion Model

Jiarui Cao¹, Rong Huang¹,², Yusheng Xu¹,², Zhen Ye¹,², Xiaohua Tong¹,²

¹College of Surveying and Geoinformatics, Tongji University, Shanghai, China; ²The Shanghai Key Laboratory of Space Mapping and Remote Sensing for Planetary Exploration, Shanghai, China

High-precision digital terrain models (DTMs) are important for Mars exploration and research. However, traditional terrain reconstruction methods suffer from limitations in coverage and resolution. To improve the recovery of fine-grained topography, we present a diffusion-based monocular terrain reconstruction method, which progressively recovers Martian terrain from single-view high-resolution optical images. We employ a multi-scale U-Net denoising network with attention mechanisms and introduce an additional end-to-end depth constraint. To improve terrain reconstruction efficiency, we implement the diffusion model in the latent space and adopt a skipping sampling mechanism. We apply the proposed method to reconstruct terrain in different regions. Experimental results demonstrate that the reconstructed terrain achieves an accuracy of 2 m. Furthermore, compared to photogrammetric terrain, the shaded relief generated by our method exhibits greater similarity to the input imagery.
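One common way to realize skip sampling in diffusion models is to denoise over an evenly spaced subsequence of the training timesteps instead of every step. The sketch below shows only that schedule selection. It assumes an evenly strided schedule; the abstract does not specify the authors' schedule, and `strided_timesteps` is a hypothetical name.

```python
def strided_timesteps(num_train_steps, num_sample_steps):
    """Pick an evenly spaced, descending subset of timesteps for sampling."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("num_sample_steps must be in [1, num_train_steps]")
    stride = num_train_steps / num_sample_steps
    # A set removes duplicates that integer truncation could introduce.
    steps = {int(i * stride) for i in range(num_sample_steps)}
    return sorted(steps, reverse=True)


# Example: a model trained with 1000 steps, sampled with only 50.
steps = strided_timesteps(1000, 50)
print(len(steps), steps[:3], steps[-1])
```

With these illustrative numbers the sampler would run 50 denoising passes rather than 1000, which is where the efficiency gain of such a scheme comes from.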

2:15pm - 2:30pm

GESM: GMM-based Efficient Sonar Mapping

Kaicheng Zhang, Zhifeng Gu, Guangchi Fang, Bing Wang

The Hong Kong Polytechnic University, Hong Kong S.A.R. (China)

GESM is a Gaussian-mixture sonar mapping pipeline that converts 2D imaging sonar into a continuous 3D probabilistic map for navigation. We estimate posterior occupancy with Gamma-CFAR, cluster occupied and free space along beams, encode them with weighted EM/MPPCA and moment-matched Gaussians, and incrementally merge local mixtures into a globally consistent map. Loop closure is handled by in-place edits of mixture parameters. On simulation and pool/harbour data, GESM yields dense, navigation-ready structure and free water while reducing map memory by ~99% compared with a comparable voxel grid.
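To make the idea of moment-matched Gaussians concrete, the sketch below merges two weighted Gaussian components into one that preserves the total weight, mean, and covariance of the pair. This is the standard moment-preserving merge, shown as an assumption about the general technique; the abstract does not give GESM's exact merge rule, and `merge_gaussians` is a hypothetical name.

```python
def merge_gaussians(w1, mu1, cov1, w2, mu2, cov2):
    """Moment-matched merge of two weighted Gaussians (means as lists, covariances as nested lists)."""
    w = w1 + w2
    n = len(mu1)
    # Merged mean is the weight-averaged mean.
    mu = [(w1 * a + w2 * b) / w for a, b in zip(mu1, mu2)]
    cov = [[0.0] * n for _ in range(n)]
    for wi, mui, covi in ((w1, mu1, cov1), (w2, mu2, cov2)):
        d = [m - g for m, g in zip(mui, mu)]  # offset of this component from the merged mean
        for i in range(n):
            for j in range(n):
                # Within-component covariance plus the spread between component means.
                cov[i][j] += (wi / w) * (covi[i][j] + d[i] * d[j])
    return w, mu, cov


# Two unit-covariance components, two units apart along x.
w, mu, cov = merge_gaussians(
    1.0, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
    1.0, [2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
)
print(w, mu, cov)
```

The merged covariance grows along x because the two means are separated in that direction, while the y variance is unchanged.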

2:30pm - 2:45pm

An Analysis of the Impact of Geospatial Data Sources on Mesh-Based Localisation Performance

Francesco Vultaggio¹,², Phillipp Fanta-Jende¹, Alexander Kern², Markus Gerke²

¹Austrian Institute of Technology, Austria; ²Technical University of Braunschweig, Germany

This paper investigates how the provenance and resolution of geospatial data used to construct mesh maps affect the accuracy and robustness of mesh-based visual localisation. Mesh-based approaches offer significant advantages over traditional pipelines reliant on Structure from Motion (SfM) models, including the ability to scale to city-sized scenes by leveraging large-scale data sources such as national mapping databases, and on-demand generation of arbitrary synthetic views. While prior work has focused on algorithmic improvements to mesh-based localisation, none has systematically analysed how different input data affect localisation outcomes. In this work, we evaluate three meshes (derived from aerial oblique imagery, combined aerial and ground mobile mapping data, and close-range ground imagery) across the egenioussBench Extended and House of Science query sets and four image matchers. We show that mesh quality is the dominant factor governing localisation performance. In the House of Science experiments, aerial meshes lack the resolution required to resolve façade detail, causing near-total localisation failure regardless of matcher. In the egenioussBench Extended experiments, augmenting an aerial mesh with ground data yields consistent but less dramatic improvements. We further introduce the Perceptual Detail Score (PDS), a viewing-condition-aware metric that proves to be a strong predictor of downstream pose accuracy across all experimental configurations.

2:45pm - 3:00pm

JCFI: a Composite Index for RMLS-based Shield Tunnel Segment Joint Recognition

Liying Wang¹, Ze You¹, Yiwei Yu¹, Yong Feng², Chunxi Xie³

¹School of Geomatics, Liaoning Technical University, Fuxin, China; ²Division of Geoinformation Management, Department of Natural Resources of Liaoning Province, Shenyang, China; ³Institute of Surveying, Mapping and Geographic Information, China Railway Design Group Co., LTD., Tianjin, China

The accurate recognition of segment joints is a critical step for capturing joint anomaly information, evaluating segment assembly quality, diagnosing structural health status, and determining the loosening of connecting bolts. It holds significant importance for the operation and maintenance of shield tunnels. However, existing studies on joint recognition based on Rail-borne Mobile Laser Scanning (RMLS) suffer from insufficiently comprehensive feature representation, leading to notably poor accuracy and robustness under complex scenarios such as noise interference, data loss due to object occlusion, and uneven point cloud density. To address this issue, this study proposes a shield tunnel segment joint recognition method based on the Joint Composite Feature Index (JCFI). The proposed method first employs a cross-sectional ellipse fitting approach to filter out obvious non-lining points. Subsequently, a composite index, JCFI, which integrates curvature, left-right density ratio, and relative depth, is designed to quantitatively characterize the feature differences of segment joints. Finally, based on the constructed JCFI, circumferential and longitudinal joints are recognized in sequence. Validation tests using RMLS point cloud data from the Guangzhou Metro Line 8 tunnel demonstrate that the proposed method, by constructing a JCFI that comprehensively characterizes joint features, effectively handles complex scenarios including noise interference, missing joints, and uneven point cloud density. The joint recognition achieves a recall of 90.14%, a precision of 99.04%, and an IoU of 89.36%, providing a reliable technical solution for the accurate identification of shield tunnel segment joints.
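The three reported metrics can be cross-checked against one another. Assuming IoU is computed from the same true-positive, false-positive, and false-negative counts as precision P and recall R (an assumption, since the abstract does not define its IoU), then IoU = TP / (TP + FP + FN) = 1 / (1/P + 1/R - 1). The hypothetical helper below applies that identity to the reported figures.

```python
def iou_from_precision_recall(precision, recall):
    """IoU implied by precision and recall when all three share the same TP/FP/FN counts."""
    # 1/P = (TP + FP) / TP and 1/R = (TP + FN) / TP, so their sum minus 1 is (TP + FP + FN) / TP.
    return 1.0 / (1.0 / precision + 1.0 / recall - 1.0)


iou = iou_from_precision_recall(precision=0.9904, recall=0.9014)
print(f"{iou:.2%}")  # prints 89.36%
```

Under that assumption, the implied value agrees with the reported IoU of 89.36%.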