Conference Agenda
Overview and details of the sessions of this conference.
Daily Overview
Location: 716B, 175 theatre
Date: Saturday, 04-July-2026
8:30am - 5:00pm | TuT9: Open Web-GIS for Disaster Response and Campus Routing: From Architecture to Deployment | Location: 716B
Date: Sunday, 05-July-2026
8:30am - 12:00pm | TuT17: Towards Geospatial Embeddings: Investigating Accurate and Accessible Deep Geospatial Feature Representations | Location: 716B
12:00pm - 1:15pm | ThS10: Resilient Localization, Mapping, and Perception in Adverse Conditions using Modern Civilian Radars | Location: 716B
12:00pm - 12:15pm
Radar-centric sensor fusion for robust indoor SLAM in complex environments
Wuhan University, China, People's Republic of
This paper presents a radar-centric multi-sensor fusion framework, RLIO, designed for robust indoor SLAM in perceptually challenging environments such as underground garages and smoke-filled areas. Unlike conventional LiDAR-based methods that degrade under poor visibility, RLIO tightly integrates 4D imaging radar, 3D LiDAR, and an IMU within an iterated extended Kalman filter. The system introduces three key modules: a motion-prior-driven radar velocimetry algorithm for stable velocity estimation, a velocity-prior-enhanced scan-to-map registration for drift reduction in degenerate geometries, and an adaptive fusion strategy that dynamically adjusts sensor weights based on real-time degradation detection. Experimental results from both handheld and UGV platforms demonstrate that RLIO achieves accurate localization and high-quality mapping even when LiDAR performance deteriorates due to smoke or repetitive structures, highlighting its potential for reliable all-weather autonomous navigation and mapping in complex indoor and outdoor environments.
12:15pm - 12:30pm
RAMBA: 4D radar mapping by bundle adjustment
Wuhan University, China, People's Republic of
4D radars have attracted increasing interest for robotic perception because they remain effective in adverse conditions such as darkness, dust, smoke, rain, and fog. Compared with conventional automotive radars that mainly provide planar coordinates and relative Doppler velocity, modern 4D radars also sense elevation, which makes them more suitable for geometric odometry and mapping. In this paper, we propose RAMBA, an offline 4D radar mapping framework based on bundle adjustment. Starting from initial poses and radar frames produced by a radar–inertial odometry front-end, we refine the radar frame states to improve global mapping consistency, measured by covariance-weighted point-to-point distances. In essence, our method extends pairwise generalized iterative closest point (GICP) to the multi-frame setting. Candidate correspondences are formed within voxels of a voxel grid built from all selected frames, and each residual is weighted by the sum of the two point covariances. The geometric constraints are jointly optimized with IMU preintegration and radar ego-velocity constraints. To reduce false associations caused by drift and revisits, RAMBA enforces temporal consistency when forming correspondences and explicitly allows constraints around loop closures. We evaluate the method on the ColoRadar and SNAIL Radar datasets. The proposed refinement consistently improves map quality and usually improves trajectory accuracy over the initial radar–inertial odometry and pose graph optimization. To the best of our knowledge, this is the first geometric offline bundle-adjustment framework for consistent 4D radar mapping.
12:30pm - 12:45pm
Deep point matching for 4D radar odometry
¹Dept. of Electrical and Computer Engineering, National University of Singapore; ²State Key Lab of Info Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Hongshan District, Wuhan, Hubei
Four-dimensional (4D) imaging radar offers robustness in adverse weather and lighting, but its point clouds remain sparse, noisy, and affected by ghost reflections, making geometric scan matching unstable. This work integrates two existing deep correspondence models, Radar Transformer and RPM-Net, into a radar–inertial odometry pipeline without retraining. Both networks run asynchronously in dedicated ROS nodes: radar and submap point clouds are cropped, transformed, and sent to the matchers, which return either hard or soft correspondences for the first IEKF iteration of each frame. When neural outputs are delayed, the system automatically falls back to geometric matching. Returned matches are fused in a voxelized IEKF backend that computes Mahalanobis-weighted residuals. RPM-Net further supplies soft targets and confidence weights, enhancing point-to-point constraints. Experiments on ColoRadar indoor and outdoor sequences show that learning-based correspondences can reduce drift in weakly structured scenes while maintaining robustness when geometry is reliable.
12:45pm - 1:00pm
LiDAR–Radar–IMU fusion for multi-robot SLAM in adverse environments
Wuhan University, State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing
This paper presents MSMR SLAM, a multi-sensor, multi-robot SLAM framework that integrates LiDAR, 4D radar, and an IMU to achieve robust localization and consistent mapping in large-scale and degraded environments. The front-end includes radar–inertial odometry and LiDAR–inertial odometry, and evaluates the reliability of LiDAR observations using point cloud sparsity and effective range. An adaptive fusion module combines the two odometry estimates to maintain stable state estimation, while radar-assisted dynamic point removal improves the reliability of geometric constraints. The back-end constructs a unified factor graph that incorporates multi-sensor odometry constraints, loop closure factors, and inter-robot association factors. A LiDAR-centered, radar-assisted matching strategy enhances cross-robot data association, and radar-based loop closures improve global consistency when LiDAR measurements degrade. The system maintains a dense LiDAR map together with a complementary radar map, enabling hybrid mapping that remains reliable in perceptually challenging regions. Experiments on campus datasets, including smoke-filled scenarios, demonstrate that MSMR SLAM achieves high-precision multi-robot localization and globally consistent mapping. Compared with single-robot baselines, the proposed framework provides improved accuracy and robustness, and the integration of LiDAR and radar yields more complete and stable map reconstruction in complex environments.
1:00pm - 1:15pm
Heuristic-Guided Extrinsic Calibration for 4D Radar-Camera Systems Using Dynamic Objects
Wuhan University, China, People's Republic of
1. We introduce a novel framework for 4D radar-camera extrinsic calibration that uses commonly available dynamic objects as natural correspondences.
2. We develop a heuristic-guided strategy to reliably associate radar points with image detections and estimate the extrinsic parameters without deep learning.
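Two talks in this session rely on recovering the sensor's own velocity from the Doppler readings of a 4D radar scan (RLIO's radar velocimetry, RAMBA's radar ego-velocity constraints). The sketch below shows only the textbook least-squares form of that idea, not either paper's method. It assumes every point is a static target, that each Doppler value equals the negative projection of the sensor velocity onto the point's line-of-sight unit vector, and it omits the outlier rejection (e.g. RANSAC) that practical systems need to discard moving objects. The helper name `estimate_ego_velocity` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def estimate_ego_velocity(points: Sequence[Vec3], dopplers: Sequence[float]) -> Vec3:
    """Least-squares sensor velocity from one radar scan (hypothetical helper).

    Assumed measurement model for a static target at position p (sensor frame):
        doppler = -dot(u, v),  with u = p / |p|
    so a target the sensor moves toward has a negative Doppler (closing range).
    Solves the 3x3 normal equations (U^T U) v = -U^T d with Cramer's rule.
    """
    ata: List[List[float]] = [[0.0] * 3 for _ in range(3)]
    atb: List[float] = [0.0] * 3
    for p, d in zip(points, dopplers):
        r = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
        u = [c / r for c in p]  # line-of-sight unit vector
        for i in range(3):
            atb[i] -= u[i] * d
            for j in range(3):
                ata[i][j] += u[i] * u[j]
    det = _det3(ata)
    if abs(det) < 1e-12:
        raise ValueError("degenerate geometry: directions do not span 3D")
    v = []
    for k in range(3):
        # Cramer's rule: replace column k of U^T U with the right-hand side.
        mk = [[atb[i] if j == k else ata[i][j] for j in range(3)] for i in range(3)]
        v.append(_det3(mk) / det)
    return (v[0], v[1], v[2])


# Usage: simulate Dopplers for static points seen from a known sensor velocity.
TRUE_V: Vec3 = (1.5, -0.2, 0.3)
POINTS: List[Vec3] = [
    (10.0, 2.0, 3.0), (8.0, -3.0, -2.0), (12.0, 0.0, 5.0),
    (5.0, 5.0, -4.0), (6.0, -6.0, 1.0),
]
DOPPLERS = [
    -sum(pc * vc for pc, vc in zip(p, TRUE_V)) / math.sqrt(sum(c * c for c in p))
    for p in POINTS
]
estimate = estimate_ego_velocity(POINTS, DOPPLERS)
print("estimated ego-velocity:", tuple(round(c, 6) for c in estimate))
```

With noise-free synthetic data the estimate reproduces `TRUE_V`; with real scans, the static-world assumption is the weak point, which is why robust variants wrap this solve in an outlier-rejection loop.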
1:30pm - 2:45pm | SpS2: ISO Data Quality Measures Register and the ISPRS Community | Location: 716B
1:30pm - 1:45pm
From Data Standards to GeoAI Governance: Strengthening Data Quality and Trust in the Next Era of Geospatial Intelligence
LunateAI, United States of America
We present an approach to ensuring trustworthy, high-quality GeoAI, a goal that requires a coordinated effort across academia, industry, government, and standards bodies. The ISPRS community, in partnership with organizations such as ISO, OGC, the World Geospatial Industry Council (WGIC), the International Society for Digital Earth (ISDE), and other international initiatives, is uniquely positioned to host this dialogue. By aligning emerging GeoAI practices with established data-quality standards and ethical-AI frameworks, the community can help shape a future-proof foundation for responsible innovation in geospatial intelligence.
1:45pm - 2:00pm
Adding Data Quality and Licensing Aspects to Open Science Workflows
¹Open Geospatial Consortium; ²Curtin University, Australia
This paper presents research on integrating data quality and licensing metadata into Open Science workflows using ISO and OGC standards and machine-readable profiles, to enhance the interoperability, transparency, and reusability of scientific data. All of this is made possible by the ongoing development of modular building blocks and validation frameworks, and by engagement with standards bodies to support the FAIR principles and scalable data reuse across domains. Our approach demonstrates the possible integration of concept schemes and measures defined in the multipart ISO 19157 standard.
2:00pm - 2:15pm
ISO 19157-3 Data quality measures register for geographic information: What is it, what can we do with it and why is it beneficial for the ISPRS and wider geocommunity?
¹Curtin University, Australia; ²Lantmäteriet, The Swedish mapping, cadastral and land registration authority; ³Open Geospatial Consortium
This paper presents the ISO 19157-3 Data quality measures register, discusses its design and implementation, and illustrates its utility to the ISPRS and wider geocommunity. We highlight the importance of providing geographic metadata about quality, the evolution of international standards to support this, and a novel implementation of a human-readable and machine-actionable web register for geographic data quality measures.
2:15pm - 2:30pm
Investigating the Role of Post-Quantum Cryptography in Enhancing Blockchain-Based Geospatial Data Exchange
Hochschule für Technik Stuttgart, Germany
The rapid growth of geospatial data, fueled by advances in satellite imagery, IoT sensors, and mobile services, presents significant opportunities in sectors such as urban planning and environmental monitoring. However, these data are also vulnerable to cyber threats, which calls for strong protection mechanisms. This paper introduces a modular, hybrid architecture that addresses these security challenges by integrating post-quantum cryptography, decentralized storage, and blockchain-based access control. It employs AES-GCM to encrypt large datasets and Kyber to protect the encryption keys against quantum attacks. Encrypted data are stored in the InterPlanetary File System (IPFS), with access managed by smart contracts on a private Ethereum blockchain. The architecture uses FastAPI for back-end processes, microservices for cryptographic operations, and React for the user interface. Performance assessments show good scalability and resilience, paving the way for secure geospatial data sharing that reconciles data sovereignty, quantum security, and decentralized management.
2:30pm - 2:45pm
Benchmarking the Quality of High-Resolution Global Land Cover Products: Toward a Shared Framework for Assessment
¹Politecnico di Milano, Italy, Department of Civil and Environmental Engineering; ²Moganshan Geospatial Information Laboratory, Zhejiang Province, China
High-Resolution Global Land Cover (HRLC) products are essential for monitoring Earth’s surface dynamics and supporting policy frameworks such as the Sustainable Development Goals. Recent global products such as ESA WorldCover, ESRI LULC, FROM-GLC, and Dynamic World offer 10–30 m resolution maps, but their interoperability remains limited by differences in input data, class legends, and validation protocols. This lack of harmonization hampers cross-comparison and integrated use for environmental monitoring. Although advances in remote sensing, AI, and cloud computing have enabled more frequent and detailed mapping, they have also introduced new challenges for ensuring data consistency and comparability. Validation of HRLC products is hindered by the absence of a common benchmark dataset, as current accuracy metrics are derived from heterogeneous reference samples and class definitions. Traditional validation methods are costly and time-consuming, while temporal inconsistencies and cloud contamination further increase uncertainty. ISO 19157-3 offers a standardized framework to describe and automate quality measures such as positional accuracy and thematic correctness, supporting transparent and reproducible evaluation across datasets. A sustainable solution involves establishing an international benchmarking framework with standardized reference data, legends, and sampling strategies. As a practical interim approach, the Map of Land Cover Agreement (MOLCA) combines multiple HRLC products to identify spatial consensus and disagreement, offering a proxy for thematic reliability.
Although MOLCA measures consistency rather than absolute accuracy, its integration into ISO 19157-3 would advance data quality assessment, fostering transparency, interoperability, and confidence in HRLC-derived environmental analyses.
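The MOLCA idea from the HRLC benchmarking talk, combining several land cover products to expose where they agree or disagree, can be illustrated with a per-pixel vote. The sketch below is an assumption about the general principle, not the published MOLCA definition: it treats each product as a grid of class codes that are already co-registered and harmonized to one legend, and reports for each pixel the majority class and how many products support it. The function name `agreement_map` and the class codes are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[int]]


def agreement_map(products: Sequence[Grid]) -> Tuple[List[List[int]], List[List[int]]]:
    """Per-pixel majority class and the number of products that agree with it.

    Hypothetical helper. Assumes every grid is co-registered, has the same
    shape, and uses one harmonized legend. Ties keep the class seen first
    (Counter.most_common orders equal counts by first occurrence).
    """
    rows, cols = len(products[0]), len(products[0][0])
    majority = [[0] * cols for _ in range(rows)]
    agree = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = Counter(p[r][c] for p in products)
            cls, n = votes.most_common(1)[0]
            majority[r][c] = cls
            agree[r][c] = n
    return majority, agree


# Usage: three toy 2x2 products with illustrative class codes.
product_a = [[10, 10], [50, 80]]
product_b = [[10, 40], [50, 80]]
product_c = [[10, 40], [40, 80]]
products = [product_a, product_b, product_c]
majority, agree = agreement_map(products)

n_pixels = len(agree) * len(agree[0])
full_agreement = sum(v == len(products) for row in agree for v in row) / n_pixels
print("majority:", majority)
print("agreement counts:", agree)
print("share of pixels where all products agree:", full_agreement)
```

As the abstract notes, a high agreement count signals consistency between products, not correctness: all inputs can agree on the wrong class.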
Date: Monday, 06-July-2026
8:30am - 10:00am | ThS14: AI-Augmented Photogrammetry - Bridging Learning-based Approaches and Classical Geometric-based 3D Methods | Location: 716B
8:30am - 8:45am
Combining Photogrammetry and Gaussian Splatting
¹3D Optical Metrology (3DOM) unit, Bruno Kessler Foundation (FBK), Trento, Italy; ²Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, USA
Among image-based methods, traditional photogrammetry is a consolidated 3D reconstruction technique able to provide highly accurate metric products, widely exploited in many domains for documentation and mapping. Its reconstruction capability, however, depends on the characteristics of the captured scene: it performs well in well-textured areas but is limited when non-collaborative surfaces, such as reflective or transparent objects, are present. In such cases, the photogrammetric reconstruction is often affected by noise, incomplete geometry, and artifacts, which reduce its final quality. In recent years, several AI-based methods have emerged as alternative (or complementary) 3D reconstruction and rendering solutions. In particular, 3D Gaussian Splatting (GS) has demonstrated impressive capabilities in rendering photorealistic scenes with high visual fidelity in challenging situations. However, its application to large-scale scenarios, or where highly accurate 3D metric products are required, is still limited by the high computational resources needed and by the fact that GS methods are intrinsically optimized for photometric rendering quality. To address these bottlenecks, this work proposes a hybrid reconstruction pipeline that leverages the strengths of each technique. The method exploits the accurate geometry of photogrammetry in well-textured regions and the stronger capabilities of GS to improve completeness and visual appearance in areas featuring non-collaborative surfaces. A fusion strategy is proposed to combine the two products into a single 3D model, and results are presented on two aerial datasets and one terrestrial dataset.
8:45am - 9:00am
Refraction-Aware Two-Media NeRF for Underwater 3D Reconstruction
¹Department of Geodesy and Geoinformation, TU Wien, Vienna, Austria; ²Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany; ³Unit of Geometry and Surveying, University of Innsbruck, Innsbruck, Austria
Neural Radiance Fields (NeRFs) (Mildenhall et al., 2020) have revolutionized novel view synthesis, but standard formulations assume straight rays in a single, homogeneous medium. In underwater scenarios, refraction at the air–water interface bends light paths; if it is ignored, the result is distorted 3D structure and missing underwater points. Refraction-aware NeRF variants such as NeRFrac (Xue et al., 2022) demonstrate the benefit of modeling refraction, but are limited to a single underwater medium and to standalone implementations. Recent work has applied NeRFrac to through-water reconstruction (Brezovsky et al., 2025) and introduced a simulation framework for two-media scenes (Schulte et al., 2025). Building on these ideas, we introduce the general concept of a two-media NeRF and demonstrate its integration into the Nerfstudio framework (Tancik et al., 2023), with the goal of extracting metrically meaningful underwater point clouds rather than only improving image-based metrics.
9:00am - 9:15am
CENS: A Coverage-efficient Pixel Sampling Strategy for enhancing NeRF-generated Point Cloud Fidelity
Unit of Geometry and Surveying, Universität Innsbruck, Austria
Many geospatial workflows critically depend on high-fidelity 3D point clouds for applications such as change detection, orthophoto generation, and modeling. However, NeRF-generated point clouds often suffer from sampling inefficiencies inherent in the predominant random pixel sampling approach. We identify spatial redundancy as one such inefficiency: random sampling inevitably samples large, low-texture patches more frequently than detailed, high-frequency textured regions. As a result, low-texture areas tend to be oversampled while other pixels remain unsampled, regardless of their importance to the reconstruction task. To overcome this, we propose CENS (Coverage-Efficient Non-Redundant Sampling), a deterministic pixel sampling strategy that maximizes spatial coverage, eliminates intra-image sample repetition, and ensures reproducibility via structured initialization. Evaluated on the Jamtal valley dataset, CENS achieves comparable geometric accuracy (C2M: mean = -0.0027 vs. -0.0011 m; standard deviation = 0.027 vs. 0.028 m) using 50% fewer training steps (11,232 vs. 22,464), while yielding 28.2% more points, higher orthophoto fidelity, and improved point cloud completeness. Beyond CENS, we also explored NeRFs for ALS point cloud simulation, achieving realistic occlusion patterns and accuracy within UAV photogrammetry standards (vertical RMSE = 24 mm; horizontal RMSE = 17 mm). Crucially, CENS positions NeRFs as a scalable, practical solution for geospatial point cloud and orthophoto generation, advances them toward real-world mapping workflows, and integrates seamlessly into Nerfstudio.
9:15am - 9:30am
Explicit Reconstruction of thermal Environments based on dual-modal neural Radiation Fields for diagnosing Building Facade Defects
¹School of Urban Design, Wuhan University, Wuhan, China, People's Republic of; ²State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, China
This research presents a multi-modal framework for the explicit 3D reconstruction of building thermal environments to diagnose facade defects. The framework is centered on a dual-branch Neural Radiance Field (NeRF) architecture that fuses fine-grained geometric information from RGB data with quantitative thermal data from thermal infrared (TIR) data. For practical diagnostics, the framework integrates a Signed Distance Function (SDF) to implicitly learn a high-fidelity surface representation, from which an explicit triangular mesh is then extracted using the Marching Cubes algorithm. The resulting model combines geometric accuracy with thermal fidelity, enabling clear visualization, localization, and analysis of thermal anomalies such as thermal bridges, cavities, and moisture ingress in their correct spatial context.
9:30am - 9:45am
Assessing the Reconstruction Potential of 3D Vision Foundation Models for Oblique Photogrammetry
¹Faculty of Geosciences and Engineering, Southwest Jiaotong University, 611756 Chengdu, China; ²CRSC Communication & Information Group Co., Ltd.; ³Yunnan Engineering Research Center of 3D Real Scene, Kunming 650500, China; ⁴Kunming Engineering Corporation Limited, Kunming 650500, China
3D vision foundation models, which directly regress 3D geometry from 2D images in an end-to-end manner, have recently attracted growing attention in the computer vision community. However, their potential for oblique 3D reconstruction has not been systematically evaluated. To this end, we establish an automated evaluation pipeline to benchmark these models on oblique imagery. Our experiments reveal that, benefiting from their powerful zero-shot generalization, 3D vision foundation models can robustly estimate camera parameters and generate dense point clouds under sparse-view and low-overlap conditions, with some rivaling traditional photogrammetry configured with redundant observations. Counterintuitively, when processing more than two views, two-view reasoning foundation models that use explicit PnP-RANSAC for global alignment consistently outperform multi-view reasoning foundation models that infer multi-view relationships through implicit attention mechanisms. Notably, incorporating known camera parameters as conditioning inputs, which act as weak supervision rather than rigid geometric constraints, yields only marginal accuracy improvements. Because they are built on the ViT architecture, these foundation models face scalability bottlenecks with large-scale, high-resolution oblique imagery, and their prevalent assumption of an ideal pinhole camera still makes explicit distortion correction an unavoidable preprocessing step.
9:45am - 10:00am
Evaluating the Performance of 3D Vision Foundation Models for DSM Reconstruction from Satellite Images
¹Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 611756, Sichuan, China; ²Department of Military Oceanography and Hydrography and Cartography, Dalian Naval Academy, Dalian 116018, China; ³Key Laboratory of Hydrographic Surveying and Mapping of PLA, Dalian Naval Academy, Dalian 116018, China; ⁴Institute of Remote Sensing Satellite, China Academy of Space Technology, Beijing 100094, China
Three-dimensional (3D) reconstruction from satellite imagery is a critical research topic in remote sensing and geoinformation science. Although 3D Vision Foundation Models (3D VFMs) have demonstrated remarkable performance in reconstructing natural scenes, their capability to handle high-resolution satellite imagery has not been systematically evaluated. This study presents a comprehensive assessment of seven representative 3D VFMs for satellite-based 3D reconstruction, combined with four point-cloud alignment strategies. Rigorous comparisons were conducted against high-precision LiDAR-derived Digital Surface Models (DSMs) using two publicly available multi-view satellite datasets, WHU-TLC and MVS3D. Experimental results show that, on the high-resolution MVS3D dataset, the Depth Anything v2 (DAV2) model combined with the Affine alignment strategy achieved the best overall performance, producing DSMs with a Mean Absolute Error (MAE) of 1.75 m and a Root Mean Square Error (RMSE) of 3.24 m, corresponding to accuracy improvements of 8.4% and 13.6%, respectively, and significantly outperforming all other model-strategy combinations. In contrast, on the lower-resolution WHU-TLC dataset, all 3D VFMs exhibited notable performance degradation, and the reconstructed results showed limited practical value, revealing persistent generalization challenges for current models in low-resolution scenarios.
Overall, this study systematically quantifies the performance of 3D VFMs in satellite image-based 3D reconstruction, confirming their strong potential for high-resolution satellite applications and providing valuable insights for enhancing model robustness and generalization across complex urban and low-resolution environments.
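The refraction-aware NeRF talk in this session builds on the fact that a camera ray crossing the air–water interface bends according to Snell's law. A minimal sketch of that single ray-bending step, using the standard vector form of Snell's law, is shown below. It is not the authors' two-media NeRF implementation; it assumes a flat water surface with a known unit normal and a refractive index of about 1.333 for water. The helper name `refract` is hypothetical.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def refract(d: Vec3, n: Vec3, n1: float, n2: float) -> Optional[Vec3]:
    """Refract unit direction d at an interface with unit normal n (vector Snell's law).

    n must point back into the incident medium, i.e. dot(n, d) < 0.
    n1 and n2 are the refractive indices of the incident and transmitting media.
    Returns None on total internal reflection.
    """
    eta = n1 / n2
    cos_i = -(d[0] * n[0] + d[1] * n[1] + d[2] * n[2])
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # total internal reflection (only possible when n1 > n2)
    k = eta * cos_i - math.sqrt(1.0 - sin2_t)
    return (eta * d[0] + k * n[0], eta * d[1] + k * n[1], eta * d[2] + k * n[2])


# Usage: a downward-looking camera ray hitting a flat water surface at 30 degrees.
N_AIR, N_WATER = 1.0, 1.333
up: Vec3 = (0.0, 0.0, 1.0)  # surface normal pointing up, into the air
theta_i = math.radians(30.0)
ray: Vec3 = (math.sin(theta_i), 0.0, -math.cos(theta_i))
t = refract(ray, up, N_AIR, N_WATER)
assert t is not None
theta_t = math.degrees(math.asin(math.hypot(t[0], t[1])))
print(f"incidence 30.0 deg -> refracted {theta_t:.2f} deg below the normal")
```

The refracted ray is steeper than the incident one, so an underwater point reconstructed along the unbent ray would land at the wrong depth; this is the distortion that two-media formulations are designed to remove.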
1:30pm - 3:00pm | Forum1A: Observing the Earth as One: Making space for everyone in Remote Sensing, Photogrammetry, and Spatial Information Science | Location: 716B
3:30pm - 5:15pm | Forum1B: Observing the Earth as One: Making space for everyone in Remote Sensing, Photogrammetry, and Spatial Information Science | Location: 716B
Date: Tuesday, 07-July-2026
8:30am - 10:00am | Forum2A: The Future of Space-based Earth Observation | Location: 716B
1:30pm - 3:00pm | Forum2B: The Future of Space-based Earth Observation | Location: 716B
3:30pm - 5:15pm | Forum2C: The Future of Space-based Earth Observation | Location: 716B
Date: Wednesday, 08-July-2026
8:30am - 10:00am | ThS5: Large Language Models for Intelligent LiDAR Point Cloud Processing | Location: 716B
8:30am - 8:45am
GeoOpen3D: Geometry-guided training-free open-vocabulary 3D segmentation via visual foundation models
¹The Hong Kong University of Science and Technology (Guangzhou), China; ²School of Computer and Communication Engineering, Northeastern University, China
Open-vocabulary 3D segmentation offers an attractive alternative to closed-set scene parsing, yet directly transferring 2D vision-language models to outdoor point clouds remains difficult because projection disrupts geometric continuity and sparse sampling weakens mask quality. This paper presents GeoOpen3D, a geometry-guided and training-free framework for open-vocabulary 3D point cloud segmentation. GeoOpen3D constructs a geometry-preserving RGB-D representation through projection, super-sampling, and depth enhancement to improve alignment between 3D structure and 2D foundation models. It then combines GroundingDINO for language-driven proposal generation with SAM for mask extraction, while introducing depth-aware regularisation to favour structurally coherent regions and clearer boundaries. The selected masks are back-projected to the original point cloud through pixel-to-point correspondence, yielding point-wise semantic labels without any 3D model training. Experiments on the SensatUrban dataset show that GeoOpen3D achieves 42.1% mIoU, including 98.5% IoU for buildings and 97.3% IoU for vegetation, outperforming existing training-free open-vocabulary baselines. Additional experiments on a custom island dataset further demonstrate promising transferability to unseen categories. These results indicate that geometry-guided 2D-to-3D transfer provides an effective and scalable path towards open-vocabulary understanding of large-scale outdoor scenes.
8:45am - 9:00am
SPARC: Scalable 3D Panoptic Segmentation with Reinforcement-driven Clustering
Sun Yat-sen University, China, People's Republic of
Large-scale 3D panoptic segmentation is critical for digital twins and geospatial analysis, demanding models that process massive point clouds while distinguishing instances across highly diverse spatial scales. However, prevailing graph-based approaches rely on one-shot optimization, suffering from short-sighted decisions where irreversible local errors propagate globally, leading to severe under-segmentation at boundaries between objects of disparate scales. To overcome this short-sightedness, we present SPARC, a scalable framework that reframes graph clustering as a sequential, self-correcting decision process driven by hierarchical reinforcement learning. Specifically, SPARC employs a dual-level agent where a meta-controller adaptively determines instance completeness while a low-level policy iteratively refines edge affinities, enabling the model to revise early mistakes based on long-horizon rewards rather than greedy local cues. Complementing this, we introduce Semantic Voxel Partitioning (SVP) to generate semantically coherent superpoints, ensuring robust primitives that mitigate noise before clustering begins. Extensive experiments demonstrate that SPARC achieves state-of-the-art performance on the DALES dataset with a Panoptic Quality of 62.4%, surpassing existing methods by 9.8% and effectively resolving multi-scale segmentation ambiguities.
9:00am - 9:15am
LaSA-Net: A Language-Guided Network for Large-Scale 3D Referring Expression Segmentation on the UrbanRefer Benchmark
¹Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University, Shanghai 200241, China; ²Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, East China Normal University, Shanghai 200241, China; ³School of Geospatial Artificial Intelligence, East China Normal University, Shanghai 200241, China; ⁴Hinton STAI Institute, East China Normal University, Shanghai 200241, China; ⁵School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
3D Referring Expression Segmentation (3DRES) aims to segment the objects in a point cloud scene that a given natural-language expression refers to. However, existing 3DRES methods face three main challenges: (1) although significant progress has been made in indoor scenes, large-scale and complex outdoor scenes captured by airborne or mobile LiDAR remain unexplored; (2) traditional methods often suffer from inefficiency and mis-segmentation because they pay insufficient attention to the spatial information of instances during query generation; and (3) existing models treat all queries equally in the decoder and predict the final mask in one step, which is inefficient in outdoor road scenes dominated by background points, where objects are sparse and small. To address these challenges, a new outdoor 3DRES benchmark, named UrbanRefer, is introduced. The dataset consists of 100 large-scale outdoor scenes and 1,100 specially designed long textual descriptions, emphasizing geospatial relationships and multi-object contexts unique to outdoor environments. Additionally, the Language-guided Spatial Anchoring Network (LaSA-Net) is proposed for the referring segmentation task in outdoor scenes.
Specifically, a Local-Global Aggregation (LGA) module is incorporated into the backbone to enhance local and global context awareness and refine point features. Furthermore, a Text-driven Localization (TL) module directly predicts the 3D positions of all entities mentioned in the text, providing robust spatial priors for the decoder. Finally, a Hierarchical Prompt-aware Decoder (HPAD) is designed to locate rough regions by extracting task-driven signals from the interaction between expressions and visual features. Extensive experiments show that LaSA-Net outperforms state-of-the-art methods by 0.9% in mIoU.
9:15am - 9:30am
SceneReasoner: Decoupled Spatial Tokenization for Large-Scene Understanding with LLMs
Shenzhen University, Shenzhen, Guangdong, People's Republic of China
Most existing 3D vision-language models focus on object-level or single-room understanding and perform poorly in large-scale, multi-room indoor environments where task-relevant objects constitute only a small fraction of the total point cloud. When multi-room point clouds are fed directly into an LLM, critical semantic signals are diluted by the vast amount of redundant background, making it difficult for the model to focus on truly relevant regions. We propose SceneReasoner, a decoupled spatial tokenisation framework that addresses this challenge through three core designs: (1) pre-tokenisation text-guided feature weighting, which leverages the shared CLIP embedding space between OpenScene point features and text queries to amplify question-relevant point features before any spatial compression occurs; (2) 2D–3D feature fusion, which integrates top-down 2D CLIP features with 3D sparse tokens, supplying the model with appearance semantics (such as texture, material, and room layout) that are absent from raw point clouds; and (3) layer-wise dense feature injection, which inserts local dense features into the LLM attention mechanism layer by layer for fine-grained perception of key regions. We evaluate on the XR-Scene benchmark, which covers cross-room question answering and scene captioning over HM3D indoor environments with an average area of 132 m². SceneReasoner achieves the best CIDEr on XR-SceneCaption (+0.33 over LSceneLLM), the highest METEOR on XR-QA, and competitive ROUGE-L across all three tasks, demonstrating the effectiveness of task-guided spatial tokenisation for large-scene understanding.
9:30am - 9:45am
LLM-Supervised Point Cloud Processing: from Unsupervised 3D Scene-Graph Generation to Interactive Scene Manipulation
¹3D Geodata Academy, France; ²Geoscity Lab, University of Liège, Belgium; ³Panoriq AI, Germany
Understanding and manipulating 3D spatial environments remains a fundamental challenge in the geospatial sciences, with applications spanning digital twins, facility management, urban planning, and autonomous systems. While point cloud acquisition technologies have matured significantly, the semantic interpretation and interactive manipulation of captured 3D scenes still require extensive manual intervention and domain expertise. This paper presents a novel LLM framework that bridges unsupervised graph-based 3D scene understanding with natural-language-driven interactive manipulation, enabling context-aware spatial intelligence at scale.
9:45am - 10:00am
Multimodal Large Language Models to road inventory with non-photorealistic Point Cloud visualization
CINTECX, Universidade de Vigo, GeoTECH, 36310, Vigo, Spain
Accurate road inventories are crucial for maintenance, safety, and resource allocation; automation improves efficiency but often lacks user-friendly human-machine interaction. This paper evaluates how non-photorealistic rendering of 3D point clouds affects the interpretation of road inventory scenes by Multimodal Large Language Models (MLLMs), testing three rendering methods on real road data from Santarém, Portugal. Starting from 3D point clouds coloured with RGB information, three non-photorealistic techniques are implemented and compared: Ambient Occlusion (AO), Eye-Dome Lighting (EDL), and Multi Feature-Rich Synthetic Color (MFRSC). Several state-of-the-art MLLMs are tested: GPT5, Gemini2.5-Pro, Gemini2.5-Flash, CogVLM2, MiniCPM-V, Llama4-scout-17b, Mistral-Small3.2, Qwen2.5vl, and Gemma3. The results indicate that non-photorealistic techniques do not hinder the identification of road elements by MLLMs, suggesting their potential for 3D point cloud classification tasks even when true RGB colour is not available. Furthermore, overall F-scores above 80% for proprietary state-of-the-art models (GPT5, Sonnet 4.5, and Gemini) show that 2D captures of 3D point clouds can be a suitable data source for zero-shot object classification.
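SPARC's DALES result is reported as Panoptic Quality (PQ). Under the standard definition, a predicted and a ground-truth segment of the same class form a true positive (TP) when their IoU exceeds 0.5, and PQ = (sum of IoUs over TP) / (|TP| + 0.5 |FP| + 0.5 |FN|), which factors into segmentation quality (mean IoU of the matches) times recognition quality (an F1-style detection score). The sketch below computes these quantities for a single class on toy sets of point indices. The dataset-level score averages over classes, and the abstract does not state the exact DALES evaluation protocol, so treat this only as an illustration of the metric. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two segments given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(
    preds: Sequence[Set[int]], gts: Sequence[Set[int]]
) -> Tuple[float, float, float]:
    """Return (PQ, SQ, RQ) for one class.

    A prediction and a ground-truth segment are a true positive when IoU > 0.5.
    If segments do not overlap within preds or within gts, each segment can have
    at most one partner above that threshold, so greedy matching is exact.
    """
    matched_gt: Set[int] = set()
    tp_ious: List[float] = []
    for p in preds:
        for gi, g in enumerate(gts):
            if gi not in matched_gt and iou(p, g) > 0.5:
                matched_gt.add(gi)
                tp_ious.append(iou(p, g))
                break
    tp = len(tp_ious)
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # unmatched ground-truth segments
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    sq = sum(tp_ious) / tp if tp else 0.0
    rq = tp / denom
    return sum(tp_ious) / denom, sq, rq


# Usage: two correct instances (IoU 0.75 each), one spurious, one missed.
ground_truth = [{0, 1, 2, 3}, {10, 11, 12}, {20, 21}]
predictions = [{0, 1, 2}, {10, 11, 12, 13}, {30, 31}]
pq, sq, rq = panoptic_quality(predictions, ground_truth)
print(f"PQ={pq:.3f}  SQ={sq:.3f}  RQ={rq:.3f}")
```

Because PQ penalizes both merged and split instances through FP and FN, it directly reflects the under-segmentation at scale boundaries that the SPARC abstract targets.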
1:30pm - 3:00pm | Forum3A: Legacy Project: How to Secure Funding to Support Geospatial Activities | Location: 716B
3:30pm - 5:15pm | Forum3B: Legacy Project: How to Secure Funding to Support Geospatial Activities | Location: 716B
Date: Thursday, 09-July-2026
8:30am - 10:00am | Forum4A: Hybrid Intelligent Geospatial Computing | Location: 716B
1:30pm - 3:00pm | Forum4B: Hybrid Intelligent Geospatial Computing | Location: 716B
3:30pm - 5:15pm | Forum4C: Hybrid Intelligent Geospatial Computing | Location: 716B

