Conference Agenda

Overview and details of the sessions of this conference.

Daily Overview

Location: 713A
125 theatre

Date: Friday, 10-July-2026

8:30am - 10:00am

WG II/2E: Point Cloud Generation and Processing
Location: 713A

8:30am - 8:45am

Appearance-aware Scaling Diffusion Model for 3D Point Cloud Upsampling

Sunghwan Yoo, Gunho Sohn

York University, Canada

This paper introduces the Appearance-guided Scaling Diffusion Model (AGDM), a novel diffusion-based framework designed to densify sparse airborne laser scanning (ALS) point clouds while preserving fine geometric detail. Traditional diffusion models for 3D upsampling, such as LiDiff and PUDM, operate solely on intrinsic 3D information and struggle to reconstruct sharp edges and continuous surfaces when input data are extremely sparse. AGDM addresses these limitations by integrating two complementary conditional priors: multi-view appearance cues and geometry-aware 3D features.

Sparse point clouds are first rendered into ten synthetic viewpoints, and a Vision Transformer extracts high-level visual embeddings that encode surface appearance and boundary structures. In parallel, a Minkowski-based encoder processes the input geometry to capture spatial continuity and local shape characteristics. A cross-attention fusion module aligns and combines these modalities, producing a unified conditioning signal that guides a scaling diffusion network during iterative denoising.

AGDM is trained and evaluated on the YUTO dataset, where dense ground-truth scenes are reconstructed from multi-mission ALS data. Experiments demonstrate that AGDM achieves superior performance across Chamfer Distance, Jensen–Shannon Divergence, F1 score, and multi-scale IoU metrics. Qualitative results further show that the model produces more uniform, edge-preserving, and structurally coherent point clouds than existing diffusion approaches.

By leveraging appearance guidance alongside geometric priors, AGDM significantly improves the fidelity and practicality of LiDAR point-cloud upsampling, offering an effective pathway for scalable and cost-efficient 3D digital-twin generation.
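
Chamfer Distance, one of the metrics named in the abstract above, can be sketched as follows. Definitions vary between papers (squared or unsquared distances, sums or means); this minimal sketch assumes the symmetric sum of mean unsquared nearest-neighbour distances and is not taken from the AGDM evaluation code. The function name `chamfer_distance` is hypothetical.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumption: unsquared Euclidean distances, averaged per direction.
    """
    # Pairwise distances, shape (N, M). Brute force, so only for small clouds.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean distance from each point to its nearest neighbour in the other set,
    # computed in both directions and summed.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```

For large point clouds, a spatial index such as a k-d tree would replace the dense distance matrix.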

8:45am - 9:00am

Scan Outlier Ratio (ScOR): LiDAR Scanning and Survey-Aware Filtering of Detached Points in Terrestrial and Permanent Laser Scanning Point Clouds

Ronald Tabernig¹,², Bernhard Höfle¹,²

¹3DGeo Research Group, Institute of Geography, Heidelberg University, Heidelberg, Germany; ²Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany

Accurate 3D surface reconstruction and change analysis rely on point clouds that represent persistent solid surfaces and should neglect very small (< laser footprint size) and temporary objects that create outliers. Terrestrial and Permanent Laser Scanning (TLS/PLS) data often contain transient or detached points, which violate assumptions of common cloud-, mesh-, and surface-based 3D change analysis methods. These points cause wrong correspondences and change values in multi-temporal point cloud comparison. We address this with the Scan Outlier Ratio (ScOR) filter, a LiDAR scanning and survey-aware descriptor designed to identify points unsuitable for most point cloud-based change analysis methods. ScOR compares the measured point spacing with the expected spacing, assuming the surface is locally planar and orthogonal to the incoming laser beam. ScOR works with a single scan or multiple scans acquired from the same position, enabling multi-temporal neighborhoods for filtering. Using data from natural and urban environments, we analyze ScOR across different surfaces, neighborhood sizes, and temporal neighborhoods, and compare it with the Statistical Outlier Removal (SOR) algorithm. Results show that ScOR successfully removes non-surface points while preserving surface information. In our experiments, the true positive rate exceeds 95% in all but one case, while the false positive rate remains below 10% throughout. With neighborhoods from subsequent and aggregated epochs, the method automatically detects and removes large temporary objects (e.g., a person). Due to its interpretability, efficiency, and range-aware design, ScOR provides an effective pre-processing method for automated and near real-time 3D surface change analysis with TLS/PLS.
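
The comparison of measured and expected point spacing can be illustrated with a rough sketch. The abstract does not give the ScOR formula, so everything below is an assumption: the expected spacing is taken as the gap between two neighbouring beams on a plane orthogonal to the beam, the measured spacing as the nearest-neighbour distance, and the names `expected_spacing` and `spacing_ratio` are hypothetical.

```python
import numpy as np

def expected_spacing(rng: np.ndarray, angular_step_rad: float) -> np.ndarray:
    """Gap between neighbouring beams on a plane orthogonal to the beam (assumed model)."""
    return 2.0 * rng * np.tan(angular_step_rad / 2.0)

def spacing_ratio(points: np.ndarray, scanner_pos: np.ndarray,
                  angular_step_rad: float) -> np.ndarray:
    """Per-point ratio of measured nearest-neighbour spacing to expected spacing."""
    # Brute-force pairwise distances, shape (N, N); fine for a small example.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's zero distance to itself
    measured = d.min(axis=1)
    rng = np.linalg.norm(points - scanner_pos, axis=1)  # range from the scanner
    return measured / expected_spacing(rng, angular_step_rad)
```

Under these assumptions, points on a densely sampled surface give ratios near 1, while detached points give much larger ratios. Any threshold for filtering would have to come from the paper itself.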

9:00am - 9:15am

LiDAR-Enhanced 3D Gaussian Splatting SLAM for Planetary Rover Exploration

Lingxiao Zhang¹, Rong Huang¹,², Yusheng Xu¹,², Zhen Ye¹,², Xiong Xu¹,², Changjiang Xiao¹,², Xiaohua Tong¹,²

¹College of Surveying and Geo-Informatics, Tongji University, Shanghai 200092, China; ²Shanghai Key Laboratory for Planetary Mapping and Remote Sensing for Deep Space Exploration, Shanghai 200092, China

Autonomous positioning and scene reconstruction are crucial to the exploration and scientific research tasks of planetary rovers. 3D Gaussian splatting (3DGS) provides a new paradigm for dense reconstruction. However, reconstruction methods that rely only on monocular images suffer from scale ambiguity and insufficient geometric consistency. These problems are more prominent in planetary scenes, which lack geometric constraints and have weak textures. To overcome these limitations, we propose a LiDAR-enhanced 3DGS-SLAM pipeline. Sparse LiDAR measurements are introduced as prior information to improve depth prediction and to ensure that Gaussians are initialized consistently at the physical scale. The camera poses and Gaussian parameters are then optimized through differentiable rendering to achieve robust localization and photometric-geometric consistency. Experiments on Erfoud, a planetary analogue dataset, show that our method outperforms state-of-the-art 3DGS-based SLAM systems. The absolute trajectory error (ATE) is reduced by more than 50%, and PSNR, SSIM, and LPIPS all improve significantly.

9:15am - 9:30am

Sensor Domain Adaptation for 3D Object Detection via LiDAR Super-Resolution

June Moh Goo, Zichao Zeng, Jan Boehm

University College London, United Kingdom

LiDAR-based perception models’ performance can degrade sharply when applied to data from sensors different to those they were trained on. LiDAR super-resolution aims to enhance sparse point clouds from low-cost sensors. This can help to bridge the sensor domain gap to higher resolution LiDAR. Prior work has primarily focused on reconstruction quality metrics for super-resolution with limited evaluation of downstream perception tasks. We address this gap by conducting a systematic analysis of how super-resolution quality impacts 3D object detection performance. We evaluate detection capability through zero-shot transfer experiments on the KITTI object dataset. Four representative detectors (SECOND, PointPillars, PV-RCNN, PointRCNN) trained on high-resolution data are directly applied to super-resolved low-resolution data without fine-tuning. Results reveal a critical insight: reconstruction improvements yield vastly different detection gains across architectures. PointPillars shows minimal improvement until reaching high reconstruction quality, then performance improves significantly. In contrast, PV-RCNN exhibits steady gains throughout. The highest-quality reconstruction closes up to 86% of the performance gap and enables detection in safety-critical scenarios, including distant vehicles and small pedestrians, where lower-quality methods fail entirely. This work establishes that LiDAR super-resolution effectiveness depends on both reconstruction quality and detector architecture.

9:30am - 9:45am

Ray Queries On Raw Point Clouds

Balthasar Teuscher, Paul Walther, Kwasi Nyarko Poku-Agyemang, Martin Werner

Technical University of Munich, Germany; TUM School of Engineering and Design, Department of Aerospace and Geodesy, Professorship of Big Geospatial Data Management

Retrieving information from point clouds for analysis and visualization has gained ever-increasing interest. A growing niche in this regard is ray queries, commonly used for image synthesis. Ray tracing is widely used in computer graphics, with a multitude of solutions based on bounding volume hierarchies. However, these solutions are rarely straightforward to integrate with raw point cloud data and geospatial analytical workflows. To overcome this, we present a novel approach to ray tracing in raw point clouds that builds upon and extends existing geospatial indices. The solution is exemplified by a fast octree implementation that supports versatile query semantics, such as neighborhood queries with constraints on k and radius for both points and rays, while offering configurable data organization schemes, including layered, fixed, and adaptive depth. The evaluation demonstrates satisfactory speed and capabilities for many scientific use cases, while simultaneously exhibiting low implementation costs, high flexibility, and simplicity in integrating ray tracing into analytical point cloud workflows.
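
A ray neighbourhood query with constraints on radius and k, as mentioned in the abstract above, can be illustrated with a brute-force sketch. It is not the authors' octree implementation: it scans every point, and the function name `ray_radius_query` and its parameters are hypothetical.

```python
import numpy as np

def ray_radius_query(points, origin, direction, radius, k=None):
    """Indices of points within `radius` of a ray, sorted by distance to the ray.

    Brute-force stand-in for an index-accelerated query; if k is given,
    only the k points closest to the ray are returned.
    """
    direction = direction / np.linalg.norm(direction)  # unit direction
    rel = points - origin
    t = rel @ direction  # signed distance of each point along the ray
    # Perpendicular distance from each point to the ray's supporting line.
    perp = np.linalg.norm(rel - np.outer(t, direction), axis=1)
    # Keep points in front of the origin and inside the cylinder of given radius.
    idx = np.flatnonzero((t >= 0.0) & (perp <= radius))
    idx = idx[np.argsort(perp[idx], kind="stable")]
    return idx if k is None else idx[:k]
```

A spatial index such as an octree would prune whole cells whose bounds lie farther than `radius` from the ray instead of testing every point.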

9:45am - 10:00am

Analysis of free large Area covering Elevation Models and improvement by ICESat-2

Karsten Jacobsen

Leibniz University Hannover, Germany

This contribution analyses the accuracy of the free elevation models TDX-EDEM, AW3D30, SRTM and ASTER GDEM-3. Systematic elevation model errors are determined as Z-shift, model tilt and systematic errors as a function of X and Y. The models are compared with ICESat-2 data: the systematic elevation model errors are determined from ICESat-2 ATL08 data and used to correct the free elevation models. The accuracy of the corrected elevation models is then analysed with airborne LiDAR data.

The corrections based on the ICESat-2 data significantly improved the free elevation models.
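
Estimating a Z-shift and model tilt from height differences against reference points can be sketched as a least-squares plane fit. This is an illustration under assumptions, not the author's procedure: `dz` is taken as elevation model height minus reference height, only shift and linear tilt are modelled (no higher-order terms in X and Y), and the function names are hypothetical.

```python
import numpy as np

def fit_shift_and_tilt(x, y, dz):
    """Least-squares fit of dz ~ shift + tilt_x * x + tilt_y * y."""
    # Design matrix with a constant column (shift) and the two coordinates (tilt).
    A = np.column_stack([np.ones_like(x, dtype=float), x, y])
    coeffs, *_ = np.linalg.lstsq(A, dz, rcond=None)
    return coeffs  # array of (shift, tilt_x, tilt_y)

def correct_heights(x, y, z, coeffs):
    """Remove the fitted systematic error from elevation model heights z."""
    shift, tilt_x, tilt_y = coeffs
    return z - (shift + tilt_x * x + tilt_y * y)
```

With `dz` computed at reference points (for example ICESat-2 ground heights), `correct_heights` can then be applied at any position of the elevation model.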

1:30pm - 3:00pm

WG III/1F: Remote Sensing Data Processing and Understanding
Location: 713A

1:30pm - 1:45pm

From Image to Perception: Scene-Graph-Driven Modeling of Human-Scale Urban Experience with Street-view Images

Haipeng Yang, Xian Guo

Beijing University of Civil Engineering and Architecture, People's Republic of China

This study examines how street-view scenes relate to urban perception using a scene-graph-driven modeling method. Each image is parsed into subject–predicate–object triplets; entity appearance from a CNN backbone and relation semantics from a Transformer detector are fused at node level via a learnable gate. A relation-aware graph neural network performs message passing and attentive readout to predict six perception dimensions (beautiful, boring, depressing, lively, safe, wealthy). Taking the Place Pulse 2.0 dataset as the benchmark, we convert pairwise votes to binary labels per dimension with standard train/validation/test splits. Experiments compare the graph approach against CNN+SVM and Transformer+SVM baselines under identical protocols. Results show consistently higher accuracy across all six dimensions, with notable gains for beautiful and wealthy. Gradient and integrated-gradient analyses offer node- and edge-level attributions, highlighting elements such as trees, facades, and overhead wires. The method balances accuracy with interpretability, and the results point to practical cues that can support human-centered urban design.

1:45pm - 2:00pm

Real-Time Road Condition Detection and Mapping Using YOLOv11 and Built-In Car Dashcam

Harjot Josan¹, Frank Zhang², Baoxin Hu³

¹University of the Fraser Valley (UFV), Canada; ²University of the Fraser Valley (UFV), Canada; ³Dept. of Earth and Space Science and Engineering, York University, Toronto, Canada

Road surface conditions decline due to heavy traffic volumes, severe weather, and recurring utility works, yet many road agencies still rely on manual windshield surveys and semi-automated inspections. These methods are time-consuming, labour-intensive, and difficult to scale. Recent advances in deep learning and the widespread availability of built-in vehicle dashcams offer new opportunities for low-cost, automated pavement assessments. This contribution presents a mobile, dashcam-based framework for detecting road-surface defects using YOLOv11, combined with geolocation tagging for spatial visualization.

To test our YOLOv11 model, we collected an initial dataset at the University of the Fraser Valley campus and manually annotated it to identify crack fillings, crosswalk markings, speed bumps, lane markings, and other surface conditions. This is a prototype; the model will later be trained to detect further road conditions, such as gravel, potholes, and uneven roads. To address variations in lighting and motion, augmentation techniques were applied. YOLOv11 achieved a mean average precision above 90% across all tested categories.

This prototype demonstrates a practical, low-cost approach for real-time pavement monitoring. Future work includes expanding data collection, developing an operational dashboard for road authorities, pinning exact GPS coordinates on maps together with images of damaged roads, and evaluating model performance across different data sources, including models trained on imagery from Google Images. By producing actionable geospatial information, this system supports more efficient maintenance workflows and offers a scalable pathway for municipalities seeking to modernize road-condition assessment.

2:00pm - 2:15pm

Towards Global Interpretability: Evaluating XAI Metrics in Building Footprint Extraction

Elif Ozlem Yilmaz, Taskin Kavzoglu

Gebze Technical University, Turkiye

Global population is projected to increase by about 70% by 2050, with a growing proportion of people living in urban areas. This trend highlights the importance of accurately assessing urban expansion. Automatic building detection from remotely sensed imagery using deep learning (DL) has demonstrated considerable potential for applications, including sustainable urban planning and infrastructure monitoring. However, the inherent black-box nature of DL models limits their transparency and reduces trust in model-driven decisions. Although various Explainable Artificial Intelligence (XAI) approaches have been proposed to highlight image regions influencing model predictions, qualitative visual inspection alone is insufficient for reliably evaluating the credibility of these explanations. This study evaluates several XAI techniques for building footprint extraction using a U-Net model trained on a refined Massachusetts Buildings Dataset. The segmentation model achieved precision, recall, F1-score, IoU, and overall accuracy values of 89.68%, 85.69%, 87.53%, 79.03%, and 94.35%, respectively. To investigate the model’s decision-making process, three explanation methods, namely Saliency, GradientSHAP, and GuidedGradCAM, were applied. The quality of the generated explanations was then quantitatively assessed using 16 evaluation metrics. Beyond single-image analysis, a dataset-level evaluation was conducted using 547 image patches containing building coverage greater than 20%. The results indicate that GuidedGradCAM produces more consistent and reliable explanations. Furthermore, dataset-level analysis using dense-building samples provides a statistically more robust representation of overall model behaviour compared to evaluations based on individual images. These findings highlight the importance of quantitative assessment in validating the interpretability of DL models for building footprint extraction.

2:15pm - 2:30pm

MaskRoof: A deep Learning Framework and Benchmark Dataset for fine-grained urban Rooftop Utilization and potential Analysis

Jinfeng Xie¹, Haojie Yang², Lingshuang Dong³, Anthony Yeh¹, Yi Zhang²

¹The University of Hong Kong, Hong Kong S.A.R. (China); ²Institute of Future Human Habitats, Tsinghua Shenzhen International Graduate School; ³Huawei Technologies Co., Ltd., Dongguan, Guangdong Province, China

Urban rooftops represent a critical vertical resource for sustainable development, yet comprehensive assessment of their utilization patterns and available capacity remains constrained by inadequate datasets and limited algorithmic capabilities. This study introduces the Urban Rooftop Utilization Dataset (URUD), the first multi-city, pixel-level semantic segmentation dataset encompassing 1,560 high-resolution satellite images from four Chinese cities. URUD establishes eight semantic categories including a novel "available area" class to address ambiguous regions that existing classification schemes fail to capture. The study further proposes MaskRoof, a transformer-based deep learning framework specifically designed for fine-grained rooftop analysis. The model integrates two task-specific modules, Hierarchical Zoom-in Attention (HZA) and Prior-Guided Cross-Attention (PGCA), to address challenges of small-scale target detection and class imbalance. Experimental results demonstrate that MaskRoof achieves superior performance with 94.46% accuracy and 47.29% mIoU, outperforming existing segmentation architectures. Application to Shanghai's outer ring area reveals that 60.74% of rooftop space remains available for utilization, with significant spatial heterogeneity across building types. Industrial and warehouse structures retain substantially greater unutilized areas compared to office and residential buildings. These findings provide quantitative evidence for differentiated urban planning strategies and demonstrate the framework's capability for large-scale rooftop potential assessment in complex urban environments.

2:30pm - 2:45pm

A comparison of CNN, Transformer, and open-vocabulary architectures for real-time photovoltaic defect detection using UAV thermal imagery

Aissam Salah¹, Mouad Jabrane², Imane Sebari¹

¹Department of Photogrammetry and Cartography, School of Geomatics and Surveying Engineering, IAV Hassan II, Rabat, Morocco; ²Research Unit of Geospatial Technologies for a Smart Decision, IAV Hassan II, Rabat 10101, Morocco

Real-time defect detection in solar farms is critical for profitability and safety. This paper compares state-of-the-art (SOTA) object detection architectures for deployment on edge computing platforms for the purpose of thermal PV defect detection using UAV imagery. We systematically evaluated Closed-Set (YOLOv10, YOLOv12, RT-DETR, RF-DETR) and Open-Vocabulary (YOLO-World, OWL-ViT) models on a public thermal dataset. Our results highlight two leading candidates. The transformer-based RF-DETR sets a new accuracy record at 82.6% mAP@0.50, driven by its self-supervised backbone, but its inference speed is low (12.6 FPS). Conversely, the CNN-based YOLO-World integrates language semantics to reach a competitive 78.1% mAP@0.50 while operating at a real-time speed of 31.3 FPS. We conclude that both RF-DETR and YOLO-World are promising for embedded thermal fault detection. The final selection will depend on on-platform inference performance.

3:30pm - 5:15pm

ThS18: Advances in Reality Capture, AI, and Digital Twin Technologies for Construction Engineering
Location: 713A

3:30pm - 3:45pm

Image sequence based prediction of the temporal evolution of fresh concrete properties under realistic conditions

Max Meyer¹, Amadeus Langer¹, Max Mehltretter¹, Dries Beyer², Max Coenen³, Bastian Strybny³, Tobias Schack⁴, Michael Haist⁴, Christian Heipke¹

¹Institute of Photogrammetry and GeoInformation, Leibniz University Hannover, Germany; ²Feist Construct GmbH, Bad Pyrmont, Germany; ³Institute of Building Materials Science, Leibniz University Hannover, Germany; ⁴Institute of Construction Materials, University of Stuttgart, Germany

Advancing the level of digitalization and automation in concrete manufacturing can substantially contribute to lowering the CO2 emissions associated with concrete production. This work introduces a new methodology for predicting the time-dependent properties of fresh concrete during mixing. For the prediction, a deep learning network is created which uses stereoscopic image sequences of the flowing material together with tabular data as input. Besides mix design parameters and process state data, such as energy consumption, moisture and fresh concrete temperature, temporal information is included in the tabular data. The temporal information represents the time interval between image acquisition and the time for which the properties should be predicted. During training, this interval corresponds to the difference between the image acquisition and the time at which reference measurements are taken, allowing the network to implicitly learn the temporal evolution of the material properties, namely the slump flow diameter, yield stress, and plastic viscosity. Incorporating time-dependent prediction enables the forecasting of property changes throughout the mixing process, offering a valuable tool for real-time process control. This capability allows timely adjustments whenever deviations from the desired material behavior are detected. The experimental investigations presented in this paper demonstrate the feasibility of this method under realistic conditions.

3:45pm - 4:00pm

Single-image to model registration for semantic enrichment of indoor BIM

Dorota Włodarczyk, Małgorzata Jarząbek-Rychard

Institute of Geodesy and Geoinformatics, Wrocław University of Environmental and Life Sciences, Poland

Effective integration of geometric and semantic data within Building Information Models (BIM) is essential for the efficient life cycle management of modern facilities. However, maintaining accurate as-is BIM models for existing buildings remains a significant challenge, as manual updates are labour-intensive and full 3D reconstruction is often impractical for incremental changes. In such cases, image-based approaches offer a fast and flexible alternative, but require reliable alignment of 2D imagery with existing BIM geometry. To address this challenge, this study introduces a streamlined pipeline for semantic enrichment that uses a single-image visual localisation approach to directly align 2D imagery with existing BIM geometry. The proposed method integrates transformer-based panoptic segmentation (Mask2Former) with a closed-form Perspective-n-Line solver to estimate 6-degrees-of-freedom (6-DoF) camera poses. The novelty of the proposed approach lies in the explicit use of semantic information as a geometric constraint to guide the selection of 2D–3D correspondences for pose estimation. Semantic labels are utilised to filter line correspondences, ensuring that only stable architectural boundaries (e.g., walls, floors, and ceilings) are used in the registration process. Such semantic filtering stabilises correspondence selection, effectively mitigating pose ambiguity in repetitive indoor layouts or scenes where structural elements are partially obscured by furniture and clutter. Experimental results confirm high accuracy, achieving a median position error of 9.84 cm and an orientation error of 1.05° in complex indoor environments. This robust registration framework provides a reliable foundation for the downstream semantic enrichment and digital twin updates.

4:00pm - 4:15pm

LSTNet: Local Shape Transformer Network for Road Marking Extraction

Jiafeng Wu¹,²,³, Chaorui Liu¹,²,³, Jiajun Shi¹,²,³, Jonathan Li¹,²,³,⁴, Lingfei Ma¹,²,³,⁴

¹Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University, Shanghai 200241, China; ²Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, East China Normal University, Shanghai 200241, China; ³School of Geospatial Artificial Intelligence, East China Normal University, Shanghai 200241, China; ⁴Hinton STAI Institute, East China Normal University, Shanghai 200241, China

Road markings are vital for HD maps and autonomous driving, yet LiDAR-based extraction is difficult due to missing RGB information, severe class imbalance, and thin, elongated geometry under sparse and noisy returns (Ma et al., 2020). We propose LSTNet, which performs local-shape tokenization by grouping points on tangent planes and encoding tokens from relative coordinates, normals, curvature, and intensity contrast. A geometry-aware transformer aggregates these tokens across multiple scales with attention biased by relative position and normal similarity, capturing long and thin structures while preserving edges. Our contributions can be summarized as follows: (1) We present LSTNet, which directly segments road marking from 3D point clouds, avoiding image conversion and preserving geometric fidelity. (2) We introduce a dedicated point-cloud dataset for road marking extraction to enable training and fair evaluation. (3) We design a task-specific and boundary-aware training objective that improves thin road marking recall and robustness under class imbalance.

4:15pm - 4:30pm

Automatic 3D Building Model Generation for Energy Digital Twins

Oscar Roman¹,², Giorgio Agugiaro³, Ken Arroyo Ohori³, Maarten Bassier⁴, Elisa Mariarosaria Farella¹, Fabio Remondino¹

¹3D Optical Metrology, Bruno Kessler Foundation, via Sommarive 18, Trento, Italy; ²University of Trento, EICS and DII Department, Trento, Italy; ³3D Geoinformation group, Department of Urbanism, Faculty of Architecture and Built Environment, Delft University of Technology, Delft, The Netherlands; ⁴Department of Civil Engineering, TC Construction - Geomatics, KU Leuven - Faculty of Engineering Technology, Ghent, Belgium

The concept of Digital Twins (DTs) in the Architecture, Engineering and Construction (AEC) domain encompasses a wide range of applications and scales, from single buildings to entire cities, spanning monitoring, simulation, energy management and operational control. Regardless of the specific application, a valid Digital Twin (DT) is a dynamic, data-driven model that stays continuously synchronized with its physical counterpart in both time and state via sensors and the Internet of Things (IoT). It must receive real-world input and provide feedback for analysis or control, ultimately progressing toward a self-operational DT. In the energy domain, an Energy Digital Twin (EDT) must be designed to (i) include sufficient geometric information, (ii) support continuous monitoring, (iii) assist scenario-based simulation and (iv) enable operational maintenance and decision support. To achieve these objectives, the EDT’s geometry should be managed through two complementary representations: (i) a watertight solid volumetric model for physics-based simulation and (ii) a boundary representation (B-Rep) model for precise topology, semantics and data exchange. A mapping layer keeps the two representations consistent, preserving identity and topology across states and linking to the graph. Consequently, the EDT should adopt a multi-level architecture defining both geometric and data structures. This work introduces a robust Scan-to-Energy Digital Twins (Scan-to-EDTs) framework that generates multi-level building EDTs by integrating geometric, semantic and simulation layers to enable interoperable energy analyses.

4:30pm - 4:45pm

From propagation to prediction: point-level uncertainty evaluation of MLS point clouds under limited ground truth

Ziyang Xu¹, Olaf Wysocki³, Christoph Holst¹,²

¹Chair of Engineering Geodesy, TUM School of Engineering and Design, Technical University of Munich; ²TUM Leonhard Obermeyer Center, Technical University of Munich; ³CV4DT, University of Cambridge

Evaluating uncertainty is critical for reliable use of Mobile Laser Scanning (MLS) point clouds in many high-precision applications such as Scan-to-BIM, deformation analysis, and 3D modeling. However, obtaining the ground truth (GT) for evaluation is often costly and infeasible in many real-world applications. To reduce this long-standing reliance on GT in uncertainty evaluation research, this study presents a learning-based framework for MLS point clouds that integrates optimal neighborhood estimation with geometric feature extraction. Experiments on a real-world dataset show that the proposed framework is feasible and the XGBoost model delivers fully comparable accuracy to Random Forest while achieving substantially higher efficiency (about 3 times faster), providing initial evidence that geometric features can be used to predict point-level uncertainty quantified by the C2C distance. In summary, this study shows that MLS point clouds' uncertainty is learnable, offering a novel learning-based viewpoint towards uncertainty evaluation research.

4:45pm - 5:00pm

Automatic Scan-to-BIM: The Impact of Semantic Segmentation Accuracy on Opening Detection

Jidnyasa Patil, Arcot Sowmya, Mohsen Kalantari

University of New South Wales, Sydney, Australia

The automation of Scan-to-BIM remains a major challenge within the Architecture, Engineering, and Construction industry, particularly in the detection and geometric characterisation of architectural openings such as doors and windows. Although recent advances in 3D semantic segmentation have improved the classification of architectural elements, the effect of segmentation accuracy on downstream geometric detection and reconstruction is still under study. This work compares five state-of-the-art deep learning models, PointNeXt, PointMetaBase, Point Transformer V1, Point Transformer V3, and Swin3D, on opening detection in Scan-to-BIM. A unified evaluation framework integrating DBSCAN clustering with axis-aligned bounding box fitting is introduced to generate per-instance geometric representations. The models are assessed using semantic metrics and geometric reliability indicators, including centroid error, dimensional deviation and 3D IoU. Experiments on the S3DIS Area 5 dataset reveal notable performance differences across models. Swin3D achieved the highest door detection rate of 96.9%, followed by PointMetaBase at 92.9%, PointNeXt at 87.4%, PTV3 at 85.0%, and PTV1 at 81.9%. Window detection proved more challenging, with Swin3D and PTV3 both achieving 75.0%, PTV1 at 71.2%, and PointNeXt and PointMetaBase at 67.3%. Notably, PointMetaBase produced strong geometric accuracy for doors despite lower semantic scores. These results suggest that high segmentation accuracy does not always lead to precise geometric reconstruction. To assess generalisation, the trained models were applied to 11 Matterport3D rooms, confirming that the observed patterns extend across different scanning environments. This study concludes that in Scan-to-BIM workflows, greater emphasis should be placed on geometric reconstruction algorithms than on segmentation performance alone.
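
Two of the geometric steps named in the abstract above, axis-aligned bounding box fitting and 3D IoU, can be sketched briefly. This is a minimal illustration rather than the authors' evaluation code: the DBSCAN clustering step is omitted, each box is assumed to be a `(min_corner, max_corner)` pair, and the function names are hypothetical.

```python
import numpy as np

def fit_aabb(points: np.ndarray):
    """Axis-aligned bounding box of an (N, 3) point cluster as (min_corner, max_corner)."""
    return points.min(axis=0), points.max(axis=0)

def aabb_iou_3d(box_a, box_b) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    (amin, amax), (bmin, bmax) = box_a, box_b
    # Overlap length per axis, clipped at zero when the boxes do not overlap.
    overlap = np.clip(np.minimum(amax, bmax) - np.maximum(amin, bmin), 0.0, None)
    inter = overlap.prod()
    union = (amax - amin).prod() + (bmax - bmin).prod() - inter
    return float(inter / union) if union > 0 else 0.0
```

In a pipeline like the one described, `fit_aabb` would be applied to each cluster of points labelled as a door or window, and `aabb_iou_3d` would compare the result with a reference box.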

5:00pm - 5:15pm

Fast and accurate point surveying using the PIX4Dcatch mobile app

Giulia Rovelli¹, Marta Coelho Lopes², Gaia Amaranta Taberna¹, Adrian Fernandez¹, Paloma Pomares¹, Jean-Baptiste Magnin¹, Andrei Mitache¹, Davide Antonio Cucci¹, Christoph Strecha¹, Pierangelo Rothenbühler¹

¹PIX4D SA, Switzerland; ²École Polytechnique Fédérale de Lausanne (EPFL), Switzerland

The digitalization of the architecture, construction and subsurface utility engineering sectors demands efficient, accurate and flexible 3D point surveying methods. Established ones based on Global Navigation Satellite System (GNSS) rovers or total stations suffer from significant limitations, such as requiring open-sky visibility, high costs and complex setups. This paper introduces a novel method for georeferencing 3D points using the PIX4Dcatch mobile application coupled with an external Real-Time Kinematic (RTK) GNSS receiver. The method allows a point of interest to be surveyed by simply aiming the smartphone and tapping on the screen during a capture. A lightweight, modified Bundle Adjustment algorithm runs on the device, delivering accurate 3D coordinates in seconds without any post-processing. We evaluated the method by surveying several known cadaster points hundreds of times across diverse field conditions, achieving a mean planimetric error norm of approximately 3 cm, with 97% of errors below 10 cm. Similar statistics are achieved with single-point measurements using an RTK rover. Although not intended to replace millimeter-precision instruments, the accuracy profile of our method is well suited for many applications, such as subsurface utility mapping, which often have decimeter-level regulatory requirements. Given its high efficiency, low cost and ease of use, we believe that our method has the potential to transform as-built documentation workflows in diverse engineering sectors.
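
The two statistics quoted in the abstract above, the mean planimetric error norm and the share of errors below a threshold, can be computed as in this sketch. The function name, the array layout (one row per measurement, columns east and north in metres) and the default threshold of 0.10 m are assumptions for illustration only.

```python
import numpy as np

def planimetric_error_stats(measured_xy, reference_xy, threshold_m=0.10):
    """Mean horizontal error norm and fraction of errors below threshold_m."""
    # Horizontal (2D) distance between each measurement and its reference point.
    errors = np.linalg.norm(measured_xy - reference_xy, axis=1)
    return float(errors.mean()), float((errors < threshold_m).mean())
```

Repeated measurements of the same cadaster point would simply repeat that point's reference coordinates in `reference_xy`.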