Conference Agenda

Session

P2: Poster Session 2

Time: Tuesday, 07-July-2026, 3:30pm - 5:30pm

Location: Exhibition Hall "E"

Presentations

Refractive Effects of Planar Protective Layers in Stereo Photogrammetry and Their Correction

Zhaoquan Liu¹, Binbin Xu², Wenxing Xu³, Shigang Liu⁴, Yongfeng Ma⁵, Guanqing Li⁶

¹CCCC First Harbor Engineering Company Ltd., 300461 Tianjin, China – liuzhaoquan@ccccltd.cn; ²No.3 Engineering Company Ltd. of CCCC First Harbor Engineering Company, 116011 Dalian, China; CCCC First Harbor Engineering Company Ltd., 300461 Tianjin, China; Key Laboratory of Geotechnical Engineering, CCCC, 300461 Tianjin, China; Key Laboratory of Port Geotechnical Engineering, Ministry of Transport, PRC, 300461 Tianjin, China; Key Laboratory of Port Geotechnical Engineering of Tianjin, Tianjin 300461, China – 2016046927@ccccltd.cn; ³No.3 Engineering Company Ltd. of CCCC First Harbor Engineering Company, 116011 Dalian, China – xuwenxing1@ccccltd.cn; ⁴No.3 Engineering Company Ltd. of CCCC First Harbor Engineering Company, 116011 Dalian, China – liushigang1@ccccltd.cn; ⁵No.3 Engineering Company Ltd. of CCCC First Harbor Engineering Company, 116011 Dalian, China – mayongfeng1@ccccltd.cn; ⁶School of Environment and Spatial Informatics, China University of Mining and Technology, 221116 Xuzhou, China – guanqing.li@cumt.edu.cn

This study addresses the impact of planar protective layers on stereo photogrammetry and introduces a rigorous refractive correction model based on multi-interface ray tracing. Conventional stereo reconstruction assumes a single viewpoint, but planar layers introduce refraction at two interfaces, causing systematic depth-dominated errors. Through simulations and field experiments using an Intel RealSense D455, the study evaluates the influence of target distance, layer thickness, orientation, and layer-to-camera spacing. Simulations with multiple target planes show that conventional stereo produces significant errors—up to several millimeters in depth—even for thin layers, while the refractive model consistently reconstructs points with sub-millimeter accuracy. Layer distance from the camera has negligible effect on the error magnitude, whereas tilts and thicknesses of the layer strongly influence the bias. Field experiments with a 10-mm acrylic plate confirm these findings: conventional reconstruction exhibits systematic lateral and depth errors, whereas the refractive model eliminates bias, achieving near-zero mean errors. The results highlight that even minimal protective layers induce measurable errors if refraction is ignored, emphasizing the necessity of refractive correction in high-precision applications. The study demonstrates that explicitly modeling refraction in stereo photogrammetry significantly improves reconstruction accuracy and robustness. Overall, this work provides a practical framework for accurate 3D measurement in hazardous environments where imaging through protective layers is unavoidable.
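As a rough illustration of why even a thin plate matters, the following sketch traces a single ray through a flat layer using Snell's law. It is not the authors' multi-interface stereo model: the function name `plate_shifts` is hypothetical, the plate is assumed to sit in air, and the refractive index of 1.49 is a typical textbook value for acrylic rather than a figure from the abstract. At normal incidence the apparent shift along the plate normal reduces to t(1 - 1/n), which is about 3.3 mm for a 10 mm plate.

```python
import math


def plate_shifts(thickness_mm, n_layer, incidence_deg, n_air=1.0):
    """Return (lateral_offset_mm, axial_shift_mm) for a ray crossing a flat plate in air."""
    theta_i = math.radians(incidence_deg)
    # Snell's law at the first interface gives the angle inside the plate.
    theta_r = math.asin(n_air * math.sin(theta_i) / n_layer)
    # The ray leaves parallel to its incoming direction, displaced sideways.
    lateral = thickness_mm * math.sin(theta_i - theta_r) / math.cos(theta_r)
    # Apparent displacement of the object point along the plate normal.
    if theta_i == 0.0:
        axial = thickness_mm * (1.0 - n_air / n_layer)  # paraxial limit
    else:
        axial = thickness_mm * (1.0 - math.tan(theta_r) / math.tan(theta_i))
    return lateral, axial


# Assumed example: 10 mm plate, n = 1.49 (typical acrylic), three incidence angles.
for angle in (0, 15, 30):
    lateral, axial = plate_shifts(10.0, 1.49, angle)
    print(f"{angle:2d} deg: lateral {lateral:.2f} mm, axial {axial:.2f} mm")
```

Both shifts grow with the incidence angle, which is one reason a single-viewpoint camera model cannot absorb the effect with a constant offset.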

Augmenting City Models with Handheld LiDAR and 3D Gaussian Splatting for Inclusive Pedestrian Infrastructure Assessment

Deni Suwardhi¹,², Wahyunan Andika², Ratri Widyastuti¹, Widiatmoko Azis Fadilah³, Arnadi Murtiyoso³, Pierre Grusennmeyer³, Fabio Remondino⁴, Farhan Helmy⁵

¹Spatial System and Cadastral Research Group, Institut Teknologi Bandung (ITB), Indonesia; ²PT Inovasi Mandiri Pratama, Spatial Information Company, Indonesia; ³Université de Strasbourg, CNRS, INSA Strasbourg, ICube Laboratory UMR 7357, Photogrammetry and Geomatics Group, 67000, Strasbourg, France; ⁴3D Optical Metrology (3DOM) Unit, Bruno Kessler Foundation (FBK), Trento, Italy; ⁵Advanced System Computing, Design and Innovation (ASCODI) Laboratory, Indonesia

Urban digital twins increasingly require pedestrian-scale three-dimensional (3D) representations to support accessibility and inclusiveness assessment. However, existing approaches typically emphasize either geometric accuracy or visual realism, while lacking an integrated framework for analysing pedestrian-level conditions. This study proposes a hybrid workflow integrating handheld LiDAR and 3D Gaussian Splatting (3DGS) within a CityGML-based semantic framework for accessibility assessment. Handheld LiDAR provides centimetre-level geometric measurements, enabling the extraction of key indicators such as slope, surface roughness, and obstacle presence. In parallel, 3DGS reconstruction from 360° video imagery enhances visual realism and perceptual understanding. Both datasets are co-registered and structured within the CityGML 3.0 Transportation model to represent pedestrian environments in a unified spatial and semantic framework. Accessibility assessment was conducted using three approaches: LiDAR-based analysis, field survey observations, and immersive evaluation in a Virtual Reality (VR) environment. The LiDAR-based results were used as a reference. Comparative analysis shows that the field survey assessment achieves an agreement of approximately 85.7%, while VR-based assessment reaches approximately 75.4%. The results indicate that while VR does not replace metric-based analysis, it enables perception-driven and participatory evaluation. In particular, VR-based assessment shows potential to involve users, including people with disabilities, in accessibility evaluation through immersive and remote interaction. The proposed approach contributes to the development of human-scale urban digital twins by integrating metric accuracy, semantic structure, and participatory evaluation for more inclusive accessibility analysis.
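One plausible reading of the agreement figures is the share of assessed items whose rating matches the LiDAR-based reference. The sketch below shows that calculation only; the function name `agreement_rate`, the labels, and the five-item sample are hypothetical and do not reflect the study's data or its actual scoring scheme.

```python
def agreement_rate(reference, assessed):
    """Fraction of items whose assessed label equals the reference label."""
    if len(reference) != len(assessed):
        raise ValueError("both label lists must cover the same items")
    matches = sum(r == a for r, a in zip(reference, assessed))
    return matches / len(reference)


# Hypothetical ratings for five sidewalk segments (not data from the study).
lidar_reference = ["accessible", "barrier", "accessible", "barrier", "accessible"]
field_survey = ["accessible", "barrier", "accessible", "accessible", "accessible"]
print(f"agreement: {agreement_rate(lidar_reference, field_survey):.1%}")
```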

AI-driven extraction of road geometry and asset inventory from mobile LiDAR point clouds

Divya Priya Balasubramani, Zaffar Sadiq Mohamed-Ghouse, Sanjay Khanna D, Ravichandran N, Muthu Kumara Samy S

Institute of Remote Sensing, Department of Civil Engineering, College of Engineering Guindy, Anna University Chennai, India

Rapid urbanization and rising traffic demand are placing significant pressure on transportation infrastructure, necessitating more efficient and accurate approaches to road design auditing and asset management. Traditional survey methods are labor-intensive, time-consuming, and lack comprehensive three-dimensional context. This study presents an end-to-end framework integrating Mobile Light Detection and Ranging (LiDAR) with Artificial Intelligence (AI) for automated extraction of road geometric parameters and asset inventory. Mobile LiDAR data were collected along an urban corridor in Bengaluru, India, and preprocessed using Trimble Business Center. Preprocessing involved statistical outlier removal and progressive morphological ground segmentation. A deep learning model based on the PointNet++ architecture with hierarchical set abstraction layers was developed to classify point cloud data into five categories: road, pole, vehicle, tree, and building. The dataset comprised approximately 45 million points, with 10% manually annotated for training. The trained model enabled large-scale semantic segmentation, achieving a mean Intersection-over-Union (mIoU) of 0.86 and an overall accuracy of 92.4%. Using the classified outputs, key road design parameters—including lane width (8.099 m), road segment length (44.383 m), zebra crossing width (7.336 m), and pole height (7.890 m)—were accurately derived. The proposed workflow reduced manual processing time by approximately 85% (from 40 hours to 6 hours per km) while enhancing measurement consistency and scalability. The results highlight the effectiveness of integrating mobile LiDAR and AI for high-accuracy, data-driven infrastructure assessment, offering a scalable solution for improved planning and management of urban transportation systems.
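For readers unfamiliar with the two reported segmentation metrics, the sketch below computes overall accuracy and mean Intersection-over-Union from a confusion matrix. The function name `segmentation_scores` and the two-class matrix are illustrative assumptions; the study uses five classes and its confusion matrix is not given in the abstract.

```python
def segmentation_scores(confusion):
    """Return (overall_accuracy, mean_iou) from a square confusion matrix.

    confusion[i][j] counts points of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / union)
    return correct / total, sum(ious) / len(ious)


# Toy two-class matrix (hypothetical counts).
accuracy, miou = segmentation_scores([[8, 2], [1, 9]])
print(f"overall accuracy {accuracy:.3f}, mIoU {miou:.3f}")
```

Because IoU penalises both false positives and false negatives per class, mIoU is usually lower than overall accuracy, as in the reported 0.86 versus 92.4%.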

Rigorous Projection for Image Stitching: a 3D-Informed Approach for Accurate Panoramic Photogrammetry

Riccardo Roncella¹, Luca Perfetti²

¹University of Parma, Department of Engineering and Architecture, 43124, Parma, Italy; ²University of Brescia, Department of Civil Engineering, Architecture, Territory, Environment and Mathematics, 25123, Brescia, Italy

Panoramic image stitching traditionally relies on the assumption that all input images share a single projection centre, a condition rarely satisfied by modern multi-camera rigs composed of multiple fisheye sensors mounted with non-negligible baselines. In confined or close-range environments, these geometric discrepancies introduce significant parallax, limiting the reliability of both classical and “parallax-tolerant” stitching techniques based on local warping. Although such methods are simple and efficient, they cannot account for the true camera geometry and therefore degrade the metric quality of the final panorama. At the same time, recent photogrammetric software has begun to accept panoramic imagery directly, yet the literature demonstrates that optimal accuracy is still obtained when processing the raw multi-camera images.

This work presents a new 3D-informed approach for generating panoramic images that fully respects the underlying geometry of the acquisition system. Assuming the availability of a 3D model, derived either from photogrammetric reconstruction or from an external sensor such as LiDAR, the method reprojects each pixel of the desired panorama onto the original multi-camera frames using collinearity equations, mirroring the workflow of precision orthophoto generation. This allows the production of parallax-free panoramas with consistent geometric fidelity even in challenging scenarios.
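The reprojection idea can be sketched for a single panorama pixel as follows. This is a simplified illustration, not the authors' implementation: it assumes an equirectangular panorama, a depth value already sampled from the 3D model, and an ideal pinhole source camera instead of a calibrated fisheye model. All function and parameter names are hypothetical.

```python
import math


def pano_pixel_to_ray(u, v, width, height):
    """Unit viewing direction of an equirectangular pixel (x right, y down, z forward)."""
    lon = (u / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - v / height) * math.pi
    return (math.cos(lat) * math.sin(lon), -math.sin(lat), math.cos(lat) * math.cos(lon))


def project_to_camera(point, cam_center, rotation, focal_px, cx, cy):
    """Collinearity (pinhole) projection of a 3D point; None if behind the camera."""
    d = [point[i] - cam_center[i] for i in range(3)]
    x, y, z = (sum(rotation[r][i] * d[i] for i in range(3)) for r in range(3))
    if z <= 0.0:
        return None
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def source_pixel(u, v, width, height, depth, pano_center, camera):
    """Find where the source camera sees the 3D point behind panorama pixel (u, v)."""
    ray = pano_pixel_to_ray(u, v, width, height)
    point = [pano_center[i] + depth * ray[i] for i in range(3)]
    return project_to_camera(point, *camera)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Assumed camera: 0.1 m to the right of the panorama centre, f = 1000 px.
camera = ((0.1, 0.0, 0.0), identity, 1000.0, 960.0, 540.0)
for depth in (5.0, 50.0):
    print(depth, source_pixel(2000, 1000, 4000, 2000, depth, (0.0, 0.0, 0.0), camera))
```

The same panorama pixel maps to different source pixels at different depths, which is the parallax that depth-agnostic stitching cannot resolve.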

The method is evaluated on several case studies using both compact panoramic cameras and multi-camera systems with larger baselines. Results demonstrate improvements in stitching accuracy, SfM orientation quality, and final 3D reconstruction, including robustness to varying scene complexity and supporting 3D-model resolution.

Extrusion Segmentation Strategy to improve CAD Reconstruction from Point Cloud

Said Harb, Mehdi Maboudi, Markus Gerke

Technische Universität Braunschweig; Institute of Geodesy and Photogrammetry, Germany

Recovering editable CAD models from point cloud scans is a key challenge in reverse engineering and quality control, where the ability to reconstruct the original modeling history of a physical object enables precise deviation analysis and systematic process optimization. While deep learning has driven significant progress in this area, existing models struggle to generalize to complex CAD models, which feature multiple extrusions and intricate geometric structures.

This paper presents an end-to-end deep learning pipeline that reconstructs CAD models from point clouds as structured CAD sequences, which are series of sketch-and-extrude operations that encode the full modeling history. The model demonstrates high-fidelity reconstruction for non-complex objects, including primitive shapes such as cubes and cylinders, as well as their assemblies.

To address the performance gap on complex shapes, we introduce an extrusion-based segmentation strategy that decomposes CAD models into their constituent extrusions. These partial shapes are incorporated into the training set, increasing data diversity without requiring new data collection. The resulting primitive models feature partially occluded point clouds, because surfaces hidden in the original assembly are absent; this forces the model to infer missing regions and learn richer point cloud representations. This increases the complexity of the reconstruction problem and thereby improves generalization.

The strategy is model-agnostic and can be applied to any deep learning approach that reconstructs CAD sequences, making it a broadly applicable tool for the community.

Controlled Multi-source Mapping of Lunar South Polar Regions via Combined Bundle Adjustment

Qionghua You¹, Zhen Ye¹,², Yusheng Xu¹,², Rong Huang¹,², Huan Xie¹,², Xiaohua Tong¹,²

¹College of Surveying and Geoinformatics, Tongji University, Shanghai, China; ²The Shanghai Key Laboratory of Space Mapping and Remote Sensing for Planetary Exploration, Shanghai, China

Integration of LROC NAC and ShadowCam imagery is essential for meter-scale controlled mapping of the entire lunar south pole including Permanently Shadowed Regions (PSRs), but remains challenging due to extreme radiometric differences, sparse overlap across illumination boundaries, and ill-conditioned bundle adjustment networks. This paper proposes a LOLA DEM-mediated multi-source bundle adjustment framework for controlled lunar polar mapping. A hierarchical cross-modality matching strategy is developed using first- and second-order Gaussian steerable gradient features with multi-scale fusion and phase-correlation-based subpixel refinement. Sensor-specific geometric models are established using second-order polynomial transformations for NAC orthoimages and rational polynomial models for ShadowCam map-projected images. Five types of geometric constraints are formulated to integrate intra-sensor, limited cross-sensor, and image-to-DEM observations, with the LOLA DEM acting as a common geometric mediator. To stabilize the heterogeneous network, a hybrid L1-L2 regularization model with adaptive two-stage weighting is optimized using the ADMM algorithm. Experiments in the lunar south polar region demonstrate substantial improvements in intra-sensor, cross-sensor, and image-to-reference positioning accuracy. The final seamless 1 m/pixel orthorectified mosaics achieve approximately 5 m absolute accuracy, validating the proposed framework for geometrically unifying illuminated and permanently shadowed terrain in lunar polar controlled mapping.
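The abstract does not spell out its hybrid L1-L2 model, so the following is only a toy illustration of how ADMM splits a quadratic (L2) data term from an L1 penalty. It solves min 0.5·||x - b||² + λ·||z||₁ subject to x = z in scaled form; the function names, the elementwise problem, and the parameter values are assumptions and bear no relation to the authors' bundle adjustment network.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm for one scalar."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def admm_l1_l2(b, lam, rho=1.0, iterations=200):
    """Scaled-form ADMM for min 0.5*||x - b||^2 + lam*||z||_1 subject to x = z."""
    n = len(b)
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(iterations):
        # x-update: closed-form minimiser of the quadratic (L2) part.
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: soft-thresholding handles the L1 part.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # Dual update accumulates the remaining constraint violation x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


print(admm_l1_l2([3.0, 0.5, -3.0], lam=1.0))
```

For this toy problem the exact answer is the soft-thresholded input, so small entries are driven to zero while large ones are shrunk by λ. That sparsity-promoting behaviour is the usual motivation for adding an L1 term to a least-squares adjustment.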

Automated and Comprehensive Quality Assessment of Nationwide Aerial LiDAR Data: Insights from the LiDAR-ITA Project

Vittorio Casella, Marica Franzini, Davide Lodigiani

University of Pavia, Italy

National LiDAR programs are increasingly adopted worldwide to support land management, infrastructure planning, and environmental monitoring. Following the examples of large-scale initiatives in the United States and Europe, Italy launched its first nationwide LiDAR survey in July 2025 within the Integrated Monitoring System (SIM) project funded by the National Recovery and Resilience Plan (PNRR). This effort represents the most extensive airborne LiDAR campaign ever conducted in the country, covering over 302,000 km², including coastal zones and major islands. The acquisition plan is designed to ensure a minimum point density of 10 points/m² and produce high-resolution DTMs and DSMs at a 0.25 m grid spacing.

Given the unprecedented spatial extent and data volume, a robust, standardised, and fully automated quality assurance framework is essential. This paper presents the methodology used to evaluate geometric consistency and spatial accuracy across the national dataset. Congruence between overlapping flight strips is assessed by automatically extracting 100 × 100 m patches at regular intervals and computing point-to-point distances and cross-section profiles to detect horizontal and vertical discrepancies. Plano-altimetric accuracy is further evaluated through comparisons with terrestrial laser scanning (TLS) data collected in dedicated control areas, where robust plane fitting enables rigorous three-dimensional error estimation.
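A minimal sketch of robust plane fitting is given below, assuming a near-horizontal surface modelled as z = a·x + b·y + c and a single outlier-rejection pass based on the median absolute deviation. The abstract does not name the estimator used in the project, so the function names, the rejection factor `k`, and the 1 mm `floor` are hypothetical choices. Only vertical residuals are handled here, which is narrower than the three-dimensional error estimation the abstract describes.

```python
import statistics


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c via the normal equations (Cramer's rule)."""
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    rhs = [sxz, syz, sz]
    det = det3(normal)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / det)
    return tuple(coeffs)


def robust_fit_plane(points, k=3.0, floor=1e-3):
    """Fit, reject points beyond k scaled MADs from the median residual, then refit."""
    a, b, c = fit_plane(points)
    res = [z - (a * x + b * y + c) for x, y, z in points]
    med = statistics.median(res)
    mad = statistics.median(abs(r - med) for r in res)
    limit = max(k * 1.4826 * mad, floor)  # floor guards against a zero MAD
    kept = [p for p, r in zip(points, res) if abs(r - med) <= limit]
    return fit_plane(kept), len(points) - len(kept)


# Synthetic patch on a known plane plus one gross outlier (hypothetical data).
patch = [(float(x), float(y), 0.1 * x - 0.2 * y + 3.0) for x in range(5) for y in range(5)]
patch.append((2.0, 2.0, 8.0))
plane, rejected = robust_fit_plane(patch)
print(plane, rejected)
```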

Results from two control areas acquired with different sensors demonstrate the effectiveness, scalability, and reproducibility of the proposed automated workflows. The presented approach provides a reliable foundation for delivering high-precision national LiDAR products and offers a framework applicable to future large-scale geospatial acquisition programs.

Synergy of photogrammetric and ULS data for forestry application through the fusion of bundle adjustment and ICP algorithms

Łukasz Wilk¹, Magdalena Pilarska-Mazurek¹, Wojciech Ostrowski¹,²

¹Warsaw University of Technology, Faculty of Geodesy and Cartography, Department of Photogrammetry, Remote Sensing and Spatial Information Systems, Warsaw, Poland; ²Jagiellonian University, Institute of Archaeology, Krakow, Poland

The study explores a workflow for integrating photogrammetric image blocks with LiDAR point clouds acquired via Unmanned Laser Scanning (ULS) in forestry applications. Hybrid datasets combining UAV imagery and LiDAR data are increasingly used for 3D mapping, yet discrepancies often arise due to independent orientation processes and systematic errors. Traditional solutions rely on numerous ground control points (GCPs), which can be impractical in dense forest environments. To address this, the proposed method fuses Bundle Adjustment and Iterative Closest Point (ICP) algorithms in a joint optimization process, aligning multispectral images with ULS point clouds without additional observations or GCPs. The workflow includes a GPU-accelerated filtering step to extract representative canopy points, reducing computational load and improving correspondence selection. Implemented using Python and C++ extensions, the system leverages the Ceres Solver for non-linear optimization, minimizing reprojection, GNSS, IMU, and point-to-cloud errors iteratively. Tests conducted in Żednia Forest District, Poland, during leaf-on and leaf-off seasons demonstrated significant improvements in alignment accuracy: average horizontal errors decreased by over 50%, and maximum offsets were reduced by more than 1 meter. These results confirm that the proposed hybrid adjustment substantially enhances geometric consistency between photogrammetric and LiDAR datasets, offering a cost-effective solution for forestry mapping and monitoring.

Integrating High‑Fidelity 3D Documentation into Immersive Learning: A VR Serious Game for the Holy Aedicule

Margarita Skamantzari¹, Ioannis Georgoulas¹, Ioannis Rallis¹, Antonia Moropoulou², Anastasios Doulamis¹, Andreas Georgopoulos¹

¹Lab of Photogrammetry, School of Rural, Surveying & Geoinformatics Engineering, National Technical University of Athens, Athens, Greece; ²School of Chemical Engineering, National Technical University of Athens, Athens, Greece

This paper introduces an innovative Virtual Reality (VR) serious game designed to enhance immersive learning in cultural heritage education. The game offers an interactive exploration of the Holy Aedicule in Jerusalem, one of the most sacred monuments of Christianity, based on high-resolution 3D documentation captured before, during, and after its rehabilitation.

By integrating photogrammetric data, textured 3D models, and historical research, the application allows users to navigate the monument virtually, engage with embedded educational content, and participate in interactive learning scenarios. Structured as a multi-phase experience, including virtual tours, a digital classroom, and a quiz mode, the serious game aims to promote transdisciplinary knowledge transfer in a user-friendly, entertaining format.

This contribution outlines the game’s methodological framework, educational objectives, development pipeline, and user evaluation results, highlighting its role in redefining how cultural heritage can be communicated through immersive digital tools. Additionally, it addresses the broader challenge of translating complex heritage documentation into accessible and meaningful experiences for learners, researchers, and the wider audience.

GNSS–Camera Systems for Heritage Documentation. Accuracy assessment of measurements of inaccessible points and preliminary tests in photogrammetric applications.

Lorenzo Teppati Losè, Filiberto Chiabrando, Fabio Giulio Tonolo

LabG4CH, Department of Architecture and Design (DAD) - Politecnico di Torino, Viale Mattioli 39, 10125 Torino (Italy)

The contribution investigates the possibility of using a GNSS receiver equipped with a camera for documenting built heritage. In particular, the possibility of measuring GCPs on vertical surfaces thanks to the combination of satellite observations and digital photogrammetric algorithms will be analysed and metrically validated. Moreover, the use of the acquired images in SfM approaches will be tested and discussed.

Generating Synthetic Image Data with Blender to Address Data Scarcity in Military Applications: Leveraging the RF-DETR Model

Julian Cornel Berndt, Tobias Frisenborg Christensen, Lars Würtz Jochumsen

Systematic A/S, Denmark

Military vehicle recognition faces critical data scarcity due to operational security constraints and prohibitive collection costs. Classification of vehicles demands extensive training data rarely available in defence contexts. We propose a hybrid approach combining limited real-world data with scalable synthetic generation. Our methodology comprises: (1) a Blender-based pipeline generating high-resolution synthetic images with domain randomization across 3D models, lighting, and camera angles; (2) training transformer-based RF-DETR detectors on real-world and synthetic data, respectively; (3) an in-depth evaluation of the trained networks to determine the effect of synthetic data. Our approach uses a baseline RF-DETR detector trained on real-world imagery as the point of comparison. We then use the custom-made synthetic data generation pipeline to create an equally large synthetic dataset. This generated data is added to real data subsets, creating mixed datasets containing varying percentages of real data. We created five datasets containing 5%, 10%, 25%, 50%, and 100% of the real data, respectively. With these new mixed datasets we train another set of RF-DETR detectors. Afterwards we evaluate the influence of the synthetic data by comparing the detectors across computer vision metrics.
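One way to assemble such mixed datasets is sketched below, under the assumption that each one combines a random subset of the real images with the full synthetic set. The abstract does not state the exact mixing rule, so this interpretation, the function name `build_mixed_datasets`, the file names, and the fixed seed are all hypothetical.

```python
import random


def build_mixed_datasets(real_items, synthetic_items,
                         real_fractions=(0.05, 0.10, 0.25, 0.50, 1.00), seed=0):
    """Return {fraction: real subset + all synthetic items} for each real-data fraction."""
    rng = random.Random(seed)  # fixed seed keeps the subsets reproducible
    mixed = {}
    for fraction in real_fractions:
        n_real = round(len(real_items) * fraction)
        subset = rng.sample(real_items, n_real)
        mixed[fraction] = subset + list(synthetic_items)
    return mixed


# Hypothetical file lists of equal size, mirroring the "equally large" synthetic set.
real = [f"real_{i:04d}.png" for i in range(200)]
synthetic = [f"synth_{i:04d}.png" for i in range(200)]
for fraction, items in build_mixed_datasets(real, synthetic).items():
    print(f"{fraction:.0%} real: {len(items)} images")
```

Training one detector per entry then lets the detectors be compared as a function of how much real data was available.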

GDC: Geometric diffusion consistency for weather-robust 3D point cloud segmentation

Jing Du¹, John Zelek¹, Michael A. Chapman², Jonathan Li³

¹Department of Systems Design Engineering, University of Waterloo; ²Department of Civil Engineering, Toronto Metropolitan University; ³Department of Geography and Environmental Management, University of Waterloo

Semantic segmentation of outdoor 3D point clouds degrades significantly under adverse weather, as rain, fog, and snow corrupt the geometric structure of LiDAR returns through backscatter insertion, range-dependent attenuation, and volumetric scattering. Existing domain generalization methods constrain feature values directly, which becomes less effective when weather-induced perturbations alter the local neighborhood topology that underlies feature aggregation. This work proposes Geometric Diffusion Consistency (GDC), a training-time regularizer that enforces consistent feature propagation behavior across geometrically divergent views of the same point cloud. A dual-view augmentation pipeline generates training pairs through weak and strong perturbations, where the strong branch incorporates dual-mode atmospheric extinction modeling, semantic-aware geometric corruption, and weather-coordinated structural perturbation. A lightweight learnable diffusion operator, implemented via sparse convolutions with a gated residual connection, propagates encoder bottleneck features through local voxel neighborhoods. The consistency loss aligns diffused representations at corresponding points across views, preserving topological relationships essential for dense prediction while allowing feature values to adapt to altered geometry. On the SemanticKITTI to SemanticSTF domain generalization benchmark, GDC achieves 38.6% mIoU, exceeding the previous best method by 3.8%, with consistent improvements across dense fog, light fog, rain, and snow conditions.

Integrated workflow for 3D documentation and spatial analysis of Jewish sepulchral heritage – Project "Stone Witnesses Digital: Space, Form, Inscription".

Lea Puglisi, Michael Groh, Patrizia Hanika, Mona Hess

Digital Technologies in Heritage Conservation, Institute of Archaeology, Heritage Conservation Studies and Art History/ Centre for Heritage Conservation Studies and Technologies (KDWT), University of Bamberg

The project 'Stone Witnesses Digital' ensures the exemplary documentation of a selected number of German Jewish graveyards. This paper presents the results from the first years of the project’s geomatics work, including the development of an integrated multi-sensor workflow for 3D imaging, ranging from geographic-scale documentation of entire graveyards (1:200 scale) to detailed feature imaging of individual gravestones (1:20 scale). The workflow supports the long-term research project on Jewish sepulchral culture "Stone Witnesses Digital". The project brings together expertise from Jewish Studies, Digital Technologies in Heritage Conservation, and Historic Building Research.

The overarching scope is to document the location and context of gravestones, their materiality, decorative elements, inscriptions, and the meanings embedded within them—summarized under the guiding concept 'Space, Form, Inscription.' The aim of the project is to create a comprehensive digital dataset that documents inscriptions as well as the spatial and structural characteristics of gravestones, thereby ensuring their long-term preservation and making them accessible for further academic research.

To achieve this, the workflow must integrate various sensing and 3D imaging techniques, ensure reliable and sustainable data storage, and support reproducible dataset creation for spatio-temporal analyses and long-term monitoring of graveyards throughout the 24-year project period. It also enables the combination of advanced sensing technologies with semantic web standards and facilitates the creation of informative Open Access outputs compliant with FAIR data principles.

3D Reconstruction of reindeer antlers using a low-cost optical camera system and Gaussian splatting

Julian Robert Stevenson Cramb¹, Derek Lichti¹, John Matyas¹, Shabnam Jabari²

¹University of Calgary, Canada; ²University of New Brunswick, Canada

This research presents a novel, low-cost pipeline for the semi-automated 3D reconstruction of reindeer antlers using an optical camera array and Gaussian Splatting (GS). Traditional antler measurement methods are manual, invasive and prone to errors, while existing 3D scanning techniques struggle with subject motion. Point clouds derived from photogrammetric bundle adjustment require well-defined points, which are generally lacking on antlers. To overcome this, a system of 16 synchronized Raspberry Pi cameras was used to capture instantaneous imagery within an animal enclosure. A sparse point cloud, along with the oriented network of imagery from a bundle adjustment, is fed into a GS algorithm, producing an optimized reconstruction of the scene.

The system was initially validated in a controlled lab environment against a terrestrial laser scanner ground truth point cloud. Sub-centimeter accuracy was achieved, with a mean cloud-to-cloud distance of 4.0 mm. Preliminary live-animal testing demonstrates the system's ability to produce a qualitatively accurate reconstruction under various lighting conditions. This approach establishes a non-invasive method for high-quality 3D reconstruction of complex reindeer antlers, which has applications in wildlife biology, environmental monitoring and biomechanics. Further work will involve rigorous network and camera calibration along with a comprehensive analysis of live-animal data.
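The cloud-to-cloud metric quoted above is commonly computed as the mean distance from each reconstructed point to its nearest neighbour in the reference cloud. The brute-force sketch below illustrates that definition only. The function name and the two-point clouds are hypothetical, and a real evaluation would use a spatial index and first register the clouds in a common frame.

```python
import math


def mean_cloud_to_cloud(cloud, reference):
    """Mean nearest-neighbour distance from each point in `cloud` to `reference`."""
    nearest = [min(math.dist(p, q) for q in reference) for p in cloud]
    return sum(nearest) / len(nearest)


# Hypothetical clouds in metres: the reconstruction sits 4 mm above the reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.0, 0.004), (1.0, 0.0, 0.004)]
print(f"{mean_cloud_to_cloud(reconstruction, reference) * 1000:.1f} mm")
```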

A semi-automated pipeline for extracting architectural plans from 3D LiDAR data of ancient heritage sites

Marianna Bartrick-Krana, Roberto de Lima, Aziliz Vandesande, Maarten Bassier

KU Leuven, Belgium

Automatically generating architectural plans from archaeological sites poses a persistent challenge, particularly when dealing with ancient structures that have experienced severe deterioration. Many heritage contexts—especially those involving rock-cut monuments—present highly irregular geometries, collapsed features, eroded walls, and surfaces obscured by sediment or plaster detachment. These conditions make the extraction of reliable 2D plans or cross-sections from 3D data exceptionally difficult using conventional modeling tools.

In this study, we propose a semi-automated processing workflow tailored to the architectural characteristics of the Sheikh Said tombs. The pipeline converts 3D LiDAR datasets into structured 2D plans and vertical cross-sections, with particular emphasis on documenting deep, narrow shafts and multi-chambered tomb layouts.

Spherical Vision meets 3D Semantics: towards efficient LOD3 Model Generation for Smart Cities

Mohammad Saadatseresht¹,², Hossein Arefi², Qazaleh Askari¹

¹School of Surveying and Geospatial Engineering, University of Tehran, Tehran, Iran; ²i3mainz - Institute for Spatial Information and Surveying Technology, Mainz University of Applied Sciences, Mainz, Germany

The generation of Level of Detail 3 (LoD3) building models is essential for applications such as urban digital twins, energy analysis, and smart city planning. However, conventional approaches based on terrestrial LiDAR or UAV photogrammetry remain costly, labor-intensive, and difficult to scale. This paper presents a scalable framework for transforming LoD1 building models into LoD3 façade representations using openly available urban data, including OpenStreetMap footprints, street-level spherical imagery, and weak point-cloud priors. The proposed method formulates the reconstruction problem as a facet-based modeling task, where each façade is processed independently in a local coordinate system derived from LoD1 geometry. A rectification strategy is introduced to generate fronto-parallel façade images directly from spherical panoramas, avoiding perspective distortions and facilitating image analysis. To address the challenges of unstructured data acquisition, a visibility-driven view selection scheme and a multi-view fusion framework are developed to construct robust façade evidence maps. The 3D geometry is estimated as a depth field through a multi-resolution optimization framework integrating ray consistency, appearance cues, point-cloud support, and structural regularization. Planar segmentation, polygonization, and geometric regularization are subsequently applied to derive structured façade elements. Openings such as windows and doors are detected using combined geometric and image-based evidence and further refined through architectural constraints. Experimental results demonstrate that the proposed framework enables reliable reconstruction of façade geometry and structural details using only open and low-cost data sources, providing a practical pathway for large-scale LoD3 generation in real urban environments.

LiDAR Point Cloud Oversegmentation via SAM-based Knowledge Distillation

Dening Lu¹, Michael Chapman², Jonathan Li¹,³

¹Department of Systems Design Engineering, University of Waterloo; ²Department of Civil Engineering, Toronto Metropolitan University; ³Department of Geography and Environmental Management, University of Waterloo

Large-scale LiDAR point clouds provide rich geometric information, yet learning effective structural representations remains challenging due to the misalignment between semantic categories and geometric structures. To address this issue, we propose a SAM-guided framework for point cloud oversegmentation. We transfer grouping knowledge from 2D vision by constructing a large-scale oversegmentation dataset using the Segment Anything Model (SAM) on bird’s-eye-view projections.

Based on these grouping priors, a structure-aware point cloud encoder is learned via a distillation objective that enforces intra-region compactness and inter-region separation in the embedding space. The proposed approach does not rely on semantic supervision and directly learns generalizable structural representations.

Experiments on various benchmark datasets (STPLS3D, Toronto-3D, DALES, and S3DIS) demonstrate that the proposed method achieves competitive performance.

In particular, it significantly improves boundary recall (e.g., 92.21% on STPLS3D and 93.47% on Toronto-3D) while maintaining high oracle accuracy (up to 97.62%).

Moreover, the model generalizes well to unseen datasets without retraining, showing strong cross-dataset inference capability.
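As a rough illustration of an objective that rewards intra-region compactness and inter-region separation, the sketch below scores a set of point embeddings against region labels. It is not the authors' distillation loss: the function name `grouping_loss`, the centroid-based formulation, and the `margin` parameter are all assumptions made for this example.

```python
import numpy as np

def grouping_loss(emb, region_ids, margin=1.0):
    """Return (compactness, separation) terms for labelled embeddings.

    Hypothetical formulation: compactness is the mean squared distance of
    each embedding to its region centroid; separation is a squared hinge
    on pairwise centroid distances that fall below `margin`.
    """
    emb = np.asarray(emb, dtype=float)
    region_ids = np.asarray(region_ids)
    labels = np.unique(region_ids)  # sorted unique region labels
    centroids = np.stack([emb[region_ids == r].mean(axis=0) for r in labels])

    # Pull every embedding towards the centroid of its own region.
    idx = np.searchsorted(labels, region_ids)
    compact = float(np.mean(np.sum((emb - centroids[idx]) ** 2, axis=1)))

    # Push centroids of different regions at least `margin` apart.
    if len(labels) < 2:
        return compact, 0.0
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    upper = np.triu_indices(len(labels), k=1)
    separate = float(np.mean(np.maximum(0.0, margin - dist[upper]) ** 2))
    return compact, separate

# Two regions whose centroids are only 0.5 apart in embedding space.
compact, separate = grouping_loss(
    [[0.0, 0.0], [2.0, 0.0], [0.5, 0.0], [2.5, 0.0]], [0, 0, 1, 1]
)
```

In this toy case both terms are non-zero, so a minimiser would tighten each region and move the two centroids apart.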

Shape Representation using Gaussian Process mixture models

Panagiotis Sapoutzoglou, George Terzakis, Georgios Floros, Maria Pateraki

National Technical University of Athens, Greece

In this work we propose an object-specific implicit representation: functional modeling of surface geometry using Gaussian Processes (GPs). In contrast to neural models, our method leverages the ability of GPs to model continuous functions from irregularly and sparsely sampled data, and applies this concept within a probabilistic model that learns the shape of an object as a mixture of multiple directional distance fields anchored at reference points placed along the object’s skeletal outline. The resulting mixture model provides continuity, sparsity, and finer shape detail while avoiding the heavy training burden associated with deep implicit methods.

A Deep Learning Model for Tree Species Classification Using Ground-Level RGB Imagery and Automated Annotations

Hristina Hristova, Clemens Blattert, Sunni K.P. Kushwaha, Janine Schweier

Swiss Federal Research Institute for Forest, Snow and Landscape Research WSL, Switzerland

Accurate tree species identification is essential for effective forest management, biodiversity monitoring, and resource estimation. While automated methods relying on aerial and canopy-level remote sensing have become prevalent, they often struggle in dense, multi-layered forest stands, where critical lower-stem and bark features are obscured. To address this limitation, we present a Deep Learning (DL) framework for tree species classification utilizing ground-level RGB imagery. Because manual annotation of terrestrial images in forest environments is labor-intensive and complicated by occlusions, we introduce a new "in-situ" forest image dataset alongside an automated labeling pipeline. This pipeline generates training annotations by projecting tree-species data derived from Mobile Laser Scanning (MLS) onto 2D images based on photogrammetric reconstruction. The proposed DL model leverages these automatically labeled images to effectively recognize tree species based on structural and bark characteristics. The model achieves overall F1-scores of 0.78 and 0.75 for object detection and instance segmentation, respectively. Ultimately, our approach complements existing methods for detecting tree positions and diameters, facilitating the creation of a holistic, cost-effective, and scalable forest inventory dataset.

Pattern recognition approaches for the detection of alteration and degradation phenomena in hyperspectral and UAV multispectral imagery: the case study of a historical masonry water bridge

Alessandra Spadaro, Francesca Matrone, Andrea Maria Lingua, Ramin Rashidi Alavijeh

Geomatics Lab, Department of Environment, Land and Infrastructure Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Turin, Italy

Historical masonry hydraulic infrastructures are affected by complex degradation processes, including vegetation growth, moisture-related anomalies, and salt efflorescence, whose detection requires non-invasive, repeatable, and scalable diagnostic approaches. This study proposes a multi-scale workflow for the detection and classification of degradation phenomena affecting the Cavour Canal water bridge, a nineteenth-century masonry structure in northern Italy. The methodology integrates UAV-based multispectral orthophotos and close-range hyperspectral imagery within a common Object-Based Image Analysis (OBIA) framework. The multispectral workflow was designed for façade-scale screening, whereas the hyperspectral workflow was used to refine the interpretation of selected sectors through detailed spectral characterisation. Multiple supervised classifiers, including Support Vector Machine (SVM), k-Nearest Neighbours (kNN), Decision Tree (DT), Random Trees (RT), and Naïve Bayes (NB), were tested on both datasets. The results show that the multispectral workflow is effective for the identification of vegetation and broad water-related anomalies, with kNN providing the best overall performance, while the hyperspectral workflow improves the discrimination of subtle surface alterations, particularly efflorescence, with SVM yielding the most stable results across the tested configurations. Overall, the proposed methodology demonstrates the value of integrating multispectral and hyperspectral data within a hierarchical workflow for non-invasive degradation mapping of historical masonry infrastructures.

A Framework for Individual Tree Segmentation from Multi-Resolution LiDAR Data in Complex Tropical Forests

Hazem Hanafy¹, Sangyoon Park¹, Songlin Fei², Ayman Habib¹

¹Lyles School of Civil and Construction Engineering, Purdue University, West Lafayette, USA; ²Department of Forestry and Natural Resources, Purdue University, West Lafayette, USA

The increasing demand for accurate forest inventory in tropical ecosystems requires robust, scalable methods for individual tree segmentation. Tropical forests pose particular challenges due to dense understory, high species diversity, and complex multi-layered canopies, which often lead to tree under- and over-segmentation in LiDAR-based workflows. This study presents a general framework for individual tree segmentation from dense, multi-resolution LiDAR point clouds acquired by a Backpack LiDAR system over a 15-year-old palm stand in Belém, Brazil. After trajectory enhancement and mapping, an adaptive cloth simulation filter is used to derive a Digital Terrain Model and height-normalized points. Woody components are then isolated using Otsu-based intensity thresholding, eigenvalue-derived linearity, and statistical outlier removal. Trunk detection combines DBSCAN clustering on lower-stem points with a dual tree-localization strategy based on sum-of-elevation heat maps and RANSAC circle fitting. A segmentation quality-control module addresses over- and under-segmentation before reattaching canopy and foliage via voxel-based KD-tree retrieval to generate final per-tree segments. Compared with 3DFIN and TreeLearn using point cloud–derived reference tree locations, the proposed framework achieves a precision of 92.85%, recall of 95.97%, and F1-score of 94.38%, substantially outperforming 3DFIN (75.97%) and TreeLearn (15.14%). These results demonstrate the potential of the proposed framework to deliver reliable tree-level inventories in complex tropical forests.
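The reported F1-score can be checked from the stated precision and recall, since F1 is their harmonic mean. The short sketch below only reproduces that arithmetic; the function name `f1_score` is chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * precision * recall / (precision + recall)

# Precision 92.85% and recall 95.97%, as stated in the abstract.
f1 = f1_score(0.9285, 0.9597)
print(f"{100 * f1:.2f}%")  # 94.38%
```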

Digital Preservation and Augmented Reality for Historical Surveying Instruments: A Photogrammetric Approach to Cultural Heritage Documentation

Clóvis Andrade, Juyara Bezerra, Simone Sato, Karoline Jamur

Universidade Federal de Pernambuco, Brazil

Historical surveying instruments embody centuries of innovation in cartography and engineering, serving as crucial scientific and pedagogical artifacts. Their fragility, risk of damage, and limited exhibition space restrict access and highlight the need for effective preservation strategies (Duester, 2023). Traditional conservation methods protect material integrity but do not address broader challenges related to accessibility and engagement. Digital technologies now offer transformative alternatives capable of creating accurate and interactive representations of these instruments (Farella et al., 2022).

This study proposes a low-cost, replicable digital preservation pipeline integrating close-range photogrammetry and augmented reality (AR). Photogrammetry provides a non-contact method for generating detailed 3D models using consumer-grade smartphones, democratizing access to advanced documentation techniques (Icardi et al., 2018; Förstner & Wrobel, 2016). AR enables users to interact with these digital surrogates in real environments, fostering deeper engagement and overcoming limitations imposed by fragile originals (Spallone, 2022; Gong et al., 2022).

Image acquisition was conducted with a Xiaomi Poco F5 Pro under controlled lighting, maintaining 30–60% overlap. Processing in Agisoft Metashape included alignment, dense cloud generation, mesh reconstruction, and texturing. Post-processing in Blender optimized the models for real-time visualization. Integration into AR was achieved using Unity and the Vuforia Engine SDK.

Results demonstrate high-fidelity 3D models that preserve fine details and offer immersive AR interaction. This pipeline provides durable digital records, enhances educational experiences, and expands public access. The approach aligns with ISPRS Working Group II/6 objectives and offers a scalable model for cultural heritage institutions seeking accessible and effective preservation strategies.

Synthetic Dataset Generation for Partially Observed Indoor Objects

Jelle Vermandere, Maarten Bassier, Maarten Vergauwen

KU Leuven, Belgium

Learning-based methods for 3D scene reconstruction and object completion require large datasets containing partial scans paired with complete ground-truth geometry. However, acquiring such datasets using real-world scanning systems is costly and time-consuming, particularly when accurate ground truth for occluded regions is required.

In this work, we present a virtual scanning framework implemented in Unity for generating realistic synthetic 3D scan datasets. The proposed system simulates the behaviour of real-world scanners using configurable parameters such as scan resolution, measurement range, and distance-dependent noise. Instead of directly sampling mesh surfaces, the framework performs ray-based scanning from virtual viewpoints, enabling realistic modelling of sensor visibility and occlusion effects. In addition, panoramic images captured at the scanner location are used to assign colours to the resulting point clouds.

To support scalable dataset creation, the scanner is integrated with a procedural indoor scene generation pipeline that automatically produces diverse room layouts and furniture arrangements. Using this system, we introduce the V-Scan dataset, which contains synthetic indoor scans together with object-level partial point clouds, voxel-based occlusion grids, and complete ground-truth geometry. The resulting dataset provides valuable supervision for training and evaluating learning-based methods for scene reconstruction and object completion.

Automatic Segmentation of 3D Gaussian Splatting for Urban Cultural Heritage Sites

Widiatmoko Azis Fadilah, Virgile Gauthier, Arnadi Murtiyoso, Tania Landes, Pierre Grussenmeyer

Université de Strasbourg, CNRS, INSA Strasbourg, ICube Laboratory UMR 7357, Photogrammetry and Geomatics Group, 67000, Strasbourg, France

3D Gaussian Splatting (3DGS) has emerged as a promising method for photorealistic scene reconstruction, yet its application to semantic segmentation in real-world heritage documentation remains underexplored. This study proposes and evaluates an automated semantic 3DGS segmentation pipeline integrating the Segment Anything Model 3 (SAM 3) with per-class prompting for Gaussian reconstruction, applied to a nadir UAV dataset of the Siti Inggil heritage complex in Cirebon, Indonesia. Segmentation performance for four semantic classes (ground, roofs, vegetation, and water bodies) was assessed against manually segmented 2D and 3D reference data, supplemented by a geometric accuracy assessment using M3C2 analysis. Results reveal both the promise and the inherent challenges of applying 3DGS segmentation to complex real-world heritage scenes, where the effects of acquisition geometry, surface characteristics, and foundation model limitations can be observed.

Collaborative Multimodal Drone-Based Remote Sensing for Levee Piping Detection

Tu Hu, Tongqi Wang, Shan Su, Changjun Chen, Haoxiang Liu

Wuhan University, People's Republic of China

This paper addresses the critical challenge of early and accurate detection of piping, a major failure mode in levee systems. Traditional methods are limited, and even advanced techniques such as infrared thermography struggle to capture weak thermal anomaly signals under complex environmental interference. To overcome these limitations, we propose an innovative intelligent algorithm that achieves breakthroughs by synergistically integrating drone-based infrared imagery and point cloud data.

The methodology follows a rigorous two-stage pipeline. First, potential piping zones are coarsely extracted from thermal infrared images using an enhanced saliency detection model. This involves superpixel segmentation and multi-scale (global and local saliency) analysis to highlight temperature anomalies, followed by adaptive thresholding based on Gaussian distribution fitting for automatic segmentation. Second, a fine discrimination step is introduced, which integrates multimodal prior information from point clouds to significantly reduce false alarms. This is achieved by applying a series of physical constraints: area filtering, temperature variance filtering, terrain-based filtering, and overlap analysis between the infrared and point cloud data.

Validation with field data collected during the flood season demonstrates that this method achieves high-precision localization of piping zones. Its key advantage lies in its ability to effectively suppress false positives caused by environmental clutter while ensuring that the detection results align with physical principles. This study provides a practical and reliable technical solution for enhancing the safety inspection and early warning systems of levee structures.
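The adaptive thresholding step mentioned above can be illustrated with a minimal sketch. It assumes the saliency values are modelled by a single Gaussian, whose maximum-likelihood fit is just the sample mean and standard deviation, and that pixels more than `k` standard deviations above the mean are flagged. The function name and the value of `k` are assumptions for this example, not details taken from the paper.

```python
import numpy as np

def gaussian_threshold(saliency, k=2.0):
    """Flag pixels whose saliency exceeds mean + k * std.

    The mean and standard deviation are the maximum-likelihood
    parameters of a Gaussian fitted to all saliency values.
    `k` is a hypothetical tuning parameter.
    """
    s = np.asarray(saliency, dtype=float)
    mu, sigma = s.mean(), s.std()
    return s > mu + k * sigma

# Synthetic 10 x 10 saliency map with a 2 x 2 warm patch.
saliency = np.zeros((10, 10))
saliency[4:6, 4:6] = 1.0
mask = gaussian_threshold(saliency)
print(int(mask.sum()))  # 4
```

In the workflow described above, a mask of this kind would only be the coarse candidate set; the point-cloud-based filters are what reduce false alarms.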

An Open-Source Pipeline for Runtime-Optimized Heritage Photogrammetry in Game Engines

Arkoun Merchant¹, Adam Weigert¹, Chloe Dennis², Stephen Fai¹

¹Carleton Immersive Media Studios, 1125 Colonel By Dr, Ottawa, Canada; ²Bytown Museum, Ottawa, Canada

This paper presents Mesh2Tile, an open-source pipeline that converts photogrammetric meshes into runtime-optimized 3D Tiles for interactive visualization in game engines. Photogrammetry produces high-polygon meshes that remain difficult to deliver at scale through interactive platforms. Cloud-based conversion services like Cesium Ion provide a path to the OGC 3D Tiles format but impose cost barriers and raise data sovereignty concerns for confidential heritage projects. Existing open-source converters rely on uniform spatial partitioning, export redundant textures with every tile, and offer limited control over LOD generation. Mesh2Tile leverages Blender's Python API to perform adaptive octree tiling driven by triangle density, per-tile texture baking that eliminates texture redundancy, and parallel processing to generate georeferenced 3D Tiles from OBJ meshes. The pipeline is validated through a case study of the Bytown Museum Commissariat Building on the Rideau Canal UNESCO World Heritage Site. It is processed at three scales from 900 thousand to 90 million triangles. Results demonstrate linear scaling of processing time, up to 62% file size reduction for larger models, and successful runtime streaming in Unreal Engine 5 through the Cesium for Unreal plugin at 120 FPS with comparable tile balance to Cesium Ion's commercial output. The pipeline enables institutions to maintain full control over sensitive heritage data while achieving performance suitable for interactive visualization.
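The idea of octree tiling driven by triangle density can be sketched as a recursive split that stops once a cell holds few enough triangles. The code below is a generic illustration using NumPy on triangle centroids. It does not use Blender's Python API and is not Mesh2Tile's implementation; `octree_tiles`, `max_tris`, and `max_depth` are names invented for this example.

```python
import itertools
import numpy as np

def octree_tiles(centroids, lo, hi, max_tris, max_depth=8, depth=0):
    """Split the box [lo, hi] into octants until each leaf holds at most
    `max_tris` triangle centroids. Returns a list of (lo, hi, count)."""
    centroids = np.asarray(centroids, dtype=float)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    if len(centroids) <= max_tris or depth >= max_depth:
        return [(lo, hi, len(centroids))]

    mid = (lo + hi) / 2
    # Octant code 0..7: bit i is set when the centroid lies in the upper half of axis i.
    codes = (centroids >= mid).astype(int) @ np.array([1, 2, 4])
    leaves = []
    for code in range(8):
        mask = codes == code
        if not mask.any():
            continue  # empty octants produce no tile
        upper = np.array([(code >> i) & 1 for i in range(3)], dtype=bool)
        child_lo = np.where(upper, mid, lo)
        child_hi = np.where(upper, hi, mid)
        leaves += octree_tiles(centroids[mask], child_lo, child_hi,
                               max_tris, max_depth, depth + 1)
    return leaves

# One centroid per octant of the unit cube, at most one triangle per tile.
pts = [list(p) for p in itertools.product([0.1, 0.9], repeat=3)]
leaves = octree_tiles(pts, [0, 0, 0], [1, 1, 1], max_tris=1)
print(len(leaves))  # 8
```

Because sparse regions stop splitting early, dense parts of a mesh end up with deeper, smaller tiles, in contrast to uniform spatial partitioning.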

Location determination of dynamic objects using a single CCTV with monocular depth estimation

JiHeon Jung¹, Junhee Youn², Jieun Kim¹, Junho Gong¹, Phillip Kim¹, Sunwoong Paik¹

¹Dept. of Future & Smart Construction Research, Korea Institute of Civil Engineering and Building Technology, 10223 Goyang-Si, Gyeonggi-Do, Republic of Korea; ²Dept. of Future & Smart Construction Research, Korea Institute of Civil Engineering and Building Technology (corresponding author)

This contribution presents a method to determine ground coordinates of pedestrians from a single CCTV frame using monocular depth estimation and orthophoto-based ground control points (GCPs). Urban crowd monitoring requires pedestrian location information, but many CCTV-based approaches rely on accurate camera calibration or multi-view configurations, which are often unavailable in real deployments. In this study, we exploit relative depth values from a monocular depth estimation model (Depth Anything V2) and GCPs jointly identifiable in both the CCTV frame and an orthophoto in EPSG:5186. For each frame, depth-based distance ratios between the pedestrian and GCP pairs are used to construct Apollonius circles in the orthophoto plane, and the pedestrian position is estimated by a weighted least-squares adjustment of their intersections. The method is evaluated on 180 frames across two scenes from an urban testbed with camera–target distances within approximately 50 m, across three GCP placement scenarios. For the optimal configuration (Scenario A), a mean RMSE of 1.989 m was achieved, excluding frames in which GCPs were temporarily occluded by moving objects. This demonstrates that single-frame CCTV imagery combined with an orthophoto can achieve an accuracy of approximately 2 m without any exterior or interior orientation parameter (EOP/IOP) information, which is practically useful for urban crowd monitoring and dynamic thematic mapping. The influence of GCP placement geometry and occlusion conditions on estimation accuracy is also analyzed.
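The geometric core of the approach can be sketched as follows. For two points A and B and a distance ratio k = |PA| / |PB| (with k ≠ 1), the locus of P is an Apollonius circle with centre (A − k²B) / (1 − k²) and radius k|A − B| / |1 − k²|. The sketch builds such circles and then estimates the position with a weighted Gauss-Newton fit to the circle residuals. That fit is a stand-in chosen for this example: the abstract describes an adjustment of circle intersections, whose details are not given here. The function names, GCP coordinates, and starting point are all hypothetical.

```python
import numpy as np

def apollonius_circle(a, b, k):
    """Circle of points P with |P - a| / |P - b| = k (k > 0, k != 1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = 1.0 - k ** 2
    center = (a - k ** 2 * b) / denom
    radius = k * np.linalg.norm(a - b) / abs(denom)
    return center, radius

def fit_position(circles, x0, weights=None, iters=20):
    """Weighted Gauss-Newton on residuals |x - c_i| - r_i (assumed estimator)."""
    x = np.asarray(x0, dtype=float)
    centers = np.array([c for c, _ in circles])
    radii = np.array([r for _, r in circles])
    w = np.ones(len(circles)) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    for _ in range(iters):
        d = x - centers
        dist = np.linalg.norm(d, axis=1)  # assumes x never coincides with a centre
        jac = d / dist[:, None]           # gradient of each residual w.r.t. x
        step, *_ = np.linalg.lstsq(jac * sw[:, None], -(dist - radii) * sw, rcond=None)
        x = x + step
    return x

# Synthetic check: ratios are derived from a known position, then recovered.
truth = np.array([3.0, 4.0])
gcps = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [12.0, 9.0]])
dists = np.linalg.norm(gcps - truth, axis=1)
circles = [apollonius_circle(gcps[0], gcps[j], dists[0] / dists[j]) for j in (1, 2, 3)]
est = fit_position(circles, x0=gcps.mean(axis=0))
```

Only distance ratios enter the circles, which is why relative (unscaled) monocular depth suffices. Note that circles built from just three GCPs share two common points, so a fourth GCP is used here to make the solution unique.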

ML-MIFD: Multi-Level Multimodal Invariant Feature Descriptor

Zening Wang, Haoyu Guo, Yongxiang Yao, Yongjun Zhang, Peihao Wu, Yi Wan

School of Remote Sensing and Information Engineering, 430079, Wuhan, Hubei, China

With the rapid advancement of multi-sensor technology, cross-modal image matching has become a key research focus. However, significant challenges persist, primarily caused by differences in imaging mechanisms that lead to nonlinear radiation variations and feature heterogeneity. Coupled with complex geometric distortions, traditional feature description methods in matching struggle to directly or effectively represent common feature information across modalities, resulting in matching failures. Thus, effectively mitigating noise and radiation distortions to enable robust cross-modal matching remains an open and critical problem, compounded by the intrinsic difficulty of balancing descriptor parameters like patch size and histogram partitioning. To address the aforementioned issues, this paper proposes a novel Multi-Level Multimodal Invariant Feature Descriptor (ML-MIFD), designed to enhance resistance to nonlinear radiometric differences and multi-source noise while maintaining rotation invariance. The proposed algorithm consists of three stages: feature detection, ML-MIFD descriptor construction, and image matching. This paper conducts comparative experiments with various state-of-the-art methods using typical cross-modal image datasets. The results demonstrate that the ML-MIFD method exhibits significant advantages in both registration accuracy and matching stability.

Geomorphological Monitoring of Erosion on Restored Slopes Through the Integration of Drones, GIS, and LiDAR

Mónica López Moncada¹,²,³, Joan-Cristian Padró¹,⁴, Vicenç Carabassa⁵, Paulo Escandón-Panchana⁶,⁷, Andrés Velastegui-Montoya²,³

¹Departamento de Geografía, Universitat Autònoma de Barcelona (UAB); ²Faculty of Engineering in Earth Sciences, ESPOL Polytechnic University; ³Laboratory of Geoinformation and Remote Sensing, Faculty of Engineering in Earth Sciences, ESPOL Polytechnic University; ⁴Institut Cartogràfic i Geològic de Catalunya (ICGC), Parc de Montjuïc; ⁵CREAF, Universitat Autònoma de Barcelona (UAB); ⁶Departamento de Ingeniería Cartográfica y Topografía, Universidad Politécnica de Madrid (UPM); ⁷Escuela de Ciencias Ambientales, Universidad Espíritu Santo

Mining is a strategic activity for economic development; however, it causes significant impacts on the landscape, soil, and water resources. During the restoration phase, slope erosion is a challenge for ensuring the geomorphological stability and ecological functionality of the affected areas. This study aims to evaluate the erosion dynamics of restored mining slopes by integrating Geographic Information Systems (GIS) and data obtained from Unmanned Aerial Systems (UAS) for geomorphological monitoring and quantification of soil loss on slopes. The research was carried out at the Lázaro quarry, Tarragona, Spain, using a fixed-wing UAS equipped with a multispectral camera to generate high-resolution orthophotos and Digital Elevation Models (DEMs), which were compared with historical LiDAR data. Height Difference Models (HDMs) and volumetric analysis were applied to quantify erosion and deposition processes. Three modelling approaches were compared: ridge-derived DEM (DEMp), filtered DEM (DEMf), and LiDAR DEM (DEMl), considering their accuracy, spatial detail, and ability to represent erosional microtopography. The findings revealed that the DEMp provides the most consistent estimates of volume loss and most faithfully reproduces pre-erosion morphologies, whereas the DEMf tends to smooth relief and the DEMl provides a lower-resolution overview. These results confirm the effectiveness of integrating UAS data, photogrammetry, and geospatial analysis for monitoring restored slopes, enabling the accurate quantification of eroded volumes and the detailed characterisation of morphological processes. This study contributes to the optimisation of the geomorphological and environmental management of restored mining areas, promoting their long-term stability and sustainability.

Application of SfM Methods for the Photogrammetric Processing of Historical Aerial VHS Videos

Grzegorz Jóźków, Maurycy Hechmann

Wroclaw University of Environmental and Life Sciences, Poland

This submission presents the results and analysis of applying SfM to the processing of historical aerial VHS videos. The test data were collected during the 1997 Central European Flood and pose significant challenges due to the low quality of the data, the manner of data acquisition (corridor mapping from different altitudes), and the scene content (a significant part of the imagery shows water). The SfM processing was executed in commercial software and allowed for successful bundle block adjustment and the creation of subsequent products, such as dense point clouds and orthomosaics. Among the challenges during processing were the extraction of approximate image positions and the selection of processing parameters.

Global Block Adjustment for Mosaicked Stereoscopic Satellite Imagery

Michaël Erblang¹, Emelyne Saulnier¹, Guillaume Laurent¹, Nicolas Delaygue², Fabrice Buffe², Alice Latourte², Mathilde Jassaud³, Noémie Bricout³

¹Thales Services Numériques (TSN), 290 Allée du Lac, 31670 Labège, France; ²Centre National d’Etudes Spatiales (CNES), 18 avenue E. Belin, 31400 Toulouse cedex 9, France; ³Institut national de l'information géographique et forestière (IGN), 18 avenue E. Belin, 31400 Toulouse cedex 9, France

Satellite imagery acquired over large areas from multiple viewpoints introduces subtle geometric misalignments that degrade the quality of derived products such as Digital Surface Models (DSMs). This paper presents a global block adjustment workflow designed to correct these errors across overlapping stereo acquisitions from the CO3D constellation, which captures Earth's surface at 50 cm resolution.

The proposed pipeline operates in three stages: individual acquisition refinement using Space Reference Points (SRPs) as Ground Control Points; tie point extraction between overlapping scenes through two-pass image correlation; and a weighted global spatio-triangulation simultaneously optimizing attitude biases, attitude drifts, and per-satellite magnification parameters.

Applied to a large stereo acquisition dataset over the Aorounga crater, Chad, the method demonstrates strong geometric performance. The results highlight that careful parameterization, combining observation weighting, n-tuple point filtering, and per-satellite sensor refinement, is key to producing accurate, geometrically consistent large-scale mosaics from bi-satellite stereo imagery. In-orbit performance figures are not included in this paper owing to a confidentiality agreement.

Learning-Based Semantic Segmentation and Context-based Quality Control of Bike-Pack LiDAR data for Tree Mapping in Semi-Urban Environments

Sungwoong Hyung¹, Hazem Hanafy¹, Chunxi Zhao¹, Sangyoon Park¹, Songlin Fei², Ayman Habib¹

¹Lyles School of Civil and Construction Engineering, Purdue University, West Lafayette, IN, 47907, USA; ²Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, 47907, USA

Accurate tree mapping in semi-urban areas is essential for ecological monitoring and infrastructure maintenance, but is challenged by complex structures and clutter in LiDAR data. This study proposes a learning-based framework using a Superpoint Transformer (SPT) for semantic segmentation. The model is pretrained on the KITTI-360 dataset and then fine-tuned using transfer learning on a high-resolution dataset captured by our in-house Bike-Pack LiDAR system. A key contribution of this work is a context-based quality control process applied after the initial segmentation. This quality control process refines the results by removing building artifacts, correcting misclassifications between vegetation and poles using geometric and intensity analysis, and refining building boundaries. Experiments demonstrate that this QC process significantly improves segmentation accuracy, especially for the critical vegetation and pole classes.

Multitemporal Monitoring of Posidonia Oceanica Banquettes using UAV Photogrammetry

Valeria Longhi¹, Andrea Lingua², Francesca Gallitto², Filiberto Chiabrando³

¹DIST – Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino, Italy; ²DIATI – Department of Environment, Land and Infrastructure Engineering, Politecnico di Torino, Italy; ³DAD – Department of Architecture and Design, Politecnico di Torino, Italy

Posidonia oceanica (PO) meadows represent one of the most valuable coastal ecosystems in the Mediterranean Sea, providing key ecological functions and ecosystem services (Vassallo et al., 2013). Even after detachment, PO leaves and rhizome fragments accumulate along the shoreline forming thick deposits known as banquettes (Rotini et al., 2020). These natural structures play a crucial role in protecting beaches from erosion, buffering wave energy, and contributing to the nutrient cycling of coastal systems (Fonseca and Cahalan, 1992).

Despite their ecological importance, banquette dynamics are not consistently monitored, standardized monitoring procedures are lacking, and their spatial and temporal variability remains poorly understood. Within the framework of the POSEIDON project, funded by the Italian National Recovery and Resilience Plan (PNRR), innovative high-resolution mapping techniques are being developed to monitor PO ecosystems both underwater and on the coast. This contribution presents a methodology based on UAV RGB photogrammetry for the multitemporal analysis of banquette morphodynamics, demonstrating its potential for quantitative assessment of seasonal and interannual changes. UAV photogrammetry has become a widely adopted tool for high-resolution coastal monitoring and topographic mapping, providing centimeter-scale DEMs when combined with RTK positioning and well-distributed ground control points (Zannutta et al., 2020; Vecchi et al., 2021; Yoo and Oh, 2016).

Photogrammetry and 3D Gaussian Splatting for Cultural Heritage. Pro Cons and Main Differences

Xinchen Li, Alessio Martino, Filiberto Chiabrando, Xiang Li

Department of Architecture and Design (DAD), Politecnico di Torino, Italy

This paper presents a comparative analysis of traditional photogrammetric methods and 3D Gaussian Splatting (3DGS) technology in the digitisation of Cultural Heritage (CH). Two representative datasets, differing in scale and image acquisition conditions, were selected to systematically evaluate the performance of both methods in terms of visual quality, geometric accuracy, computational efficiency and stability. The results indicate that 3DGS significantly outperforms traditional photogrammetric methods in terms of rendering quality and real-time visualisation capabilities, generating more realistic and immersive visual effects. However, its geometric accuracy is generally slightly lower than that of traditional methods, a difference that is particularly pronounced in small-scale datasets or under low-resolution input conditions. Among the various implementations, Postshot and LichtFeld Studio demonstrated higher stability and robustness, whilst the original GraphDeco implementation exhibited greater sensitivity to data scale and parameter settings. Photogrammetry offers reliability in high-precision geometric reconstruction, whilst 3DGS demonstrates significant potential for complementing this with a high-fidelity visual experience. The findings are intended to provide practical guidance for selecting 3D reconstruction methods across different cultural heritage application scenarios.

Prediction of Understorey Vegetation using Remote Sensing in Fennoscandian Forests

Ritwika Mukhopadhyay, Ruben Valbuena, Inka Bohlin

Dept. of Forest Resource Management, Swedish University of Agricultural Sciences (SLU), 90183 Umeå, Sweden

Understorey vegetation (USV) contributes to forest structure, nutrient cycling, species diversity, habitat functions, and disturbance processes in Fennoscandian forests. It also provides non‑wood forest products such as wild berries. Mapping USV is important for understanding ecosystem functioning and its links to overstorey conditions. Although remote sensing (RS) enables large‑scale forest monitoring, its use for USV mapping remains limited because the layer is often obscured by upper‑canopy foliage. This study assesses the accuracy of USV cover prediction (i.e., the ground area covered by USV) using multiple RS data sources, identifies key predictors, and evaluates how canopy cover influences model performance. Field data were collected in 2024 from 487 plots in the Krycklan catchment. Sentinel‑2 summer and autumn imagery provided spectral reflectance, spectral indices, and grey‑level co‑occurrence matrix (GLCM) texture variables. Additional texture variables were derived from canopy height models (CHMs) generated using airborne laser scanning (ALS; 1–2 points/m²) and Pléiades tri‑stereo image matching (0.5 m; 1.5 points/m²). Beta regression and random forest regression (RFR) models were trained on 70% of plots and validated on 30%. Important predictors included seasonal red‑edge differences, greenness‑based indices, CHM texture variables, and ALS‑based canopy cover. Model performance indicated that obstruction by the overstorey canopy remains a limitation for USV cover prediction. Across all plots, beta regression with Sentinel‑2 data performed slightly better (RMSE = 21.7 m², variance explained = 5%) than RFR. However, the best results occurred in low‑canopy plots (≤40%) using RFR with Sentinel‑2 and Pléiades‑derived CHM texture variables (RMSE = 14.6 m², variance explained = 32%).

Sequence-based decoupling Encoder for Well Log Interpretation

Ning Qian, Yiming Xu, Monica Sester

Institute of Cartography and Geoinformatics, Leibniz University Hannover, Germany

Well logging curves play a crucial role in oil and gas exploration and geological engineering, as they provide essential information about subsurface formations and reservoir properties. In recent years, with the growing adoption of deep learning techniques in geoscientific data analysis, well logging data have increasingly been modeled as depth-dependent sequences, enabling the application of sequential neural networks for their analysis. Among these approaches, attention mechanisms have been adopted in log interpretation tasks due to their ability to capture long-range dependencies within sequences. However, directly applying attention mechanisms without considering the intrinsic structure of logging data may introduce model redundancy and increase learning complexity, which can ultimately degrade predictive performance. To address this issue, this study proposes a Sequence-based Decoupling Encoder (SDE). The proposed encoder explicitly disentangles the interactions between logging curves and across depth, enabling the model to learn relationships along different dimensions separately, which allows more effective feature extraction and mapping into a latent space. The decoupling strategy also reduces the learning complexity of the attention mechanism and provides clearer learning objectives for the model. The proposed method is evaluated on the public dataset FORCE2020 and applied to two common well log interpretation tasks: missing log reconstruction and lithology prediction. We compare SDE against several representative sequential baselines. Experimental results demonstrate that SDE achieves superior predictive performance in both tasks.

Exploring the Potential of the Mandeye Handheld LiDAR System for Ecosystem Characterization

Cosme Hernanz-Gilbert¹, Carlos Cabo³, Álvaro Moreno-Martínez², Mónica Herrero-Huerta¹

¹Desertification Research Centre (CIDE) - CSIC, Spain; ²Image Processing Laboratory (IPL), Universitat de Valencia, Paterna, Valencia, Spain; ³Department of Mining Exploitation, University of Oviedo, Spain

Handheld LiDAR systems are emerging as a promising alternative to traditional terrestrial and airborne laser scanning for environmental research, yet their performance and applicability remain insufficiently explored. The Mandeye LiDAR device, developed between 2022 and 2024, stands out for its lightweight design, portability, integrability with other sensing platforms, and notably low cost. These characteristics make it especially attractive for ecological monitoring, enabling high-resolution structural data collection even in projects with limited resources. Despite this potential, very few studies have evaluated the device’s performance or its capacity to support ecosystem characterization.

This research presents a comprehensive review and experimental assessment of the Mandeye LiDAR system to determine its suitability for environmental applications. Field data are being collected in Mediterranean forest and riparian environments using three acquisition modes, on foot, bicycle, and kayak, to test how platform mobility and scanning geometry influence point cloud quality. The study evaluates point density, coverage, structural accuracy, and noise sensitivity while integrating ground-truth measurements and independent LiDAR references.

Preliminary findings show that the Mandeye performs robustly across diverse environments, with kayak-based acquisitions offering particularly detailed representations of the vegetation-water interface. Walking and cycling configurations provide efficient alternatives for forest structure assessment. Overall, the results demonstrate the value of handheld LiDAR as a flexible, accessible complement to conventional remote sensing methods. The project also aims to establish methodological guidelines for Mandeye deployment, contributing to the broader adoption and standardization of low-cost LiDAR tools in ecosystem monitoring.

VISTA-GS: MVS-Guided virtual view augmentation for sparse-view 3d gaussian splatting

Hongsheng Huang¹, Yaxin Li¹,⁴, Shengjun Tang², Siqi Du³, Mahmoud Mostafa¹, Mahmoud Adham¹, Wu Chen¹

¹Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, P.R. China; ²Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University, Shenzhen, P.R. China; ³College of Urban and Environmental Sciences, Peking University, Beijing, P.R. China; ⁴Micro Dimension Technology Limited, Hong Kong, P.R. China

3D Gaussian Splatting (3DGS) has achieved remarkable success in novel view synthesis with dense input views. However, its performance deteriorates rapidly in sparse-view scenarios, particularly for viewpoints distant from training cameras. This degradation stems from two fundamental limitations: sparse initial point clouds from limited input views and insufficient viewing angle constraints for robust optimization.

To address these challenges, we propose VISTA-GS (Virtual Image Synthesis and Training Augmentation), a novel framework that leverages Multi-View Stereo (MVS) reconstruction for point cloud densification and generates virtual training views through alpha-blending rendering of MVS-reconstructed dense colored point clouds. Unlike existing approaches relying on generative models or learned priors, our method exploits the geometric consistency inherent in MVS point clouds to create physically-grounded virtual views. By rendering dense point clouds from strategically positioned virtual camera viewpoints, we generate additional training images that preserve accurate geometric relationships while providing crucial angular constraints, effectively regularizing 3DGS training without synthesis-induced artifacts.

Our main contributions are twofold. First, we address sparse SfM initialization by employing MVS for dense point cloud generation with adaptive depth-weighted ellipsoid scaling. Second, we introduce a rendering-based virtual view generation strategy that creates geometrically consistent training images around original viewpoints using the same alpha blending principle as 3DGS. This approach enables robust reconstruction from minimal input views (3-12 images), substantially improving novel view synthesis performance while maintaining geometric fidelity that generative approaches often compromise.
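The alpha-blending principle mentioned above can be illustrated for a single pixel. This is a minimal sketch of generic front-to-back compositing, not the authors' renderer; the function name and the sample values are hypothetical, and the samples are assumed to be already sorted from nearest to farthest.

```python
def composite_front_to_back(samples):
    """Blend (color, alpha) samples for one pixel, ordered near to far.

    color is an (r, g, b) tuple in [0, 1]; alpha is an opacity in [0, 1].
    Returns the blended color and the transmittance left behind the last sample.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer samples
    for color, alpha in samples:
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * color[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque, stop early
            break
    return tuple(out), transmittance


# Hypothetical example: a half-opaque red point in front of an opaque blue point.
pixel, remaining = composite_front_to_back([
    ((1.0, 0.0, 0.0), 0.5),
    ((0.0, 0.0, 1.0), 1.0),
])
```

Each sample contributes its color weighted by its own opacity and by the transmittance of everything in front of it, which is why the ordering along the viewing ray matters.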

An Approach to 3D Digitisation and Segmentation of the Interior and Exterior of a complex Museum Object

Simon Albers, Thomas Luhmann, Till Sieberth

Institute for Applied Photogrammetry and Geoinformatics, Jade University of Applied Sciences, Oldenburg, Germany

The digitisation of cultural heritage objects is an important procedure to conserve, share and analyse artefacts from the past. Nowadays, it is common practice to digitise artefacts using DSLR cameras and Structure from Motion. For most objects, this is a suitable procedure, but in some cases, objects have narrow interiors which cannot be reached with common camera equipment. Our case study is a small kayak model (~ 1 x 0.1 x 0.15 m) from the 19th century with an interior that can only be documented through small openings (0.1 m radius). We developed a method using a modified webcam to safely digitise the interior of the kayak. By comparing three datasets of a test object, we describe advantages and disadvantages of the usage of integrated autofocus and colour balance of the webcam. Furthermore, we extended our approach for segmentation of 3D models to consider the interior and prepare the models for future analysis. There were no major differences between the models of the three datasets, and all of them could reduce the data gaps in the 3D model based on the DSLR images noticeably.

Three-dimensional Reconstruction and Crack Measurement of Cultural Monuments using UAV-based Photogrammetry

Wei-Che Huang¹, Wen-Cheng Liu¹, Yi-Shan Luo¹, Po-Yu Chen², Kuei-Luo Lin³

¹National United University, Taiwan; ²Shin-Mag Industrial Co., Ltd., Taiwan; ³Fullai Construction Co., Ltd., Taiwan

Three-dimensional (3D) modeling for the documentation, preservation, and management of cultural heritage is indispensable. To achieve this goal, a low-cost unmanned aerial vehicle (UAV) combined with the Structure from Motion (SfM) photogrammetric technique was utilized to build a 3D model and conduct surface crack measurements of cultural monuments. The results showed that, under simple conditions, non-specialists can easily generate accurate 3D models from UAV-acquired imagery. In this study, the statistical errors of checkpoints between 3D reconstruction and field measurements, expressed as total RMSE, ranged from 0.103 m to 0.848 m. However, the mean absolute errors of surface crack measurements between tape-based methods and 3D reconstruction ranged from 0.002 m to 0.099 m. Furthermore, UAV-SfM was applied to measure surface crack lengths on an inaccessible cultural monument. The findings demonstrated that employing the UAV-SfM photogrammetric technique for 3D reconstruction of cultural monuments is both feasible and reliable.

Towards transparent geohazard model: XAI for ground deformation susceptibility in Rhenish Coalfields, Germany

Dibakar Kamalini Ritushree¹,², Marzieh Baes¹, Mahdi Motagh¹,²

¹GFZ Helmholtz Center for Geosciences, Germany; ²LUH Leibniz Universitat Hannover, Germany

Satellite remote sensing has become a vital tool for monitoring environmental change and supporting disaster management, offering consistent and wide-area observations of the Earth’s surface. Combined with the rapid growth of Earth observation data, machine learning (ML) enables the detection of complex spatial patterns and improves the prediction of geohazards. One significant hazard is ground deformation caused by coal mining, which threatens infrastructure, ecosystems and local communities. This study presents an interpretable ML framework that integrates multi-source geospatial datasets with eXplainable Artificial Intelligence (XAI) techniques to map deformation susceptibility in open-pit coal mining regions. Beyond achieving high predictive performance, the approach reveals the key factors controlling ground instability, including proximity to mining operations and faults, groundwater variation and topographic conditions. The results support enhanced monitoring strategies for reducing disaster risks in mining-affected areas.

Comparative Accuracy Assessment of two Low-Cost Devices for Underwater Structure-from-Motion 3D Reconstruction

Lukas Quirin, Gunnar Lelle-Neumann, Ferdinand Maiwald

Chair of Optical 3D-Metrology, TUD Dresden University of Technology, Germany

Accurate three-dimensional (3D) documentation of underwater environments is essential for evaluating the structural integrity of submerged infrastructure such as dams, pipelines or offshore platforms, as well as for repair operations or monitoring sites affected by potential pollution hazards including underwater chemical or ammunition residues. Automatic 3D surveying plays a key role in fulfilling these tasks remotely with a spectrum of uncrewed systems, such as remotely operated (underwater) vehicles (ROV), autonomous underwater vehicles (AUV) or robots. Conventional underwater surveying methods, including high-resolution imaging sonars and laser-based techniques, often require expensive instrumentation. Advances in photogrammetry and Structure-from-Motion (SfM) techniques enable detailed 3D reconstructions from standard imagery. This study presents a comparative accuracy assessment of two imaging devices for underwater SfM-based 3D reconstruction, giving practical workflow recommendations for low-budget underwater inspection and survey tasks.

UAV Photogrammetry and Laser Pointer Targeting for High-Precision Mapping of Inaccessible Surfaces

Dobromir Filipov¹, Stefan Vlaykov²

¹UACG, Faculty of Geodesy, Sofia; ²ESO PROEKT EOOD, Sofia

Accurate georeferencing is a fundamental requirement in UAV-based photogrammetry, directly influencing the spatial precision, reliability, and analytical value of the derived 3D models. However, achieving high accuracy in areas such as rockslides or steep geological formations presents considerable challenges, primarily due to the difficulty or danger associated with placing conventional Ground Control Points (GCPs) on-site. This study introduces a novel hybrid methodology that leverages laser pointer indication and total station surveying to establish high-precision reference points that can be safely and effectively integrated into UAV photogrammetric workflows. The proposed approach aims to improve the absolute and relative accuracy of photogrammetric models without the need for physical GCP placement in inaccessible or hazardous areas.

A mixed reality generator for real-world environments in real-time

Devrim Akca¹, Çağın Torkut², Gerhard Kemper³, Armin Grün⁴

¹Faculty of Engineering and Natural Sciences, Işık Üniversitesi; ²RedHorizon Technology, Inc.; ³GGs GmbH; ⁴4DiXplorer AG

By integrating computer vision, photogrammetry, UAV technology, and Extended Reality (XR) solutions, the presented innovative Mixed-Reality (MR) photogrammetry system enables real-time 3D visualization, interaction and measurement of real-world environments. By eliminating the need for physical presence, the system enhances safety, efficiency and accuracy in tasks like assessing structural integrity, tracking construction progress, and observing environmental changes over time. At the core of the system is a UAV equipped with a stereo camera rig and onboard processing capabilities. Operated on-site by an operator, the UAV captures high-resolution stereo imagery, which is processed in real time through a centralized REST API running on cloud infrastructure. Experts located anywhere in the world connect to the system using VR headsets or a web-based application, gaining immersive access to a 3D stereoscopic view with full photogrammetric measurement functionality.

The system supports multi-user collaboration, enabling synchronized analysis and data sharing across different locations. This seamless integration of hardware and software components represents a significant advancement in real-time stereoscopic visualization.

CityZen: LOD2 building reconstruction with point cloud-free model-driven approach

Mehmet Büyükdemircioğlu¹, Ibrahim Sall¹,², Simone Rigon¹, Fabio Remondino¹

¹3D Optical Metrology (3DOM) Unit, Bruno Kessler Foundation (FBK), Trento, Italy; ²Ecole Nationale des Sciences Geographiques (ENSG), Institut National de l’Information Geographique et Forestiere (IGN), France

Accurate building footprints and 3D models are nowadays essential for a wide range of urban applications, yet the generation of Level of Detail 2 (LOD2) models remains constrained by the availability of dense 3D data such as LiDAR or image matching products. While these sources provide high geometric accuracy, they are costly to acquire and update, creating a gap between data availability and the increasing demand for city-scale 3D modelling. Recent advances in deep learning enable monocular height estimation from aerial imagery, offering a potential alternative to traditional 3D data sources. However, integrated workflows that combine image-based inference with structured 3D reconstruction are still limited. This paper presents CityZen, a point cloud-free workflow for LOD2 building reconstruction from only RGB orthophotos. The proposed approach integrates monocular height estimation (evaluating DSMNet, HTC-DC-Net and TSE-Net), roof type classification and model-driven reconstruction within a unified pipeline. Building footprints are used as geometric constraints, while learned height and semantic cues guide the generation of consistent 3D structures. The proposed framework enables scalable and practical LOD2 city modelling using widely available aerial orthophotos, reducing dependency on costly 3D data acquisition.

Fast acquisition for modelling heritage-related complex scenes based on TLS and spherical photogrammetry

Antonio Tomás Mozas-Calvache, José Luis Pérez-García, José Miguel Gómez-López, Diego Vico-García

University of Jaén, Spain

Documenting complex heritage sites, such as the QH36 Egyptian rock-cut tomb and La Lobera cave (Iberian sanctuary), often faces severe time and logistical constraints (e.g., concurrent activity, limited access). This necessitates a methodology that ensures fast data acquisition while maintaining high geometric and radiometric quality.

This study proposes a data fusion methodology combining Terrestrial Laser Scanning (TLS) and Spherical Photogrammetry (SP). TLS is prioritized for rapid, high-accuracy geometry acquisition, while SP, using a pre-calibrated 360-degree multi-camera, is utilized primarily for detailed texture mapping and supporting geometry in occluded areas.

A key element of this approach is leveraging the TLS point cloud to extract Ground Control Points (GCPs) and Checkpoints (CPs) directly, significantly reducing the need for time-consuming total station surveying and greatly improving field work efficiency.

Results demonstrate that the methodology achieves the core objective:

• Speed: Static capture time is reduced to approximately 5 minutes per station (TLS), less in the case of static spherical photographs, and even less using SP with video.

• Accuracy: Geometric registration errors given by TLS are less than 0.5 cm.

• Efficiency: Texture acquisition is improved at least 6-fold compared to conventional photogrammetry.

This validated approach offers a viable, efficient, and reliable solution for the high-quality 3D documentation of geometrically complex and time-constrained cultural heritage scenes.

Large-Field Binocular Vision Attitude Determination Method for Rocket Recovery

Yuqi Zhang, Xianglei Liu, Runjie Wang, Haibo Shi, Zhao Lu, Haiqian Wu

Beijing University of Civil Engineering and Architecture, China, People's Republic of

High-precision attitude measurement in rocket recovery is critical for reusable launch vehicles (RLVs) and aerospace sustainability, but existing technologies have key flaws. Inertial Measurement Units (IMUs) accumulate drift, misaligning control commands with actual states; high-precision gyroscopes are costly and hard to integrate; Visual-Inertial Fusion (VINS) is light-sensitive, failing in dynamic re-entry—all risking recovery failure.

To address this, a large-field binocular vision method is proposed via four stages. First, camera calibration uses Zhang’s method for intrinsic parameters (left/right reprojection errors: 0.056/0.066 px) and control-point stitching for extrinsics, solving the large-field coverage issue and achieving 33.42 mm 3D positioning error. Next, image preprocessing applies bilateral filtering for denoising, Roberts operator for edge extraction, morphological closing for contour continuity, and multi-threshold Canny fusion to suppress spurious edges, ensuring stable input. Then, total least squares fits the midline, and left/right camera plane intersection extracts the rocket’s spatial central axis, avoiding noise from point-by-point triangulation. Finally, phase correlation resolves roll ambiguity from cylindrical symmetry, and the spatial axis calculates pitch/yaw to build a Z-Y-X Tait-Bryan angle matrix for attitude determination.

Experiments on a 1:20 scale model (1 m long, 0.3 m diameter) used µs-synced high-speed cameras (6 m height, 3 m baseline). Results show roll/pitch/yaw RMSEs of 1.58°/1.54°/1.41°, with 93% mean absolute errors ≤±2°—outperforming ORB+PnP (2.11° roll RMSE), SGBM (2.50°), and Chamfer (3.00°). Ablation experiments confirm key modules’ necessity—removing line support score filtering raises roll RMSE to 1.85°—verifying robustness in dynamic re-entry.
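The final stage described above, building a Z-Y-X Tait-Bryan rotation from roll, pitch and yaw, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, angles are in radians, and the rocket's central axis is assumed to coincide with the body x-axis, so that the axis direction alone fixes yaw and pitch while roll must come from a separate cue.

```python
import math


def rotation_zyx(yaw, pitch, roll):
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def yaw_pitch_from_axis(axis):
    """Recover yaw and pitch from the direction of the body x-axis.

    Assumes the axis equals the first column of rotation_zyx(...),
    i.e. (cos(yaw)cos(pitch), sin(yaw)cos(pitch), -sin(pitch)).
    """
    ax, ay, az = axis
    yaw = math.atan2(ay, ax)
    pitch = math.atan2(-az, math.hypot(ax, ay))
    return yaw, pitch


# Round trip with hypothetical angles: build R, read the axis, recover the angles.
R = rotation_zyx(yaw=0.3, pitch=-0.2, roll=0.1)
axis = (R[0][0], R[1][0], R[2][0])
yaw_est, pitch_est = yaw_pitch_from_axis(axis)
```

Because the first column of the matrix does not depend on roll, a cylindrically symmetric body leaves roll undetermined from the axis alone, which is consistent with the abstract's use of a separate step to resolve it.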

Low-cost stereo vision and deep learning for river water level measurement

Pedro Zamboni¹, Robert Krüger¹, László Bertalan², Xabier Blanch³, Paul Hindorf¹, Anette Eltner¹

¹Dresden University of Technology, Germany; ²University of Debrecen, Hungary; ³Universitat Politécnica de Catalunya, Spain

This study presents a low-cost, non-contact stereo vision system for automated river water level monitoring, addressing the growing need for dense and scalable hydrological observation networks under increasing climate-driven flood risks. The proposed system uses paired consumer-grade cameras combined with deep learning–based image segmentation to estimate water levels without requiring physical reference markers or pre-existing 3D models.

Two processing strategies are evaluated: a standard stereo workflow and an enhanced approach incorporating semantic masking to exclude dynamic regions such as water and sky. Camera pose estimation is assessed using both global and epoch-based optimization methods. Results show that unmasked configurations provide more stable and robust camera pose estimates, while masking improves geometric accuracy but introduces temporal instability.

Water level estimates derived from stereo reconstruction demonstrate strong agreement with reference gauge data, achieving correlation coefficients between 0.70 and 0.77. Both approaches successfully capture overall hydrological trends, including flood dynamics, although accuracy decreases under high water levels and challenging imaging conditions. Masking introduces a systematic offset in absolute values but does not significantly improve correlation performance.

Research on Cloud Control photogrammetry based on Time-series Archived Aerial Photos and Its Application in Urban Governance in Beijing

Xiaokun Zhu¹, Yingchun Tao¹, Huimin Tian¹, Mingce Xu², Yutao Guo¹

¹Beijing Institute of Surveying and Mapping, China, People's Republic of; ²Beijing SmartSpatio Technology, China, People's Republic of

This study applies cloud control photogrammetry to time-series archived aerial photos to support urban governance in Beijing. Addressing challenges such as missing ground control points, heterogeneous coordinate references, and non-digitized aerial triangulation results, the proposed method leverages existing basic geographic products (e.g., DOM, DEM) as dense control sources, enabling automated aerial triangulation and 3D reconstruction without field control points. The workflow includes control source selection and organization, image preprocessing, cloud control point and tie point matching, block adjustment, and time-series product generation. Three experimental applications are presented: (1) reconstruction of river course changes in the Beijing Municipal Administrative Center using KH satellite images (1961–1974) and 1996 DOM, yielding time-series DOM products meeting 1:50,000 scale accuracy; (2) detection of illegal self-built building additions via DSM differencing from ADS80 images (2016–2017), identifying one-to-three-story structures; (3) 3D real-scene modeling of the Grand Canal’s Tonghui River section from 1975 film photos and 2015 control data, revealing 40 years of urban transformation. Results demonstrate that cloud control photogrammetry ensures spatiotemporal consistency and enables quantifiable, multi-temporal 3D analysis for urban change detection, illegal construction monitoring, and cultural heritage preservation.
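The DSM differencing used in application (2) can be illustrated with a toy example. This is a minimal sketch, not the study's workflow: it assumes two co-registered DSM grids of equal size with heights in metres, and the 2.5 m threshold (roughly one storey) and all grid values are hypothetical.

```python
def flag_height_gain(dsm_before, dsm_after, min_gain_m=2.5):
    """Mark cells whose height increased by at least min_gain_m between epochs.

    Both inputs are lists of rows on the same grid; min_gain_m is an
    assumed threshold, not a value taken from the abstract.
    """
    return [
        [(after - before) >= min_gain_m for before, after in zip(row_b, row_a)]
        for row_b, row_a in zip(dsm_before, dsm_after)
    ]


# Hypothetical 2 x 3 tiles: one roof cell gains a storey, the rest barely change.
dsm_2016 = [[30.0, 30.1, 36.0],
            [30.0, 33.0, 36.1]]
dsm_2017 = [[30.1, 30.0, 36.0],
            [30.0, 36.2, 36.0]]
changed = flag_height_gain(dsm_2016, dsm_2017)
```

In practice the flagged cells would still need filtering (vegetation growth, matching noise, small isolated cells) before being treated as candidate building additions.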

UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images

Kaizhen Tan¹, Fan Zhang²

¹Heinz College of Information Systems and Public Policy, Carnegie Mellon University, United States of America; ²Institute of Remote Sensing and Geographical Information System, Peking University, China

Sidewalk width is an important indicator of pedestrian accessibility, comfort, and network quality, yet large-scale width data remain scarce in most cities. Existing approaches typically rely on costly field surveys, high-resolution overhead imagery, or simplified geometric assumptions that limit scalability or introduce systematic error. To address this gap, we present UrbanVGGT, a measurement pipeline for estimating metric sidewalk width from a single street-view image. The method combines semantic segmentation, feed-forward 3D reconstruction, adaptive ground-plane fitting, camera-height-based scale calibration, and directional width measurement on the recovered plane. On a ground-truth benchmark from Washington, D.C., UrbanVGGT achieves a mean absolute error of 0.252 m, with 95.5% of estimates within 0.50 m of the reference width. Ablation experiments show that metric scale calibration is the most critical component, and controlled comparisons with alternative geometry backbones support the effectiveness of the overall design. As a feasibility demonstration, we further apply the pipeline to three cities and generate SV-SideWidth, a prototype sidewalk-width dataset covering 527 OpenStreetMap street segments. The results indicate that street-view imagery can support scalable generation of candidate sidewalk-width attributes, while broader cross-city validation and local ground-truth auditing remain necessary before deployment as authoritative planning data.
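The camera-height-based scale calibration step can be sketched in a few lines. This is a simplified illustration, not the UrbanVGGT code: it assumes the fitted ground plane is given in the camera frame as n·X + d = 0 in arbitrary reconstruction units, and that the true camera height above the ground is known; the function names and the numbers below are hypothetical.

```python
import math


def metric_scale(plane_normal, plane_offset, camera_height_m):
    """Metres per reconstruction unit, from the camera's height above the ground.

    The camera centre is the origin of the camera frame, so its distance to
    the plane n.X + d = 0 is |d| / ||n|| in reconstruction units.
    """
    norm = math.sqrt(sum(c * c for c in plane_normal))
    height_units = abs(plane_offset) / norm
    return camera_height_m / height_units


def to_metres(length_units, scale):
    """Convert a length measured on the recovered plane into metres."""
    return length_units * scale


# Hypothetical values: the plane lies 0.5 units below a camera mounted 2.5 m high.
scale = metric_scale(plane_normal=(0.0, 1.0, 0.0), plane_offset=-0.5,
                     camera_height_m=2.5)
width_m = to_metres(0.4, scale)
```

Any error in the assumed camera height propagates linearly into every width estimate, which fits the abstract's finding that scale calibration is the most critical component.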

Pompeii. From the measurement of small indentations to the calculation of the terminal ballista.

Monil Mihirbhai Thakkar¹, Amir Ardeshiri Lordejani¹, Mario Guagliano¹, Silvia Bertacchi², Sara Gonizzi Barsanti², Adriana Rossi²

¹Department of Mechanical Engineering, Politecnico di Milano, via la Masa 1, 20156, Milan, Italy; ²Department of Engineering, Università degli Studi della Campania Luigi Vanvitelli, Via Roma 29, 81031, Aversa (CE), Italy

During Sulla’s siege of Pompeii in 89 BC, Roman artillery projectiles struck the city’s fortified walls, leaving visible impact craters. The subsequent eruption in AD 79 buried the site, preserving both its architectural layout and the damaged wall surfaces, which were later excavated in the early 20th century. By analysing the visible damage found on the fortified walls of Pompeii, reverse engineering techniques were used to decipher the engineering principles behind Roman military technology. This study simulates the impact of metal projectiles on grey tuff to estimate the impact velocities and the energy required to cause the observed damage, providing insights into the destructive capabilities of Roman weapons. It develops material models and applies finite element analysis, including mesh convergence, velocity calibration, and angular impact studies for both ballista stones and darts to better understand impact mechanics and crater formation.

metal darts on the city walls, along with the simulation of forces and trajectories. Among the objectives is to verify the calculated data against experimental relationships developed in antiquity and applied to the detection of small pyramidal indentations.

BEV-LOC: Real-Time and Lightweight Cross-View Localization via Online BEV Mapping

Jiyong Kwag, Charles Toth, Alper Yilmaz

Ohio State University, United States of America

This abstract presents a deep learning and classical computer vision framework for cross-view geolocalization using 360-degree multi-perspective view (PV) images and an offline global map. Recent studies on cross-view geolocalization typically rely on deep learning models to localize panoramic PV images by matching them with reference satellite imagery. However, such approaches face practical limitations in real-world deployments due to their dependence on large-scale GPU resources and the need to store extensive satellite image datasets. To address these challenges, we propose BEV-LOC, a lightweight and real-time cross-view geolocalization method. BEV-LOC employs a Bird’s Eye View (BEV) encoder that learns to transform 360-degree multi-PV images into a local high-definition (HD) BEV map. Localization is then performed using Intersection over Union (IoU)-based template matching against an offline global map. Our architecture achieves real-time performance at 30 FPS without the need for high-end GPU hardware and delivers a positioning accuracy of 1.2 meters.
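IoU-based template matching can be illustrated on small binary grids. This is a minimal sketch rather than the BEV-LOC implementation: it assumes both the local BEV map and the global map are binary occupancy grids at the same resolution and orientation, searches translations only, and uses hypothetical function names and data.

```python
def iou(a, b):
    """Intersection over Union of two equally sized binary grids (0/1 values)."""
    inter = sum(x and y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x or y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def best_match(global_map, template):
    """Slide the template over the global map; return (best IoU, (row, col))."""
    th, tw = len(template), len(template[0])
    gh, gw = len(global_map), len(global_map[0])
    best = (-1.0, (0, 0))
    for r in range(gh - th + 1):
        for c in range(gw - tw + 1):
            window = [row[c:c + tw] for row in global_map[r:r + th]]
            score = iou(window, template)
            if score > best[0]:
                best = (score, (r, c))
    return best


# Hypothetical data: an L-shaped feature sits at row 1, column 2 of the global map.
global_map = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
local_bev = [
    [1, 1],
    [1, 0],
]
score, offset = best_match(global_map, local_bev)
```

A real system would also search over heading and restrict the search window using a prior position, both of which are omitted here for brevity.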

Remote Pipe Diameter Measurement from a single Image using Laser Scale Projection with a Depth Compensation Model

Alice Bilbáo¹, Leonardo Galvão¹, João Andrade¹, Daniel Regner¹, Moacir Wendhausen¹, Gierri Waltrich¹, Carla Marinho², Tiago Pinto¹

¹Federal University of Santa Catarina, Brazil; ²CENPES/Petrobras, Brazil

Monitoring the geometric integrity of risers and pipelines is critical in offshore oil & gas operations, where swell, collapse or torsion often manifest as diametral changes that must be detected safely and efficiently. Historically, this kind of inspection has been performed by industrial climbing, a time-consuming, dangerous and costly operation. Increasing effort is being directed at remote riser inspection using drones, primarily aimed at qualitative assessment through visual analysis, as well as at photogrammetry, which offers accurate inspection but requires many images, image acquisition network design and well-trained drone pilots. To overcome the limitations of qualitative image inspection and the complexity of photogrammetry, we propose a simple, low-cost method to estimate the pipe diameter from a single image by projecting two laser points of known spacing, building a scale directly in the scene and correcting depth differences between the laser projection plane and the pipe silhouette plane. This work evaluates the proposed method in laboratory conditions for nominal and calibrated focal lengths, distances from 2 m to 10 m and four pipe diameters, demonstrating the improvement of remote pipe diameter measurement by modelling and compensating for this depth difference. The improvement becomes more evident for longer focal lengths, shorter distances, and larger pipe diameters. It has an important effect in minimizing errors, e.g., from 3.5% to less than 0.2% at a 2 m distance for a 165 mm diameter pipe. The next steps include the construction of a lightweight projector to be integrated into a drone camera gimbal.
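Why the depth difference matters can be shown with an idealised pinhole model. The sketch below is not the authors' depth compensation model; it is a simplified geometry under stated assumptions: the optical axis points at the pipe axis, the two parallel laser beams are aligned along the pipe so both spots land on its nearest surface line, and lens distortion is ignored. All function names and numbers are hypothetical.

```python
import math


def naive_diameter(silhouette_px, laser_px, laser_spacing_mm):
    """Scale the silhouette width by the laser scale, ignoring depth differences."""
    return silhouette_px * laser_spacing_mm / laser_px


def compensated_diameter(silhouette_px, laser_px, laser_spacing_mm, focal_px):
    """Correct for the silhouette lying deeper than the laser spots.

    With Z the depth of the laser spots and r the pipe radius, the naive
    half-width q satisfies q^2 = r^2 * Z / (Z + 2r); solving for r gives
    r = (q^2 + q * sqrt(q^2 + Z^2)) / Z.
    """
    depth_mm = focal_px * laser_spacing_mm / laser_px  # camera to nearest pipe surface
    q = naive_diameter(silhouette_px, laser_px, laser_spacing_mm) / 2.0
    radius = (q * q + q * math.sqrt(q * q + depth_mm ** 2)) / depth_mm
    return 2.0 * radius


# Simulate an image of a 165 mm pipe whose nearest surface is 2 m away.
focal_px, spacing_mm, depth_mm, true_radius = 4000.0, 50.0, 2000.0, 82.5
laser_px = focal_px * spacing_mm / depth_mm
silhouette_px = (2.0 * focal_px * true_radius
                 / math.sqrt(depth_mm * (depth_mm + 2.0 * true_radius)))

d_naive = naive_diameter(silhouette_px, laser_px, spacing_mm)
d_comp = compensated_diameter(silhouette_px, laser_px, spacing_mm, focal_px)
rel_err_naive = (2.0 * true_radius - d_naive) / (2.0 * true_radius)
```

In this idealised setup the uncompensated estimate comes out roughly 4% too small, while the compensated one recovers the simulated diameter. Note that the correction needs the focal length to convert the laser spacing in pixels into a depth.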

Evaluating the synergy of hand-crafted and AI-driven feature matching in structure-from-motion 3D reconstruction

Min-Lung Cheng, Yasutaka Kuramoto

SkymatiX Inc., Japan

This study evaluates the effects of hand-crafted and AI-driven feature extraction and matching approaches on 3D scene reconstruction. While hand-crafted methods remain widely adopted in structure-from-motion (SfM), their performance often deteriorates when repetitive or uniform textures occur across multiple images, leading to alignment failures and incomplete reconstructions due to insufficient or erroneous feature correspondences. Recent advances in artificial intelligence have introduced robust pipelines capable of addressing these challenges by improving feature detection and matching in texture-repetitive imagery. In this study, hand-crafted and AI-driven feature extraction and matching techniques are integrated and assessed on challenging datasets to examine their performance in SfM-based 3D reconstruction. Experimental results demonstrate that combining hand-crafted feature points with AI-driven matching significantly enhances the robustness and reconstruction success rate across diverse challenging scenarios. This hybrid approach offers a promising alternative for reliable SfM 3D reconstruction when dealing with images dominated by repetitive or uniform textures.

The Emerging Role of Vision-Language Models in the Automation of Railway Asset Management: A Review and Future Perspective

Ashley Varghese, Mohammadjavad Ghorbanalivakili, Gunho Sohn

York University, Canada

Automated railway inspection is critical for safety, but current deep learning models are limited by a "closed-world" assumption, failing to identify novel or rare assets without costly retraining. This review explores a transformative solution: Vision-Language Models (VLMs). We introduce the concept of "reasoning-powered detection," where a model’s linguistic intelligence is used to guide the identification process.

Multi-Modal LoD2 Building Reconstruction Benchmark for Urban Modeling

Mohammad Moein Sheikholeslami¹, Youssef Korny¹, Andreas Wichmann², Ksenia Bittner³, Gunho Sohn¹

¹York University, Canada; ²Jade University of Applied Sciences, Germany; ³German Aerospace Center (DLR), Weßling, Germany

Accurate 3D building modeling at level of detail 2 (LoD2) is fundamental for urban analysis, supporting applications such as realistic city simulations, energy assessment, and infrastructure planning. While cadastral data is often freely accessible in many developed countries, existing publicly available 3D building benchmarks are typically limited either in scale or in the diversity of input modalities required for developing and evaluating modern deep learning methods.

We present a new large-scale, open, instance-wise dataset for LoD2 building modeling from aerial imagery and LiDAR. Through rigorous processing and validation, it bridges the gap between raw open geospatial data and structured research benchmarks. Its modular design supports both single- and multi-modal reconstruction workflows. The upcoming public release aims to enable reproducible research in 3D urban modeling, cross-modal learning, and digital-twin creation, advancing automated, reliable city-scale 3D reconstruction.

GeoRGMAE: Geospatially Guided Masked Autoencoders for Building Segmentation

Tugba Eraslanoglu¹, Guneet Mutreja², Martin Kada¹, Ksenia Bittner²

¹Technical University of Berlin, Germany; ²German Aerospace Center (DLR)

Accurate building segmentation from high-resolution aerial imagery is essential for various urban applications such as digital twins, geographic information systems, and flood risk modelling. However, conventional supervised deep learning approaches require large amounts of pixel-level annotations, which are costly and time-consuming to obtain for large remote sensing datasets. To address this limitation, self-supervised learning has recently emerged as an effective paradigm for learning visual representations from unlabeled data. In particular, masked autoencoders (MAE) have demonstrated strong performance by reconstructing masked image patches during pretraining. Nevertheless, conventional MAE frameworks rely on random masking strategies that do not consider the spatial structure and semantic importance of regions in high-resolution remote sensing imagery. In this study, we propose GeoRGMAE, a geospatially guided masked autoencoder for building segmentation. Unlike standard MAE, which relies on random masking, our approach leverages building footprint annotations available in the pretraining dataset to guide the masking process while preserving the original reconstruction objective. We introduce three masking strategies (core, balanced, and density-aware masking) that prioritize semantically relevant building regions under varying urban densities. The core strategy focuses on building interiors, the balanced strategy distributes masking between buildings and background, and the density-aware strategy adapts masking based on scene-level building density. Experiments on the Roof3D and WHU Building datasets demonstrate consistent, though modest, improvements over standard MAE pretraining, with the most effective masking strategy depending on dataset characteristics. These results indicate that incorporating geospatial priors into masked image modelling can improve representation learning for downstream building segmentation tasks.

Deep Learning-based Roof Detection from UAV Dense Point Cloud for Solar Panels Mapping

Aleksandra Sekrecka, Damian Wierzbicki, Kinga Karwowska, Agnieszka Myrcik

Military University of Technology in Warsaw, Poland

Photovoltaic panels are becoming increasingly popular, and finding a suitable location for them quickly and automatically is a current and practical problem. In our experiment, we test whether a point cloud from dense multi-image matching can be useful for the automatic detection of the best locations for installing photovoltaic panels. We propose a methodology for processing and analyzing UAV point clouds, where the use of deep learning in combination with the CANUPO algorithm results in high roof recognition efficiency. Two classes were selected: roofs and non-roof objects. This made it possible to filter the detected roofs and remove erroneous objects. The resulting model detected buildings with an accuracy of approximately 80% and an effectiveness of 100% (there were no false detections). The following factors were taken into account in the insolation calculations: roof angles, roof slope exposure, changes in the angle of sunlight throughout the year, and atmospheric transmittance. The roof angles and exposure were determined using a Digital Surface Model (DSM) generated from multi-image UAV data. In our research, we took into account the average angle of incidence of sunlight throughout the year and at quarterly intervals. The use of DSM for roofs and the SVC algorithm combined with CANUPO made it possible to eliminate false detections and significantly increase the effectiveness of location detection. Research conducted for the entire year and quarters enabled the analysis of changes in roof insolation throughout the year, which is crucial when estimating the profitability of installing photovoltaic panels.

Comparison of Different Object Detection Methods for Automatic Facade Enrichment of Existing Building Models from Aerial Images

Johannes Otepka¹, Günter Sükar², Martin Kerschner², Gerald Forkert², Norbert Pfeifer¹

¹TU Wien, Austria; ²UVM Systems GmbH, Wien, Austria

This study investigates the enrichment of existing building models using deep learning-based window detection from oblique aerial imagery acquired by a high-end multi-camera sensor system. While many cities maintain building models at Level of Detail 2 (LOD2), higher levels of detail require the integration of facade elements such as windows. Three detection strategies are evaluated using 3D reference building models to assess accuracy and completeness. The test site is located in Vienna and consists of multiple large residential buildings with varying facade characteristics.

The evaluated methods include zero-shot object detection with Grounding DINO combined with Segment Anything Model 2, applied to both oblique images and facade orthophotos, as well as a SAM2-UNeXT network requiring minimal training. Results indicate that zero-shot detection on orthophotos achieves the best performance, with a precision of 0.95 and an F1 score of 0.85. In contrast, the SAM2-UNeXT approach shows lower precision and F1 scores but slightly higher recall.

The investigation shows that detection performance is influenced by facade viewing angles. Steeper viewing angles generally improve detection quality but increase susceptibility to occlusions, particularly in dense urban environments. The article concludes with a detailed outlook on future work, including the extension of the approach to more complex three-dimensional building structures.
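The precision and F1 figures quoted in this abstract are tied together by the definition of the F1 score as the harmonic mean of precision and recall. The following is a minimal sketch of that relationship; the function names are hypothetical, and the recall value it produces is only implied by the rounded figures in the abstract, not a number reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_precision_and_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R (hypothetical helper)."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * precision")
    return f1 * precision / denom


# Rounded values quoted in the abstract for zero-shot detection on orthophotos.
implied_recall = recall_from_precision_and_f1(0.95, 0.85)
print(round(implied_recall, 2))
```

Because the inputs are rounded to two decimals, the implied recall should be read as approximate.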

Quality Restoration of Point-Cloud-Derived 2D Projections: A Comparative Study of Void-Filling Techniques

Md Rakibul Islam Chowdhury¹, Sang Hyeok Han¹,², Jong Won Ma³

¹Dept of Building, Civil and Environmental Engineering, Concordia University, Montréal, QC, Canada; ²Centre for Innovation in Construction and Infrastructure Engineering and Management (CICIEM), Gina Cody School of Engineering and Computer Science, Concordia University, Montréal, QC, Canada; ³School of Civil and Environmental Engineering, Yonsei University, Seoul, South Korea

Point-cloud-derived 2D projections enable generating unlimited virtual views for indoor scene analysis and dataset creation. However, projecting irregular 3D samples onto a dense image grid commonly produces void pixels due to sparsity, occlusions, and incomplete scan coverage. These projection-induced artifacts degrade the visual fidelity of rendered images and limit their usefulness in downstream image-based workflows. This study investigates void-filling strategies tailored to point-cloud-generated RGB projections and provides a comparative evaluation of three representative approaches: (i) K-nearest neighbor (KNN) interpolation with KD-Tree accelerated neighbor search, (ii) a rule-based neighborhood method (NNRule) that adapts filling behavior using local variability to preserve edges, and (iii) a mask-normalized Gaussian-weighted propagation method that diffuses valid color information into void regions. Experiments were conducted on multi-view perspective projections generated from the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) Area 3, totalling 5,520 images. Restoration quality was assessed using standard pixel-level metrics such as MAE, RMSE, PSNR, and SSIM. Quantitative results show that Gaussian-weighted propagation achieved the best overall performance, followed by NNRule, while KNN performed weakest numerically. Qualitative comparisons further indicate that KNN produces the most visually realistic texture appearance, whereas diffusion-based filling softens fine details. Finally, the study establishes a practical baseline that enables both academic researchers to advance point-cloud-to-image restoration without relying on paired RGB datasets and industrial practitioners to deploy lightweight void-filling pipelines in real-world applications such as digital twins, indoor robotics, facility management, and augmented reality.
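To illustrate what "mask-normalized Gaussian-weighted" filling can mean, here is a minimal single-pass sketch on a small single-channel image. It is an assumption about the general technique (normalized convolution), not the authors' implementation: the function names, the use of `None` for void pixels, and the kernel radius and sigma are all hypothetical.

```python
import math


def gaussian_kernel(radius: int, sigma: float):
    """Unnormalized 2D Gaussian weights on a (2*radius+1)^2 window."""
    return [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]


def fill_voids(image, radius: int = 2, sigma: float = 1.0):
    """Fill void pixels (None) with a Gaussian-weighted mean of valid neighbors.

    Weights are normalized by the sum of weights over VALID pixels only,
    so void neighbors do not drag the estimate towards zero.
    """
    height, width = len(image), len(image[0])
    kernel = gaussian_kernel(radius, sigma)
    filled = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] is not None:
                continue  # valid pixels are kept unchanged
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width and image[yy][xx] is not None:
                        w = kernel[dy + radius][dx + radius]
                        weighted_sum += w * image[yy][xx]
                        weight_total += w
            if weight_total > 0.0:
                filled[y][x] = weighted_sum / weight_total
    return filled
```

A void pixel with no valid neighbor inside the window stays void in this sketch; repeating the pass would propagate values further, at the cost of the softening of fine detail that the abstract notes for diffusion-based filling.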

Bridging the Gap: Improving handheld Laser Scanning Point Cloud Quality in Forests via RTK-GNSS integrated SLAM

Carolin Rünger, Stefan Binapfl, Sophia Böhme, Ferdinand Maiwald, Anette Eltner

Technical University Dresden, Germany

Accurate forest inventories are essential for sustainable forest management. Handheld personal laser scanning (H-PLS) enables efficient and flexible forest data acquisition. However, ensuring reliable point cloud quality in complex environments remains challenging. While Simultaneous Localization and Mapping (SLAM)-based H-PLS allows rapid data collection, trajectory drift and accumulated registration errors can reduce the accuracy of derived tree parameters and structural metrics. In contrast, Global Navigation Satellite System (GNSS)-based Real-Time Kinematic (RTK) positioning provides centimetre-level absolute accuracy and drift-free trajectories, although its application in forested environments is still emerging. This study evaluates the impact of RTK-GNSS integration on point cloud geometry compared to SLAM-based point clouds without GNSS across two Central European forest plots with contrasting canopy structures. Analyses focused on tree parameter accuracy, structural metrics based on quantitative structural models, point density and noise characteristics. To isolate the effect of GNSS integration, data from the RTK-GNSS enabled H-PLS device were additionally processed without GNSS information, and an open-trajectory scan without loop closure was included for comparison. Results show that RTK-GNSS improves point cloud consistency and especially enhances the estimation of volume- and branch-related metrics. In the dense canopy plot, RTK-GNSS information reduced mean errors in branch number (−6100 to −5369) and crown volume (−492.75 to −357.21 m³). However, overall performance in tree parameter estimation depends on point density. These findings highlight RTK-GNSS H-PLS as a promising approach for flexible and efficient forest data acquisition in inventory applications.

Semantically-Driven Adaptive Registration for Correcting Non-Constant Drift in Multi-Temporal MLS Data

Aimad El Issaoui¹,², Veikka Taka¹, Harri Kaartinen¹, Antero Kukko¹,², Juha Hyyppä¹,²

¹Finnish Geospatial Research Institute (FGI), the National Land Survey of Finland; ²Aalto University, School of Engineering, Department of Built Environment

Mobile Laser Scanning (MLS) provides high-accuracy 3D point clouds essential for road infrastructure monitoring. However, multi-temporal MLS analysis is often limited by non-constant, spatially varying trajectory drift caused by GNSS outages and IMU inaccuracies. These misalignments can exceed the magnitude of the changes being monitored, such as pavement deformation, making accurate change detection challenging. This paper presents a fully automatic, semantically driven registration pipeline designed to correct spatially varying drift in directly georeferenced MLS data. The method first applies Principal Component Analysis (PCA) and intensity-based filtering to classify points into stable geometric categories, including flat horizontal surfaces, flat vertical structures, and linear vertical features. A correspondence-based filtering step removes dynamic objects and temporal changes to ensure that registration is driven by stable geometry. The core of the method is an adaptive piecewise registration strategy, where the reference point cloud is divided into sequential 1-meter patches. Each patch is assigned a local rigid transformation estimated using an adaptively expanding registration window guided by the availability of stable vertical features. A final smoothing step ensures spatial continuity between adjacent transformations. The method was evaluated on two MLS datasets collected one year apart along a 3 km road corridor using the FGI Roamer-R4DW system. Validation using 30 independent ground signals showed that the 3D RMSE improved from 3.38 cm to 1.54 cm, with vertical RMSE improving from 2.54 cm to 0.67 cm. The results demonstrate that the proposed approach enables centimeter-level alignment suitable for high-precision multi-temporal road monitoring and change detection applications.
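The validation figures above are root-mean-square errors over independent check targets. As a minimal sketch of how such figures are typically computed, the following assumes each check target yields a residual vector (dx, dy, dz) between the registered point cloud and the reference coordinate; the function name and the dictionary keys are hypothetical.

```python
import math


def rmse_components(residuals):
    """Horizontal, vertical, and 3D RMSE from (dx, dy, dz) residuals.

    All residuals must share one unit (for example centimetres).
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("at least one residual is required")
    sum_horizontal = sum(dx * dx + dy * dy for dx, dy, _ in residuals)
    sum_vertical = sum(dz * dz for _, _, dz in residuals)
    return {
        "horizontal": math.sqrt(sum_horizontal / n),
        "vertical": math.sqrt(sum_vertical / n),
        "3d": math.sqrt((sum_horizontal + sum_vertical) / n),
    }
```

With this definition the squared 3D RMSE equals the sum of the squared horizontal and vertical RMSE values, which is why a large reduction in the vertical component has a visible effect on the 3D figure.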

3D Meshing of Challenging Surfaces using Gaussian Splatting

Dario Billi¹,², Chaimaa Delasse²,³, Arnadi Murtiyoso², Hélène Macher², Pierre Grussenmeyer², Gabriella Caroti¹, Andrea Piemonte¹

¹Department of Civil and Industrial Engineering, ASTRO Laboratory, University of Pisa, Largo Lucio Lazzarino, 56122 Pisa, Italy; ²Université de Strasbourg, INSA Strasbourg, CNRS, Laboratoire ICube UMR 7357, 67000 Strasbourg, France; ³Ecole des Sciences Géomatiques et de l’Ingénierie Topographique, Institut Agronomique et Vétérinaire Hassan II, Madinat Al Irfane, 6202 Rabat, Morocco

This work addresses the challenge of accurate 3D reconstruction of complex scenes involving vegetation and transparent or non-Lambertian surfaces, which often cause difficulties for traditional Multi-View Stereo (MVS) methods. This issue is particularly relevant in the field of Cultural Heritage (CH), where many objects and environments exhibit such characteristics. To overcome these limitations, the study proposes the use of the new MILo (Mesh-In-the-Loop Gaussian Splatting) approach (Guédon et al., 2025), comparing its results with conventional MVS techniques and Terrestrial Laser Scanner (TLS) data.

MILo builds upon the 3D Gaussian Splatting (3DGS) technique, introducing a differentiable mesh extraction during optimization of the Gaussian parameters. This enables gradient flow between the volumetric and surface representations, resulting in more accurate and lightweight meshes, suitable for downstream applications such as simulations or animations.

The study uses three datasets: a Tilia tomentosa tree (Strasbourg) for complex natural geometries, the winter garden of the Sarreguemines Museum for reflective surfaces, and woodcarvings from Kasepuhan Palace (Indonesia) for fine ornamental details. Preliminary results on the tree dataset show that MILo significantly improves reconstruction quality, preserving thin structures such as branches and leaves compared to traditional MVS methods.

The final analysis will include both qualitative and quantitative comparisons (RMSE, standard deviation, completeness, mesh complexity) against TLS data, to rigorously assess MILo’s performance across different geometric and material conditions.

Render-to-Real Image-Based Change Detection of Outdoor Infrastructure Using 3D Gaussian Splatting

Satoko Hattori-Nagao, Kazuo Oda, Tomoaki Eguchi, Takanobu Nagao, Satomi Kakuta

Asia Air Survey Co., Ltd., Japan

This study proposes a framework for detecting changes in outdoor civil infrastructure using bi-temporal images and validates its effectiveness through experiments on real-world datasets. The proposed method performs change detection by comparing a 3D Gaussian Splatting (3DGS) model reconstructed from multi-view images acquired before changes occur with a single real image captured from a new observation viewpoint after changes. The processing pipeline consists of: (1) construction of the 3DGS model, (2) generation of an initial rendered image corresponding to the post-change real image, (3) feature matching between the rendered image and the real image followed by camera pose estimation, and (4) change detection. Experiments conducted on a sediment control dam and a bridge dataset demonstrate that the proposed method achieves a maximum Intersection over Union (IoU) of 0.82 for change detection. Furthermore, compared to a baseline method based on bi-temporal real image pairs, the proposed method improves IoU by up to 24 percentage points. The results also indicate that even under limited acquisition conditions after changes, accurate change detection can be achieved when the 3DGS reconstruction quality and pose estimation are sufficiently reliable.
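The Intersection over Union (IoU) used to score the change maps above can be sketched as follows for two binary masks of equal size. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration.

```python
def intersection_over_union(predicted, reference) -> float:
    """IoU of two binary masks given as equally sized lists of rows."""
    intersection = 0
    union = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0
```

An improvement "by up to 24 percentage points" then means a difference of up to 0.24 between two such IoU values, not a relative gain of 24%.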

Empirical assessment of geometric accuracy of underwater lidar in tropical shallow waters

Mentari Khoerunnisa Azzahra¹, Fickrie Muhammad², Arnadi Murtiyoso³, Annette Scheider⁴, Harald Sternberg⁴, Gabriella Alodia²

¹Institut Teknologi Bandung, Faculty of Earth Sciences and Technology, Geodesy and Geomatics Engineering Postgraduate Programme, Bandung, Indonesia; ²Institut Teknologi Bandung, Faculty of Earth Sciences and Technology, Hydrography Research Group, Bandung, Indonesia; ³Université de Strasbourg, CNRS, INSA Strasbourg, ICube Laboratory UMR 7357, Photogrammetry and Geomatics Group, Strasbourg, France; ⁴HafenCity University Hamburg, Department of Hydrography and Geodesy, Hamburg, Germany

Light detection and ranging (lidar) technology has been widely applied across various spatial domains. To meet the need for detailed underwater surveys, Fraunhofer IPM developed an underwater lidar, known as ULi. The system has been tested under controlled laboratory conditions, and Fraunhofer IPM claims sub-millimetre range precision in clean water. However, no empirical study has managed to address this aspect, as fieldwork in the Elbe River (Walter et al., 2025) did not manage to obtain suitable data due to its naturally high turbidity. The present study will evaluate the geometrical accuracy of ULi against terrestrial laser scanner (TLS) and photogrammetry. An acoustic Doppler current profiler (ADCP) was chosen as the measurement target in the field experiment due to its rigidity and high reflectivity; its frame measures 75 × 75 × 65 cm. The data sets were georeferenced to the WGS 84/UTM Zone 48S coordinate system using control point targets affixed to the ADCP frame and measured with a total station applying the intersection method. Subsequently, the geometric accuracy assessment was performed through statistical evaluations, including root mean square error analysis and 3D point cloud deviation comparison among ULi, TLS, and photogrammetry data sets. The 3D model derived from the ULi data will be assessed against models derived from TLS and photogrammetry through statistical analyses of length discrepancies and spatial deviations. Additionally, intensity, point density, linearity, planarity, and scattering analyses will be performed to evaluate how well the point cloud represents the geometric characteristics.

Experimental Validation of Human-Readable Coded Targets for Cross-Platform Photogrammetry and 3D Laser Scanning

Miglena Raykovska¹,⁵, Milen Borisov², Stanislav Harizanov¹, Lyubka Pashova³, Nikolay Petkov¹, Kristen Jones⁴, Pavel Georgiev¹, Georgi Vasilev¹, Ivan Lirkov¹

¹Institute of Information and Communication Technologies, Bulgarian Academy of Sciences; ²Institute of Mathematics and Informatics, Bulgarian Academy of Sciences; ³National Institute of Geophysics, Geodesy and Geography, Bulgarian Academy of Sciences; ⁴Queens University, Canada; ⁵Centre of Excellence in Informatics and Information and Communication Technologies

Coded targets are widely used in close-range photogrammetry and 3D laser scanning for automated referencing and registration. However, most fiducial systems are optimized for specific software environments, limiting interoperability across processing pipelines. This study presents a cross-platform coded target framework for multi-sensor 3D acquisition that combines geometric redundancy, binary encoding, and human-readable elements to enhance robustness and reproducibility. An open-source implementation (PGT-Toolkit) supports marker generation, detection, and standardized coordinate export. Performance was evaluated using a controlled laboratory framework with systematically varied viewing angles, distances, and illumination conditions. Experiments were conducted using DSLR-based photogrammetry and terrestrial laser scanning. Detection rate, centroid repeatability, reprojection error, and cross-platform coordinate consistency were assessed and compared with those of established fiducial systems. Results demonstrate stable detection under oblique viewing geometries and consistent coordinate estimation across both commercial and open-source software environments. Laboratory studies confirm that Human Readable Coded Targets (HRCT) provide reliable and accurate cross-platform compatibility for both photogrammetric and 3D laser scanning workflows; these findings remain to be verified by field studies. The proposed framework contributes a structured methodology for experimental validation of interoperable coded targets in multi-sensor 3D workflows.

Integrating Multi-View Stereo and Depth Foundation Models for Precise 3D Reconstruction of Thin Urban Structures

Hwiyoung Kim¹, Impyeong Lee², Kyoungah Choi³

¹Geospatial Team, InnoPAM, Korea, Republic of (South Korea); ²Dept. of Geoinformatics, University of Seoul, Korea, Republic of (South Korea); ³Geospatially Enabled Society Research Division, Korea Research Institute for Human Settlements, Korea, Republic of (South Korea)

Constructing high-fidelity 3D models for urban Digital Twins is challenging, particularly for thin, texture-less structures like power lines where traditional Multi-View Stereo (MVS) fails due to matching ambiguities. While recent Monocular Depth Foundation Models offer dense estimation, they lack absolute scale and often degrade when applied to large-scale aerial imagery. This paper proposes a hybrid depth estimation pipeline that synergizes the metric accuracy of MVS with the structural coherence of foundation models.

Our method follows a Coarse-to-Fine strategy. First, we generate a scale-aware initial depth map by injecting sparse MVS points into the "Depth Anything" model as geometric priors, compensating for the lack of absolute scale in monocular estimation. Subsequently, a structure-guided refinement stage employs edge-based contour grouping to rectify object boundaries and suppress noise. Experimental results demonstrate that our approach successfully reconstructs power lines as distinct, linear objects with absolute scale, effectively resolving the data voids inherent in MVS and the geometric distortions typical of monocular models. This research provides a robust workflow for enhancing the precision of urban 3D reconstruction.
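The abstract describes injecting sparse MVS points into the foundation model as priors. A simpler and different technique that conveys the underlying idea of recovering absolute scale is a post-hoc least-squares fit of a scale and a shift between the relative prediction and the sparse metric depths. The sketch below shows only that swapped-in technique; the function names are hypothetical, and whether the fit is done in depth or inverse-depth space depends on the model output and is left open here.

```python
def fit_scale_shift(relative, metric):
    """Least-squares (scale, shift) so that scale * relative + shift ~ metric.

    `relative` holds model predictions sampled at the sparse point
    locations; `metric` holds the matching MVS depths.
    """
    n = len(relative)
    if n < 2 or n != len(metric):
        raise ValueError("need at least two paired samples")
    mean_rel = sum(relative) / n
    mean_met = sum(metric) / n
    variance = sum((r - mean_rel) ** 2 for r in relative)
    if variance == 0.0:
        raise ValueError("relative values must not all be equal")
    covariance = sum((r - mean_rel) * (m - mean_met)
                     for r, m in zip(relative, metric))
    scale = covariance / variance
    shift = mean_met - scale * mean_rel
    return scale, shift


def apply_scale_shift(relative_map, scale, shift):
    """Apply the fitted transform to a dense map given as a list of rows."""
    return [[scale * value + shift for value in row] for row in relative_map]
```

A global fit of this kind cannot correct local distortions, which is one reason a structure-guided refinement stage such as the one in the abstract may still be needed.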

Estimation of refraction in photogrammetry from airborne data in an alpine environment

Myrta Maria Macelloni, Nives Grasso, Alberto Cina

Politecnico di Torino, Italy

Valpelline is an unspoilt Alpine valley located in the northernmost part of the Aosta Valley, on the border between Italy and Switzerland. It is the region’s longest valley, shaped by glaciers and rivers, with elevations ranging from about 900 m to over 4000 m and peaks such as Mont Gelé (3518 m) and Dent d’Hérens (4171 m).

Since 2020, the glaciers have been monitored by the GlacierLAB group (Politecnico di Torino) and ARPA Valle d’Aosta. Because of the valley’s steep, inaccessible terrain, biannual aerial photogrammetric surveys are carried out with a GNSS antenna, a low-accuracy IMU, and a PhaseOne iXM-RS150F camera (151 MP, 50 mm lens).

Due to a lack of synchronization between the camera and GNSS, Ground Control Points (GCPs) are needed for georeferencing. However, their configuration is often insufficient. Camera calibration certificates (2019, 2022) are crucial to correct image distortions; when unavailable, calibration is estimated using Agisoft Metashape and Structure-from-Motion methods, dividing known points into GCPs and Control Points to evaluate residuals.

High-altitude flights require correction for atmospheric refraction, which affects image geometry independently of optical distortion. Tests were carried out to estimate refraction errors (via Saastamoinen formulas) and to separate them from optical effects, enabling more accurate 3D models of Valpelline’s complex alpine environment.

Learning-based Estimation of Surface Normals in Unstructured Airborne LiDAR Point Clouds

Max Hermann¹,², Martin Weinmann²

¹Fraunhofer IOSB, Karlsruhe, Germany; ²Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

To produce suitable 3D models for downstream tasks, point clouds are often triangulated to reconstruct a triangle mesh, which first requires estimating normal vectors that define the local surface orientation. Because normals are not directly measured during laser scanning, they are often estimated in postprocessing using two steps: (1) selecting a neighborhood around each point and fitting a local surface function, and (2) orienting the resulting normal to distinguish inside from outside. Larger local neighborhoods often yield more consistent normals by averaging the surface, but can smooth out sharp discontinuities.

For orientation, various methods attempt to estimate the inside versus outside direction. In watertight scans, orientation can be determined by locally triangulating the points and propagating consistent normal orientations along the connected triangles. For surface scans containing holes and occlusions, typical for airborne LiDAR, this is more challenging, and heuristics like Minimum-Spanning-Trees or global flips towards one major coordinate axis are often used.
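The "global flip towards one major coordinate axis" heuristic mentioned above can be sketched in a few lines. For brevity, the unoriented normal is taken here from a single triangle via a cross product rather than from a fitted local surface; the function names and the choice of +Z as the reference axis are assumptions for this illustration.

```python
import math


def unit_normal(p0, p1, p2):
    """Unit normal of the triangle (p0, p1, p2); its sign depends on vertex order."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("degenerate triangle")
    return [c / length for c in n]


def orient_towards(normal, axis=(0.0, 0.0, 1.0)):
    """Flip the normal if it points away from the reference axis."""
    dot = sum(a * b for a, b in zip(normal, axis))
    return [-c for c in normal] if dot < 0.0 else list(normal)
```

For airborne data, flipping towards +Z works well on roofs and terrain, but the dot product is close to zero on vertical walls, where the heuristic becomes ambiguous.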

We propose a learning-based approach to estimate surface normals in unordered point clouds from airborne LiDAR scanning. Across multiple datasets, our approach consistently reduces artifacts and improves the quality of reconstructed triangle meshes compared to baseline methods, while achieving significantly faster runtime.

Railway parameter extraction with high-precision UAV-photogrammetry: a feasibility study

Lucas De Burggrave¹, Pierre Prévost¹, Erkki Bartczak¹, Suzanna Cuypers¹, Jens Derdaele², Maarten Bassier¹

¹KU Leuven, Belgium; ²TUC RAIL, Brussels

This study investigates the feasibility of using UAV-based photogrammetry for the accurate extraction of railway geometry parameters such as gauge, alignment, and cant. The research explores whether aerial image-based reconstruction can meet the high precision requirements traditionally achieved through terrestrial survey methods. A series of experimental flights were carried out to evaluate how flight configuration, image quality, and processing strategy influence measurement accuracy and reliability. The results provide insight into the potential and current limitations of UAV photogrammetry for rail infrastructure documentation and quality control. Overall, the study contributes to advancing automated, efficient, and safe methods for railway inspection and geometric parameter extraction.

Sand Engine Beach State Assessment by applying Machine Learning on massive ARGUS Imagery

Alex De Jong, Roderik Lindenbergh, Sander Vos, Daan Hulskemper

Delft University of Technology, Netherlands, The

Dynamic beach locations world-wide are monitored by so-called Argus camera systems. Their automatic image capturing results in large databases of coastal images acquired during different illumination conditions. We present a lightweight and efficient method to automatically extract meaningful sand and supporting classes from ∼1 million Argus images of the Sand Engine, The Netherlands, a nature-based solution for beach erosion of 2 by 1 km. The method consists of two neural networks. First, a ResNet18 model selects images of sufficient quality. The second network, a shallow multi-layered perceptron, is fed by RGB, intensity and texture features and classifies pixels into six classes: Water, Foam and Vegetation on one hand, and Aeolian, Wet and Armoured Sand on the other hand. Initial results show good agreement with human interpretation. Final results will be used to assess the multi-year morpho-dynamic evolution at the hour scale of the Sand Engine.

Pixel-based vegetation mapping at class-level from UAV multispectral imagery: application in an alpine lake ecosystem

Mohammad Elahi¹, Alessandra Spadaro², Francesca Matrone², Andrea Maria Lingua², Chiara Graziani², Vittorio Fra¹

¹Interuniversity Department of Regional and Urban Studies and Planning (DIST), Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino; ²Department of Environment, Land and Infrastructure Engineering (DIATI), Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino

Vegetation mapping in alpine environments is essential for monitoring ecosystem dynamics and climate change impacts, yet remains challenging when using very high-resolution UAV imagery under limited labeled data. This study proposes a data-centric, pixel-based classification framework for species-level vegetation mapping using multispectral UAV data acquired in an alpine study area. The approach prioritizes improving data representation rather than increasing model complexity. To address label scarcity, a feature-rich dataset was constructed by integrating spectral information, vegetation indices, and lightweight spatial descriptors to enhance class separability. Classification was performed using XGBoost, which is well suited for multispectral tabular data and robust under imbalanced conditions. The results show consistent classification performance across vegetation types and demonstrate the effectiveness of dataset enrichment under limited supervision, highlighting the importance of feature representation in data-scarce scenarios.
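As an illustration of how a per-pixel feature vector can combine raw bands with a vegetation index, the sketch below uses NDVI as a familiar example. The abstract does not name the indices or bands actually used, so the band keys, the feature order, and both function names are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0.0:
        return 0.0  # assumed convention for pixels with no signal
    return (nir - red) / denominator


def pixel_features(bands: dict) -> list:
    """One tabular row per pixel: raw reflectances followed by one index."""
    return [
        bands["green"],
        bands["red"],
        bands["nir"],
        ndvi(bands["nir"], bands["red"]),
    ]


row = pixel_features({"green": 0.08, "red": 0.10, "nir": 0.50})
```

Rows of this kind, stacked over all labeled pixels, form the tabular input that a gradient-boosted tree classifier such as XGBoost expects.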

A Lightweight CNN–Mamba Hybrid Architecture for Efficient Crack Segmentation

Masaya Shimasaki, Mitsuteru Sakamoto, Toshiaki Satoh

PASCO Corporation, Japan

Pavement crack segmentation is an important task in road infrastructure inspection. However, the practical deployment of deep learning-based methods remains challenging because many high-performance models require substantial computational resources. This limitation is particularly critical in large-scale Mobile Mapping System (MMS)-based workflows, where large volumes of road surface imagery must be processed efficiently. In this study, a lightweight CNN–Mamba hybrid architecture is proposed for efficient crack segmentation as a deployment-oriented redesign of CT-CrackSeg. The proposed model replaces the original MobileViT-based global modelling modules with EfficientViM-inspired blocks based on hidden-state mixer-based state space duality (HSM-SSD), while preserving the overall encoder–decoder structure. In addition, the boundary enhancement branch is refined by introducing DCNv2-based deformable convolution. Experiments were conducted on the publicly available GAPs384 and CamCrack789 datasets. The results show that the proposed model maintains competitive topology-aware segmentation performance while substantially improving computational efficiency. Compared with CT-CrackSeg, the proposed method improves inference speed from 1.49 to 4.44 FPS on GAPs384 and from 1.31 to 3.92 FPS on CamCrack789. At the same time, peak memory consumption is reduced from 2827 MB to 355 MB, while the clDice score remains comparable, changing from 0.760 to 0.758 on GAPs384 and from 0.921 to 0.922 on CamCrack789. These results indicate that the proposed architecture provides a favourable balance between segmentation quality and deployment efficiency, and is suitable for large-scale pavement inspection and related photogrammetric infrastructure monitoring applications.
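The efficiency figures quoted above can be turned into relative factors with a few lines of arithmetic. The sketch below only restates numbers from the abstract; the two helper names are hypothetical.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the new value is (for rates such as FPS)."""
    return after / before


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank (for costs such as memory)."""
    return (before - after) / before


fps_gain_gaps384 = speedup(1.49, 4.44)         # FPS on GAPs384
fps_gain_camcrack789 = speedup(1.31, 3.92)     # FPS on CamCrack789
memory_saving = relative_reduction(2827, 355)  # peak memory in MB

print(f"{fps_gain_gaps384:.2f}x, {fps_gain_camcrack789:.2f}x, {memory_saving:.1%}")
```

Both datasets show a speed-up of roughly a factor of three, and peak memory falls by roughly 87%, while the quoted clDice scores change by about 0.002 or less.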

A Multi-Sensor and Multi-Temporal Approach to 3D Documentation of Historic Gardens: A Case Study of Villa Burba, Italy

Fangming Li¹, Cristiana Achille¹, Raffaella Laviscio², Luca Perfetti³, Francesco Fassi¹

¹3D Survey Group, ABC Lab, Department of Architecture, Built Environment and Construction Engineering (DABC), Politecnico di Milano, Via Ponzio 31, 20133 Milano, Italy; ²PaRID, ABC Lab, Department of Architecture, Built Environment and Construction Engineering (DABC), Politecnico di Milano, Via Ponzio 31, 20133 Milano, Italy; ³DICATAM, Civil Engineering, Architecture, Territory, Environmental and Mathematics, Università degli Studi di Brescia, Italy

Historic gardens are dynamic Cultural Heritage, shaped by seasonal cycles, vegetation growth, and continual maintenance, and require documentation methods capable of capturing change over time. This study presents a multi-sensor, multi-temporal workflow applied to Villa Burba, a seventeenth-century garden near Milan, Italy. Two surveys conducted in 2023 (leaf-on) and 2025 (leaf-off) combined UAV photogrammetry with mobile laser scanning (MLS) to maximize completeness under contrasting environmental conditions. Both datasets were processed independently, harmonized within WGS84 / UTM Zone 32N, and evaluated through point density analysis, deviation modelling, MLS loop-closure checks, and GCP residual evaluation.

Multi-temporal point clouds were analyzed in QGIS using PDAL-enabled tools. Cloud-to-cloud differencing and canopy height modelling revealed key transformations, including the drying of a water channel, the loss of a historic tree, and spatial shifts in vegetation structure. These digital findings were confirmed through field inspection. The workflow demonstrates a practical approach for monitoring dynamic heritage gardens and supporting long-term conservation and management through accurate, repeatable 3D survey data.

Affine Invariant OpenCV Descriptors and the Effects on Aerial Photogrammetry

Evan Okeeffe², Debra Laefer¹, Eleni Mangina²

¹New York University, United States of America; ²University College Dublin

Robust feature descriptors are necessary for computer vision applications such as image matching, photogrammetric three-dimensional (3D) reconstructions, and simultaneous localisation and mapping (SLAM). While most state-of-the-art feature descriptors are invariant to image transformations (such as translation, rotation, and scale), the majority lack stability in tracking points over large 3D perspective transformations. One successful approach to handling these large perspective changes is to simulate affine tilts on the latitude and longitude axes of an image. These simulated tilts create greater invariance to changes in 3D perspective. To demonstrate the widespread efficacy of this approach, this paper applies affine simulation to seven state-of-the-art descriptors in OpenCV and to two of the enhanced OpenCV descriptors in OpenMVG.
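The simulated latitude and longitude tilts can be made concrete by listing the views that get generated. The sketch below assumes the sampling scheme commonly used in ASIFT-style implementations (tilts in powers of the square root of two, longitude steps of 72 degrees divided by the tilt); the abstract does not state which sampling the authors use, and the function names are hypothetical.

```python
import math


def affine_simulation_params(max_k: int = 5, base_step_deg: float = 72.0):
    """List of (tilt, longitude in degrees) pairs, including the original view."""
    params = [(1.0, 0.0)]  # the unmodified image
    for k in range(1, max_k + 1):
        tilt = 2 ** (0.5 * k)
        step = base_step_deg / tilt
        i = 0
        while i * step < 180.0:
            params.append((tilt, i * step))
            i += 1
    return params


def affine_matrix(tilt: float, phi_deg: float):
    """2x2 linear map: rotate by phi, then compress the x axis by 1/tilt."""
    phi = math.radians(phi_deg)
    c, s = math.cos(phi), math.sin(phi)
    return [[c / tilt, -s / tilt], [s, c]]


views = affine_simulation_params()
```

Each simulated view is then passed to an ordinary detector and descriptor, and the resulting keypoints are mapped back to the original image with the inverse transform, so the cost grows with the number of views.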

Evaluating ORB-SLAM 3 Performance using a Photogrammetry-based Reference Trajectory

Leonardo Galvão, Daniel Regner, Moacir Wendhausen, Tiago Pinto, Armando Albertazzi

Federal University of Santa Catarina, Brazil

The robust evaluation of Visual Simultaneous Localization and Mapping (vSLAM) systems is fundamental to their development and deployment. However, this process is often constrained by the reliance on expensive and complex external infrastructure, such as laser trackers or motion capture systems, to provide accurate ground-truth trajectories. This paper introduces a novel and self-contained methodology for the high-fidelity evaluation of stereo vSLAM and stereo-inertial algorithms. Our approach leverages the very same image sequence used by the SLAM algorithm to generate a dense, globally optimized photogrammetric model. The proposed methodology comprises two fundamental steps. The first step consisted of validating photogrammetry as a ground-truth method. For this purpose, the linear displacement measured by photogrammetry was compared with the displacement of a precision guide, which was benchmarked against a laser interferometer as the standard. Once the reference was validated, the second step assessed the performance of ORB-SLAM 3 on a free trajectory within a complex environment by directly comparing the SLAM result to the trajectory generated by photogrammetry. The accuracy was then quantified using standard metrics, including Absolute Trajectory Error (ATE) and Relative Pose Error (RPE). The results validate our approach as an accessible, low-cost, and reliable alternative for benchmarking vSLAM systems, enabling rigorous performance analysis using only the data from the sensor suite under evaluation.
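For readers unfamiliar with the two metrics, a minimal sketch follows. It assumes the estimated and reference trajectories are already time-synchronised and expressed in the same frame (the usual rigid alignment step is omitted), and it evaluates only the translational part of RPE; the function names are illustrative, not taken from the paper.

```python
import math

def ate_rmse(est, ref):
    """RMSE of the position differences between two aligned trajectories."""
    sq = [sum((e - r) ** 2 for e, r in zip(pe, pr)) for pe, pr in zip(est, ref)]
    return math.sqrt(sum(sq) / len(sq))

def rpe_trans_rmse(est, ref, delta=1):
    """RMSE of the relative translation error over a fixed frame step.

    Simplification: compares displacement vectors in a shared frame and
    ignores the rotational component of the relative pose.
    """
    sq = []
    for i in range(len(est) - delta):
        d_est = [b - a for a, b in zip(est[i], est[i + delta])]
        d_ref = [b - a for a, b in zip(ref[i], ref[i + delta])]
        sq.append(sum((a - b) ** 2 for a, b in zip(d_est, d_ref)))
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical trajectories: the estimate carries a constant 0.1 m lateral offset.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimate = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)]
print(ate_rmse(estimate, reference), rpe_trans_rmse(estimate, reference))
```

A constant offset shows up in ATE but not in RPE, which is why the two metrics are usually reported together: ATE reflects global consistency, RPE local drift.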

Deriving Tree Stem Profile and Volume Using a Close-Range Remote Sensing and Machine Learning Approach

Basam Dahy¹, Dag Björnberg¹,², Shafiullah Soomro¹, Johan E. S. Fransson¹

¹Linnaeus University, Sweden; ²Softwerk AB, Sweden

Accurate estimation of tree volume is essential for precision forestry and sustainable forest management. Traditional forest inventory methods rely on manual measurements of tree height and diameter, which are time-consuming and costly to conduct over large areas, and difficult to perform efficiently in dense forest stands. This study presents a data-driven approach for estimating tree volume from partial tree stem profiles derived from high-resolution datasets. While the study relies on harvester production data (Sweden) and field-measured tree stem profiles (Brazil), the framework is designed to support the estimation of tree volume from close-range remote sensing techniques, such as terrestrial photogrammetry using handheld cameras. Three modelling approaches were evaluated, including two machine learning models (XGBoost and Random Forest) using partial tree stem profile measurements as predictors, and one baseline model (XGBoost) using diameter at breast height and tree height as predictors. The models were developed using two independent datasets: harvester production data of Norway spruce (Picea abies (L.) H. Karst.) from Sweden and field-measured tree stem profiles of Slash pine (Pinus elliottii Engelm.) and Loblolly pine (Pinus taeda L.) plantations from Brazil. The results show that tree volume can be predicted with reasonable accuracy using partial tree stem profiles, although models incorporating tree height achieved the lowest prediction errors. The findings demonstrate that partial tree stem profiles provide valuable structural information for machine learning-based tree volume estimation. This framework supports the future integration of close-range remote sensing techniques into modern forest inventory systems.

Towards Open-Vocabulary ALS Point Clouds Semantic Segmentation: An Empirical Study

Yanghong Lin¹,², Tianyu Li¹,³, Shudong Zhou¹, Jingru Zhang¹, Li Fang¹, Wei Yao¹

¹Institute of Urban Environment, Chinese Academy of Sciences, China, People's Republic of; ²University of Chinese Academy of Sciences, China, People's Republic of; ³School of Resource and Environmental Sciences, Wuhan University, China, People's Republic of

Semantic segmentation of Airborne Laser Scanning (ALS) point clouds is critical for numerous photogrammetric and remote-sensing applications. While deep learning has become the dominant approach for ALS semantic segmentation, most existing methods rely on predefined label sets and thus lack the ability to recognize arbitrary semantic categories. With recent advances in visual foundation models (VFM), zero-shot visual understanding has achieved notable progress in natural image domains. However, the potential of adapting 2D VFMs to 3D ALS point cloud segmentation remains underexplored.

This contribution develops three VFM-based approaches for zero-shot, open-vocabulary ALS semantic segmentation: Grounding DINO+SAM, SAM+CLIP, and GSNET. Grounding DINO+SAM identifies object regions using text prompts and employs SAM to refine segmentation masks. SAM+CLIP first generates instance masks via SAM and then assigns semantic labels using CLIP text and visual embeddings. GSNET integrates a remote-sensing-specific encoder with a CLIP-aligned encoder to alleviate the domain gap between natural and aerial imagery.

An empirical study conducted on the ISPRS Vaihingen dataset demonstrates that all three methods possess certain zero-shot open-vocabulary capabilities. Methods trained solely on natural images perform well on common classes (e.g., roof, tree) but struggle with rare categories such as powerline. GSNET improves performance across most categories, highlighting the importance of domain adaptation; however, rare-class segmentation remains challenging. These findings suggest that the substantial domain gap and the limited representation of rare classes are key obstacles to applying VFMs in remote sensing. Future research should focus on test-time adaptation and unsupervised domain adaptation to enhance VFM generalization for 3D ALS point clouds.

A Workflow for the automatic Extraction of Glacier Contours from 4D Point Clouds

Steffen Isfort¹, Melanie Elias², Hans-Gerd Maas¹

¹TUD - Dresden University of Technology, Germany; ²HTWD - University of Applied Sciences Dresden, Germany

A workflow for the automatic extraction of the outlines of debris-covered glaciers and rock glaciers is presented. As the outlines in these scenarios are not clearly discernible, our approach is based on identifying geomorphological changes in multi-temporal 3D point clouds. We assume that these changes are caused by changes of the glacier. Consequently, areas with significant changes can be used to map the outline of the glacier. Our workflow uses pairs of multi-temporal 3D point clouds, which are captured for example by UAV imagery and TLS. After applying a robust registration algorithm, the difference of both point clouds is calculated. Considering only the areas that show significant changes, the glacier areas are isolated, and the outlines are mapped in a 2D mapping plane.

For evaluation, we test our workflow on two data sets. The Bøverbreen glacier, with only little debris cover, allows for a manual assessment of the glacier margins using an orthophoto mosaic from UAV imagery. A comparison of our calculated glacier margins with the manually assessed ones shows good agreement. The results confirm the basic functionality of our proposed method. However, tests show that the most challenging task is filtering glacial and non-glacial points, which is currently done solely based on the point density. More robust solutions to this problem will be discussed.

Automated detection of box-girder bridge deterioration using cylindrical projection from multi-camera 3D reconstruction and deep learning

OU Ming-Yun¹, Jhan Jyun-Ping¹, Lin Chen-Kuang³, Lin Shih-Syun¹, Tsai Hsin-Chu², Chou Tzu-Liang², Chang Chang-yu²

¹National Taiwan University of Science and Technology, Chinese Taipei; ²China Engineering Consultants, Inc., Chinese Taipei; ³Department of Mechanical and Materials Engineering, Tatung University, Taiwan

As large-scale infrastructure gradually ages, hundreds of existing bridges require regular inspections to ensure structural safety. While many researchers have proposed deterioration detection methods based on computer vision and deep learning—which can detect deterioration at the image level—no effective approach has yet been developed that integrates 3D reconstruction technology to achieve spatial localization and area quantification. To address this, this study proposes a two-part automated inspection workflow for the classification, localization, and measurement of internal deterioration in box-girder bridges. In the first part, the camera system is calibrated using an indoor calibration scene, and images are captured inside the box girder. A 3D model is constructed using Structure from Motion (SfM) algorithms, and a cylindrical projection unfolded map is generated. In the second part, a boundary-aware model—modified from DeepV3+—is used to perform pixel-level deterioration detection and classification on the unfolded map. Experimental results demonstrate that the system can generate scale-corrected cylindrical unfolded maps from 3D models with sub-millimeter scale accuracy (0.105 mm), effectively transforming complex 3D inspection tasks into measurable and analyzable 2D images. The model achieved an overall mean Intersection over Union (mIoU) of 65.11% across four categories of deterioration, representing a 7.54 percentage point improvement over the original DeepV3+. The research results validate the effectiveness of the proposed workflow in enhancing detection efficiency and objectivity for box-girder bridge maintenance.

Methodology and Practice of Hong Kong 3D Digital Map Construction Based on Multi-Source Data Fusion

Li Chen, Jun Li, Yaping Wang, Jing Wang, Weichen Yao

Shaanxi TIRAIN Science & Technology Co., Ltd., People's Republic of China

In response to Hong Kong's smart city development strategy, this paper takes the 3D digital map construction project in Kowloon as a practical case study and systematically presents a construction method for 3D digital mapping based on multi-source data fusion. Aiming at the technical challenges in high-density urban environments, including dense buildings, complex 3D traffic networks, and severe shadow occlusion, an "air-ground fusion" data acquisition strategy is proposed. By comprehensively adopting multiple approaches such as oblique aerial photography, Vehicle Mobile Mapping System (VMMS), and Portable Mobile Mapping Survey (PMMS), a high-precision and highly realistic urban 3D model has been constructed. The paper focuses on the principles of multi-source data fusion based on feature registration and combined adjustment, as well as the 3D modeling process and the quality control methods for the final results. The project’s technical innovation and practical feasibility have been validated through international benchmarking. The research results have been applied to urban planning, traffic management, environmental studies and other fields, providing a solid data foundation and technical support for Hong Kong's smart city development.

Automatic Reconstruction of High-Accuracy 3D Roof Models from Orthophotos and Digital Surface Models

Yonghe Li¹, Masaya Shimasaki², Mitsuteru Sakamoto², Toshiaki Satoh², Tatsunori Sada¹

¹NIHON University, Chiba, Japan; ²PASCO Corporation, Tokyo, Japan

In recent years, the demand for 3D city model development has grown, as demonstrated by initiatives such as Project PLATEAU in Japan. In the construction of LoD2 building models, which are an essential component of 3D city models, the reconstruction of 3D roof models still heavily depends on manual work. To enhance productivity through automation, this study proposes a novel method for automatically reconstructing high-accuracy 3D roof models using orthophotos and Digital Surface Models (DSMs) derived from aerial imagery. In the proposed method, a deep-learning-based model is first applied to orthophotos and DSMs to extract 2D rooflines. Then, the extracted 2D rooflines are refined and polygonised to assemble 2D roof models. Finally, planar fitting is performed on the point cloud generated from the DSM within each 2D roof plane to reconstruct 3D roof models. In this process, the horizontal alignment of rooflines and the continuity between adjacent roof planes are preserved. In the experiments, 3D roof models manually digitized by stereoscopic measurement were used as the ground truth, and the automatically reconstructed 3D roof models were evaluated by comparison with this reference. As a result, the recall values for 2D and 3D roof planes were 0.686 and 0.430, respectively, and increased to 0.723 and 0.455 for roof planes larger than 4 m².
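The planar fitting step can be pictured with an ordinary least-squares fit of z = a·x + b·y + c to the DSM points inside one roof polygon. The sketch below is only an assumption about how such a fit might be done (the abstract does not specify the estimator or any robust weighting), and the sample coordinates are made up.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c through (x, y, z) points."""
    n = float(len(points))
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations N * [a, b, c]^T = rhs, solved with Cramer's rule.
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate (e.g. collinear)")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

# Hypothetical DSM samples lying on z = 0.5x - 0.25y + 10.
roof_points = [(0, 0, 10.0), (4, 0, 12.0), (0, 4, 9.0), (4, 4, 11.0), (2, 2, 10.5)]
a, b, c = fit_plane(roof_points)
slope_deg = math.degrees(math.atan(math.hypot(a, b)))
print(a, b, c, slope_deg)
```

For real DSM data, coordinates would normally be reduced to the polygon centroid first to keep the normal equations well conditioned.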

LiDAR-aided neural Scene Representation using low-cost Sensors

Mohamed Negm, Ahmed Elamin, Ahmed El-Rabbany

Toronto Metropolitan University, Canada

Neural scene representations are increasingly explored as alternatives to classical SfM and MVS in civil and architectural mapping, yet their ability to satisfy survey-grade geometric tolerances remains contested. This contribution examines how LiDAR guidance may stabilize NeRF and 3D Gaussian Splatting reconstructions of building façades obtained from low-cost cameras.

Research on Adaptive Feature Band Extraction Technology Based on Fractional Order Differentiation and Machine Learning

Fang Liu, Fei Liu, Xian Guo, Yikang Ren

Beijing University of Civil Engineering and Architecture, China, People's Republic of

The Dunhuang murals, a significant component of China's cultural heritage, are severely threatened by salt-induced deterioration. To address the limitations of traditional invasive detection methods, this study explores a non-destructive approach using hyperspectral remote sensing to monitor mural salinity. Focusing on phosphate content, a key salt damage indicator, we propose a multi-level optimization framework that integrates Fractional Order Differentiation (FOD) for spectral enhancement and various feature selection strategies (including LASSO, SiPLS, SPA, CARS, and Random Frog) to improve prediction accuracy. Partial Least Squares Regression (PLSR) models were constructed using optimized spectral features. Results demonstrate that FOD effectively amplifies subtle spectral responses related to salinity. The model combining 1.9-order FOD spectra with LASSO feature selection achieved the highest performance, with a cross-validated R² of 0.908—a 15.96% improvement over the best model using FOD-transformed spectra alone. This study confirms that integrating FOD with advanced feature selection significantly enhances the precision and reliability of hyperspectral inversion models for mural salt damage, providing a powerful, non-destructive tool for cultural heritage conservation.
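To make the spectral enhancement step more concrete, the sketch below applies a fractional-order derivative to a sampled spectrum using the Grünwald-Letnikov definition, a common choice in hyperspectral work. That the study uses this particular formulation is an assumption, and the reflectance values are invented.

```python
def gl_weights(alpha, n):
    """First n Grunwald-Letnikov weights w_k = (-1)^k * binom(alpha, k)."""
    w = [1.0]
    for k in range(1, n):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

def fractional_derivative(spectrum, alpha, h=1.0):
    """Order-alpha derivative of equally spaced samples (band spacing h)."""
    n = len(spectrum)
    w = gl_weights(alpha, n)
    out = []
    for i in range(n):
        # Weighted sum over the current band and all shorter-wavelength bands.
        acc = sum(w[k] * spectrum[i - k] for k in range(i + 1))
        out.append(acc / h ** alpha)
    return out

# Hypothetical reflectance samples.
reflectance = [0.31, 0.33, 0.36, 0.35, 0.30, 0.28]
for order in (0.0, 1.0, 1.9):
    print(order, [round(v, 4) for v in fractional_derivative(reflectance, order)])
```

Order 0 returns the spectrum unchanged and order 1 reduces to the first difference, so non-integer orders such as 1.9 sit between the familiar first and second derivatives.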

Assessing the sensibility of intervisibility on the quality of 3D geometry

Darshan Venkatarayappa, Bruno Vallet, Teng Wu

Univ Gustave Eiffel, Géodata Paris, IGN, LASTIG, F-77454 Marne-la-Vallée, France

This work explores a new evaluation framework for 3D Model Quality Assessment using 3D intervisibility, a critical concept in 3D spatial analysis. In this work we will consider a high-quality LiDAR ground-truth 3D model and lower quality (dense matching and decimated) versions of it. Then we run the same intervisibility analysis on all of them and compare the results. This will allow us to evaluate the impact of geometric quality on intervisibility analysis. This analysis is useful for anyone using 3D data for simulations, as it indicates what data quality they actually need to purchase or produce for their specific use case. Ultimately, the goal of this work is to see how much the quality of the 3D model affects intervisibility results.

Neural Radiance Fields with Physically Based Reflectance for Satellite Images

Lulin Zhang¹, Ewelina Rupnik², Tri Dung Nguyen¹, Stephane Jacquemoud¹, Yann Klinger¹

¹Universite de Paris, Institut de Physique du Globe de Paris, CNRS; ²Univ. Gustave Eiffel, IGN-ENSG, LaSTIG

Recent adaptations of Neural Radiance Fields (NeRF) to remote sensing have shown strong potential for high-fidelity surface reconstruction from multi-view satellite imagery. NeRF represents a scene using multilayer perceptrons and optimizes a volumetric rendering objective to infer geometry and appearance. However, its performance declines sharply with the limited number of satellite viewpoints, and remote sensing imagery violates the simple reflection assumptions of natural scenes. Surface reflectance depends on material properties and illumination geometry, requiring explicit Bidirectional Reflectance Distribution Function (BRDF) modeling. In this work, a physically based NeRF formulation is proposed using the Hapke radiative transfer model, which efficiently describes surface–radiance interactions with a small set of parameters. This physically grounded approach is compared experimentally with empirical BRDF models, demonstrating its potential to enhance the physical realism and interpretability of NeRF reconstructions for Earth observation applications.

Mobile multi-camera system performance for photogrammetric road surface 3D measurements - assessing the effect of driving speed

Matti Tapio Vaaja¹, Markus Sarlin¹, Eino Waldén¹, Petri Rönnholm¹, Hannu Hyyppä¹, Juha Hyyppä², Mikko Vastaranta³, Matti Kurkela¹

¹Department of Built Environment, Aalto University, Finland; ²Department of Photogrammetry and Remote Sensing, Finnish Geospatial Research Institute, National Land Survey of Finland, FI-02150 Espoo, Finland; ³School of Forest Sciences, University of Eastern Finland, Joensuu, 80101, Finland

In this study, we built a mobile multi-camera system and investigated its use for photogrammetric 3D measurement of road surface geometry. More specifically, we tested the effect of driving speed on the quality of the 3D point cloud geometry on road surface. Our conclusion was that, with a five-camera system at speeds of 3-20 km/h, we achieved 3D distance errors of less than 0.5 mm when the data was compared to reference data measured from road surface samples. The results show that the method has great potential for producing sub-millimetre resolution and precision data on road surface damages, road roughness, and other road parameters. The purpose is to use the system to collect reference data for verifying data from operational mobile laser scanning systems. The system can also be installed on other platforms and applications.

Digital Analysis of Rock Art in Santa Olaya Canyon: Integrating Cultural Landscape and UAV Technologies for Conservation

Fabiola D. Yépez-Rincón¹,³, Glenda N. Requena Lara², Carlos C. Aguilar Treviño³, Jacinto Treviño-Carreón², Ma. Eugenia Calvillo-Villacaña⁴, Pablo A. Cerda-Luque⁴, Juan F. Morales-Pacheco⁵

¹Faculty of Civil Engineering, Universidad Autónoma de Nuevo León, San Nicolás de los Garza, Nuevo León, México; ²Faculty of Engineering and Sciences, Universidad Autónoma de Tamaulipas, Ciudad Victoria, Tamaulipas, México; ³Teebcon Servicios, Ingenierías y Proyectos, SA de CV, Monterrey, Nuevo León, México; ⁴Faculty of Architecture, Design and Urbanism, Universidad Autónoma de Tamaulipas, Tampico, Tamaulipas, México; ⁵Faculty of Law and Social Sciences Victoria, Universidad Autónoma de Tamaulipas, Ciudad Victoria, Tamaulipas, México

This research work presents the digital documentation of rock art found on a rock face in the Santa Olaya Canyon, in the municipality of Burgos, Tamaulipas. Unlike rock art found in caves, these open-air expressions are actively integrated with the natural and cultural landscape, functioning as symbolic markers of the territory. Through controlled flights with a DJI Mavic Air 2 drone and 3D photoreconstruction techniques, a difficult-to-access vertical surface of a rock face with rock paintings was recorded with high precision. The methodology employed responds to the need for conservation and study of these sites, which lack institutional protection mechanisms from the INAH (National Institute of Anthropology and History) or, as in this case, conservation and cultural research studies. It also contextualizes the value of rock art in Tamaulipas, particularly in the San Carlos and San Nicolás mountain ranges, where some of the most significant collections in northeastern Mexico are found. The application of non-invasive digital technologies is positioned as an effective tool for the documentation, analysis, and dissemination of archaeological heritage, especially in remote and limited-access regions. The generated orthomosaic and point clouds provide the opportunity to create a digital legacy of the area.

LiDAR Point Cloud Classification by 3D Sparse CNN for large-scale Mobile Laser Scanning

Nan Li¹, Florian Pöppl², Andreas Ullrich², Harald Teufelsbauer²

¹RIEGL Research & Defense GmbH; ²RIEGL Laser Measurement Systems GmbH

This work presents a deep learning-based framework for semantic classification of Mobile Laser Scanning (MLS) point clouds using a 3D Sparse Convolutional Neural Network (SparseCNN). The proposed approach addresses challenges specific to MLS data, such as varying point density, high data volume, and diverse urban or highway environments. A two-stage, coarse-to-fine classification pipeline is designed to ensure both scalability and high resolution: the first stage performs scene-wide semantic labeling, while the second refines ground-surface features such as road markings, sidewalks, and curbstones at finer spatial resolution.

To enhance robustness, the model is trained with tailored data augmentations including geometric transformations, density dropout, artificial noise injection, and local patch swapping. In addition to geometric input, radiometric features such as reflectance and echo information are incorporated to improve object differentiation, especially for materials like traffic signs and painted road surfaces.

Two sets of models are trained for different acquisition wavelengths (905 nm and 1550 nm), to account for the impact of laser wavelength on reflectance responses. Classification results on urban and highway scenes demonstrate the effectiveness of the method across a variety of environments and sensor platforms.

MUSF-SSA: Multi-scale Umbrella Feature with Spatial Self-Attention Model for Semantic Segmentation of Point Clouds

Linfu Xie, Rutao Zhang, Tianyi Xu, Weixi Wang, Xiaoming Li, Shengjun Tang, Renzhong Guo

Shenzhen University, People's Republic of China

Semantic segmentation of point clouds, a fundamental task in 3D scene understanding, faces two persistent challenges. First, it is difficult to efficiently extract discriminative features for complex and irregular surfaces; existing methods struggle with the trade-off between simple features, which are insufficient, and complex features, which are computationally expensive. Second, many deep learning models ignore the inherent spatial correlation of point cloud features during the training process, limiting segmentation accuracy. Optimizing the feature representation for complex surfaces while fully leveraging feature correlation is key to advancing segmentation performance.

To tackle these challenges, we propose the Multi-Scale Umbrella Feature model with Spatial Self-Attention (MUSF-SSA). This model introduces a novel Multi-Scale Umbrella Feature (MUSF) to efficiently represent irregular surfaces and integrates a spatial self-attention (SSA) mechanism in its backbone to explicitly learn the spatial correlation between features.

Through these improvements, while maintaining a low parameter count (1.088M), our model achieves 68.6% mIoU, 76.5% mAcc, and 90.4% OA on the S3DIS Area-5 test, a typical indoor point cloud dataset. Compared to the similar method RepSurf-U, this represents a gain of +3.6% mIoU, +4.0% mAcc, and +2.6% OA.
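For reference, the three reported scores can all be derived from a class confusion matrix. The sketch below shows the standard definitions on a made-up two-class matrix; it is not the authors' evaluation code.

```python
def segmentation_scores(cm):
    """Return (OA, mAcc, mIoU) from a confusion matrix cm[true][predicted]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    row_sum = [sum(cm[i]) for i in range(n)]                       # points per true class
    col_sum = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # points per predicted class

    overall_acc = sum(diag) / total
    # Classes absent from the ground truth (or from the union) are skipped.
    accs = [diag[i] / row_sum[i] for i in range(n) if row_sum[i] > 0]
    ious = [diag[i] / (row_sum[i] + col_sum[i] - diag[i])
            for i in range(n) if row_sum[i] + col_sum[i] - diag[i] > 0]
    return overall_acc, sum(accs) / len(accs), sum(ious) / len(ious)

# Hypothetical counts for two classes.
oa, macc, miou = segmentation_scores([[8, 2], [1, 9]])
print(f"OA={oa:.3f}  mAcc={macc:.3f}  mIoU={miou:.3f}")
```

Because mIoU penalises both missed points and false positives for every class equally, it is typically the lowest of the three numbers, as in the results quoted above.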

Evaluating the Efficiency of Machine Learning Algorithms in Identifying Geothermal Energy Potential Areas in Akita and Iwate Provinces, Japan

Majid Kiavarz, Mohammadreza Jelokhani Niaraki, Avin Meysami, Yasaman Ghorbani, Najmeh Neysani Samany

University of Tehran

The growing demand for clean and renewable energy sources has intensified the need to identify and exploit geothermal resources as a key solution for sustainable energy development. However, geothermal exploration faces significant challenges including geological complexity, high drilling costs, economic risks, and spatial data limitations. This study evaluates the efficiency of advanced machine learning algorithms, specifically Random Forest and Generative Adversarial Networks (GANs), in identifying geothermal energy potential areas in Akita and Iwate provinces, Japan. Using a limited dataset of 152 geothermal well locations, seven key parameters were analysed: volcanic activity, fault and fracture density, hot springs, surface thermal indices, fumaroles, mud volcanoes, and surface alteration evidence. Data were collected from geological and remote sensing sources and pre-processed for modelling. Results demonstrate that both algorithms effectively identify high-potential areas despite data scarcity. Random Forest achieved 94.08% accuracy in well identification with a C/S(C) index of 10.93, demonstrating robust performance and spatial correlation. The Generative Adversarial Network showed superior performance with 96.71% accuracy and a C/S(C) index of 4.36, indicating exceptional capability in identifying geothermal potential areas and detecting complex spatial patterns. These findings confirm that hybrid approaches combining machine learning and deep learning, particularly GANs, possess high capability for accurate geothermal prospectivity mapping and can effectively overcome limitations posed by data scarcity, providing valuable tools for exploration prioritization and investment decision-making.

Theoretical Comparison of Façade Texture Resolution for 3D Building Models Generated from Nadir and Oblique Aerial Imagery

Masato Ishikawa, Tomoaki Inazawa, Yoshihiko Nakanishi, Futa Kawamata, Masahito Takada, Takuya Danjo

Kokusai Kogyo Co., Ltd., Japan

Building models are one of the key features in 3D city models. To realistically represent building exteriors, texture images are often applied to these models. Such textures are important not only for visual appearance but also for practical applications, such as automated generation of higher-Level-of-Detail (LoD) models and various urban simulations. In large-scale urban modeling projects, façade textures are typically obtained through aerial photogrammetry conducted by manned aircraft, primarily due to operational efficiency. In many such surveys, image acquisition is mainly based on nadir-oriented cameras. However, nadir-only imaging inherently limits façade resolution due to viewing geometry. In this study, we compare the façade resolution attainable from nadir and oblique cameras to examine the effectiveness of multi-directional camera systems in producing high-resolution façade textures. A theoretical approach is adopted to estimate the attainable façade resolution under given imaging conditions. A comparative analysis using the camera parameters of UCE M3 (nadir-only) and CM-2 (multi-directional) indicates several advantages of oblique cameras for façade texture generation: (1) significant improvement in the lowest façade resolution compared to nadir photography, (2) more consistent façade resolution across the entire survey area, and (3) limited sensitivity of façade resolution to increased camera station interval. These findings suggest that incorporating oblique cameras into an aerial survey system can contribute to stabilizing and enhancing attainable façade resolution compared to nadir-only configurations.
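The viewing-geometry limitation can be illustrated with a simple pinhole model. The sketch below is an independent back-of-the-envelope derivation, not the authors' formulation: it estimates the vertical sample distance on a façade for a ray at a given off-nadir angle, in the plane perpendicular to the façade. All numeric values are hypothetical and do not represent the UCE M3 or CM-2.

```python
import math

def facade_sample_distance(pixel_pitch, focal_length, height_above_point,
                           off_nadir_deg, camera_tilt_deg=0.0):
    """Vertical footprint of one pixel on a vertical façade (same unit as height).

    Assumed model: pinhole camera tilted by camera_tilt_deg from nadir; the ray
    to the façade point makes off_nadir_deg with the vertical. The pixel subtends
    p*cos^2(beta)/f, with beta the off-axis angle; the slant range is h/cos(theta);
    foreshortening on the vertical wall divides by sin(theta).
    """
    theta = math.radians(off_nadir_deg)
    beta = theta - math.radians(camera_tilt_deg)
    angular_size = pixel_pitch * math.cos(beta) ** 2 / focal_length
    slant_range = height_above_point / math.cos(theta)
    return slant_range * angular_size / math.sin(theta)

# Hypothetical camera: 4 micrometre pixels, 100 mm lens, 1000 m above the façade point.
for angle in (5, 15, 30, 45):
    gsd = facade_sample_distance(4e-6, 0.1, 1000.0, angle)
    print(f"nadir camera, ray {angle:2d} deg off nadir: {gsd * 100:.1f} cm")
```

For a nadir camera the expression reduces to p·h / (f·tan θ), which grows without bound as the ray approaches the vertical; this is the sense in which façades near the image centre are poorly resolved.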

Calibrating large-FOV stereo videogrammetric system using drone and epipolar geometry

Haibo Shi, Xianglei Liu, Runjie Wang

Beijing University of Civil Engineering and Architecture, China

Videogrammetry is widely used in fields such as structural health monitoring, surveillance, and aerospace, where accurate 3D measurements rely on precise calibration of stereo camera systems. Traditional planar target–based calibration provides high accuracy but becomes impractical for large-FOV setups due to the need for large, high-precision targets placed at long working distances. Control-field calibration, which uses spatially distributed artificial targets measured by total stations or GPS-RTK, similarly faces limitations in environments lacking accessible mounting locations. Other existing methods—such as rigid stereo-target calibration, close-range light-spot targets, and active phase targets—offer partial improvements but remain constrained by fabrication complexity, optimization instability, or limited depth-direction accuracy.

To address these challenges, this work proposes a flexible calibration method for large-FOV stereo videogrammetric systems using UAV trajectory imaging and epipolar geometry. A UAV carrying a rigid circular target flies through the measurement volume, while two synchronized cameras record its motion. Target centers are extracted using Circular-MarkNet, intrinsic parameters are obtained using an active-phase target, and scale-free extrinsic parameters are initialized from essential matrix estimation. The metric scale is introduced through static GPS measurements, and all parameters are refined via nonlinear optimization. Validation against a conventional circular-target control field shows that the proposed approach achieves comparable calibration accuracy within a 70–50–10 m volume while avoiding the need for large calibration targets.

A Hybrid Approach using Gaussian Splatting and Parametric Models based on 3D Renders for Real-Time Visualisation

Etienne Sommer, Arnadi Murtiyoso, Mathieu Koehl, Pierre Grussenmeyer

INSA Strasbourg, France

The valorisation and dissemination of built heritage to the public is a crucial objective, complementing conservation efforts. However, traditional 3D models, such as dense meshes, often present limitations for this purpose, proving too heavy and complex for easy sharing and real-time visualisation.

This paper presents a hybrid approach that addresses this challenge by leveraging 3D Gaussian Splatting (3DGS) for the real-time visualisation of complex parametric models. This method is particularly effective for visualising 4D reconstructions representing historical phases of edifices that may no longer exist.

The methodology employs synthetic images generated from the parametric model using 3D rendering software. To ensure compatibility with procedural textures, path-tracing is used, but photorealistic effects such as cast shadows and reflections are deliberately removed. These optimised 3D renders are then processed through a conventional photogrammetric pipeline to generate the necessary camera orientations and sparse point cloud for 3DGS training.

The resulting 3DGS representation enables real-time rendering. This technique successfully converts a model composed of multiple, distinct parametric components into a single, unified object. This approach also demonstrates a strong capability for reconstructing contextual elements, such as vegetation, which are often poorly handled by traditional meshing techniques. The method effectively transforms a complex, software-specific model into a lightweight representation ideal for applications where visualisation speed is essential.

Improving Head Pose Estimation in Radiation Therapy through photogrammetric Techniques for Machine Learning Applications

Cyrill Milkau¹, Sebastian Preußel², Sarah Guy³, Danilo Schneider¹

¹Faculty of Spatial Information, HTW Dresden – University of Applied Sciences, Germany; ²Institute of Photogrammetry and Remote Sensing, Dresden University of Technology, Germany; ³Department of Radiotherapy and Radiation Oncology, Dresden University of Technology, Germany

This study investigates the integration of photogrammetry and machine learning to enhance head pose estimation in radiation therapy. The primary objective is to improve the accuracy of patient positioning, which could reduce the reliance on immobilization masks, thereby enhancing patient comfort. The methodology involves the use of markers and cameras to track head movements, combined with machine learning algorithms to refine pose estimation. By merging deterministic photogrammetric techniques with advanced machine learning models, this approach aims to achieve more precise and reliable head pose estimation. The potential outcomes of this research could lead to more effective and comfortable radiation therapy treatments for patients with head-and-neck cancers.

A Comparative Study of Deep Learning and Unsupervised Segmentation Methods for Individual Tree Delineation from LiDAR point clouds

Jinhong Wang¹, Wei Yao²,³, Tiangang Yin¹

¹Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University; ²Institute of Urban Environment, Chinese Academy of Sciences, China, People's Republic of; ³School of Engineering and Design, Technical University of Munich, Munich, 80333, Germany

This study aims to conduct a comparative analysis of individual tree segmentation (ITS) methods for forest LiDAR point clouds. Traditional ITS approaches have been predominantly based on unsupervised segmentation algorithms using geometric features. In recent years, research has progressively shifted toward supervised deep learning (DL) techniques. However, the performance of existing methods across diverse forest types has not yet been systematically assessed.

On solving exterior orientation of an image with particle swarm optimization

Petri Rönnholm, Matti Kurkela, Matti T. Vaaja, Hannu Hyyppä

Department of Built Environment, Aalto University, Finland

Solving the exterior orientation of images is a fundamental component in photogrammetric mapping and 3D restitution processes. Additionally, it is essential in photogrammetric tasks such as visual odometry, camera-based visual simultaneous localization and mapping, camera calibration, camera-based 3D tracking of movement, and change detection. The aim of this research was to evaluate whether particle swarm optimization is suitable for finding the exterior orientation parameters of a single image using image resection. In addition, we developed a robustified particle swarm optimization by adding an iteratively changing stochastic model to the optimization criteria by attaching a weight matrix with residual vectors. The method was compared to the solution from the least squares method using both simulated ideal and noisy data. Solving the exterior orientation parameters reliably with particle swarm optimization was possible after fine-tuning the algorithm's options. The non-robustified version of particle swarm optimization provided identical results to the non-robustified least squares method. However, in the case of the robustified particle swarm optimization, only 60% of attempts resulted in the same outcome as the corresponding robustified least squares method, with sub-millimeter accuracy. In 40% of cases, the results achieved millimeter accuracy. The sub-millimeter accuracy was achieved in every case with sequential robustified particle swarm optimization, where the algorithm was rerun using stricter bounds for unknown parameters if the evaluation criteria were too large. The implementation of particle swarm optimization is easier than that of the nonlinear least squares method. However, the computation time for particle swarm optimization was significantly longer.
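As a rough illustration of the optimizer discussed in this abstract, the following is a minimal sketch of a basic (non-robustified) global-best particle swarm optimization. The function name `pso_minimize`, the parameter values, and the toy cost function are assumptions made for this example; the cost is a simple stand-in, not the collinearity-based resection criterion used by the authors.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize objective(x) inside box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random start positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # personal best positions
    best_val = [objective(p) for p in pos]     # personal best costs
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


# Toy stand-in for a resection cost (assumption): squared distance to a known optimum.
target = [1.0, -2.0, 3.0]


def cost(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


solution, value = pso_minimize(cost, [(-10.0, 10.0)] * 3)
```

In a resection setting the six unknowns would be the exterior orientation parameters and the cost would be built from image residuals; rerunning with tighter `bounds` around a previous solution loosely mirrors the sequential strategy the abstract mentions.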

Incremental Semantics-Aided Meshing from LiDAR-Inertial Odometry and RGB Direct Label Transfer

Muhammad Affan, Ville Lehtola, George Vosselman

University of Twente

Geometric high-fidelity mesh reconstruction from LiDAR-inertial scans remains challenging in large, complex indoor environments, such as cultural buildings, where point cloud sparsity, geometric drift, and fixed fusion parameters produce holes, over-smoothing, and spurious surfaces at structural boundaries. We propose a modular, incremental RGB + LiDAR pipeline that generates incremental semantics-aided high-quality meshes from indoor scans through scan frame-based direct label transfer. A vision foundation model labels each incoming RGB frame; labels are incrementally projected and fused onto a LiDAR-inertial odometry map; and an incremental semantics-aware Truncated Signed Distance Function (TSDF) fusion step produces the final mesh via marching cubes. This frame-level fusion strategy preserves the geometric fidelity of LiDAR while leveraging rich visual semantics to resolve geometric ambiguities at reconstruction boundaries caused by LiDAR point-cloud sparsity and geometric drift. We demonstrate that semantic guidance improves geometric reconstruction quality; quantitative evaluation is therefore performed using geometric metrics on the Oxford Spires dataset, while results from the NTU VIRAL dataset are analyzed qualitatively. The proposed method outperforms state-of-the-art geometric baselines ImMesh and Voxblox, demonstrating the benefit of semantics-aided fusion for geometric mesh quality. The resulting semantically labelled meshes are of value when reconstructing Universal Scene Description (USD) assets, offering a path from indoor LiDAR scanning to XR and digital modeling.
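For readers unfamiliar with TSDF fusion, the following is a minimal sketch of the classic per-voxel weighted running-average update. The function name `fuse_tsdf` and all parameter values are assumptions for this example. The abstract does not describe how semantics enter the fusion, so the comment about varying `obs_weight` per class is a hypothetical illustration only.

```python
def fuse_tsdf(voxel_tsdf, voxel_weight, sdf_obs, obs_weight=1.0,
              truncation=0.1, max_weight=64.0):
    """Fuse one signed-distance observation into a voxel (running weighted average).

    sdf_obs is the signed distance (in metres) from the voxel to the observed
    surface along the sensor ray; it is truncated and normalized to [-1, 1].
    """
    d = max(-truncation, min(truncation, sdf_obs)) / truncation
    new_weight = voxel_weight + obs_weight
    fused = (voxel_tsdf * voxel_weight + d * obs_weight) / new_weight
    # Capping the weight keeps the voxel responsive to later observations.
    return fused, min(new_weight, max_weight)


# Two observations of the same voxel. A semantics-aware variant might choose
# obs_weight or truncation per class label (hypothetical, not from the abstract).
tsdf, weight = fuse_tsdf(0.0, 0.0, sdf_obs=0.05)
tsdf, weight = fuse_tsdf(tsdf, weight, sdf_obs=0.15)
```

The mesh is then extracted at the zero crossing of the fused values, for which marching cubes is the method named in the abstract.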

Evaluation of systematic and random errors in occupancy grid maps

Yuguang Liu¹, Marko Radanovic¹, Krista A. Ehinger², Kourosh Khoshelham¹

¹Department of Infrastructure Engineering, The University of Melbourne, Australia; ²School of Computing and Information Systems, The University of Melbourne, Australia

Map evaluation for occupancy grid mapping (OGM) is critical in the field of high-definition mapping of the road environment for autonomous vehicles. Existing methods cannot adequately evaluate the systematic and random errors that might be present in OGM. This article introduces two evaluation metrics for OGM under LiDAR position uncertainty: Mean Signed Distance (MSD) and Mean Absolute Deviation (MAD). MSD quantifies systematic displacement of occupied cells, while MAD measures random error exhibited as boundary thickening. Unlike classification-based, probabilistic, and geometric metrics, MSD and MAD directly isolate displacement and thickening effects in OGM. We validate both metrics in a controlled synthetic environment and on a real indoor LiDAR dataset, showing better performance than conventional metrics.
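One plausible reading of the two metrics can be sketched as follows, assuming each occupied cell has a signed distance to the reference boundary (positive on one side, negative on the other). The function names and sample values are assumptions for this example; the authors' exact definitions may differ.

```python
def mean_signed_distance(signed_distances):
    """Average signed offset: a non-zero value indicates a systematic shift."""
    return sum(signed_distances) / len(signed_distances)


def mean_absolute_deviation(signed_distances):
    """Average spread around the mean offset: a proxy for boundary thickening."""
    msd = mean_signed_distance(signed_distances)
    return sum(abs(d - msd) for d in signed_distances) / len(signed_distances)


# Hypothetical signed distances (metres) of occupied cells to a reference wall.
distances = [0.12, 0.08, 0.10, 0.14, 0.06]
msd = mean_signed_distance(distances)      # systematic displacement
mad = mean_absolute_deviation(distances)   # random scatter about that displacement
```

Separating the mean from the deviation about the mean is what lets a shifted but thin boundary be told apart from a correctly placed but thick one.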

Deep learning-based building detection using high-resolution RGBI orthophotos and DSMs

Mohamed Fawzy¹,², Attila Juhasz¹, Arpad Barsi¹

¹Department of Photogrammetry and Geoinformatics, Faculty of Civil Engineering, Budapest University of Technology and Economics, Műegyetem rkp. 3, H-1111 Budapest, Hungary, {mohamed.fawzy, juhasz.attila, barsi.arpad}@emk.bme.hu; ²Civil Engineering Department, Faculty of Engineering, Qena University, 83523 Qena, Egypt, mohamedfawzy@eng.svu.edu.eg

Deep learning techniques have demonstrated promising efficacy for building feature extraction, offering practical strategies to reduce the labour-intensive work of map updating, change detection, and urban growth monitoring. To reduce this manual effort, a U-Net-based convolutional neural network model is developed to generate building maps automatically from high-resolution RGBI orthophoto and DSM data. The approach shows the effectiveness of U-Net-based semantic segmentation for urban scene analysis. The presented procedures collect, preprocess, and combine the orthophoto with the DSM in order to train, apply, and assess the U-Net model for building extraction in urban environments using two input scenarios: (1) RGBI orthophoto only and (2) RGBI orthophoto integrated with DSM. Four standard metrics (completeness, correctness, quality, and overall accuracy) are applied to evaluate the model outputs, comparing the orthophoto-only input with the combined orthophoto and DSM input for building detection. The data integration strategy proves more reliable at separating buildings from nearby similar objects such as roads and impervious surfaces, demonstrating the significant impact of pairing the DSM with RGBI. However, a few challenges related to the model's generalisation are observed across complex urban contexts, including tree occlusions, unreferenced building extensions, and height irregularities surrounding structures. The findings highlight the potential of multimodal data fusion in urban investigations and reveal how it can improve the mapping of built-up assets. The final results indicate that DSM incorporation significantly enhances building classification performance using deep learning frameworks for geospatial applications, particularly in complex urban environments where single-source data and traditional image-based segmentation methods face limitations.
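The four metrics named in this abstract are commonly computed from pixel-wise confusion counts. The sketch below uses the usual definitions; the function name `building_metrics` and the counts are made up for this example and are not results from the study.

```python
def building_metrics(tp, fp, fn, tn):
    """Standard building-extraction metrics from confusion-matrix counts."""
    return {
        "completeness": tp / (tp + fn),        # share of reference buildings found (recall)
        "correctness": tp / (tp + fp),         # share of detections that are buildings (precision)
        "quality": tp / (tp + fp + fn),        # combined measure (intersection over union)
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical pixel counts for one test tile.
metrics = building_metrics(tp=80, fp=10, fn=20, tn=890)
```

Quality penalizes both missed and false detections, so it is never higher than either completeness or correctness.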

Simulation of Stationary and Mobile Laser Scanning with VRscan3D

Denys Gorkovchuk¹, Julia Horkovchuk¹, Maria Chizhova², Darius Popovas³, Thomas Luhmann³

¹Kyiv National University of Construction and Architecture; ²Otto-Friedrich Universität Bamberg; ³Institute for Applied Photogrammetry and Geoinformatics

The VRscan3D project introduces a virtual simulation environment for stationary and mobile laser scanning designed to enhance education, research, and AI-based point cloud analysis. Developed using Unreal Engine, the simulator replicates the physical behavior of real terrestrial laser scanners, allowing users to perform realistic scanning operations within immersive 3D environments. The system reproduces manufacturer-specific parameters such as range noise, beam divergence, and intensity, generating synthetic point clouds that closely approximate real data.

VRscan3D enables users to plan and execute virtual scanning campaigns, analyze data quality, and understand the influence of scanning geometry, surface materials, and user behavior. Recent developments include dynamic scene simulation with moving objects, integration of user-imported environments, and support for mobile scanning trajectories—handheld, vehicle-mounted, or UAV-based—reflecting natural oscillations and movement patterns.

In addition to training and education, VRscan3D serves as a generator of synthetic point clouds with known ground truth, facilitating the development and validation of AI algorithms for object detection, segmentation, and classification. Comparative studies between simulated and real scans demonstrate high similarity in terms of accuracy, resolution, and completeness.

By bridging real-world surveying practice and virtual learning, VRscan3D offers a cost-effective, accessible platform for universities and professionals lacking physical equipment or facing mobility restrictions. It represents a new step toward open, immersive, and intelligent learning environments in geospatial education and research.

Symmetry-aware Texture Refinement for 3D Building Models via Massing Decomposition and Generative AI

Fan Xue¹, Yijie Wu², Maosu Li³

¹The University of Hong Kong, Hong Kong S.A.R. (China); ²The Hong Kong Polytechnic University, Hong Kong S.A.R. (China); ³The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China

Three-dimensional (3D) building models with accurate geometry and realistic textures remain essential for city information modeling and digital twin applications. However, photogrammetric reconstructions consistently suffer from severe texture defects caused by occlusions, shadows, distortions, and projection errors. Existing approaches either rely on rigorous photometric optimization that demands topological correctness and multi-view imagery, or employ flexible AI-driven generation that leverages semantics but often lacks geometric constraints.

This paper presents a novel hybrid framework that exploits architectural regularities—specifically massing decomposition and partial symmetries—to guide high-fidelity texture refinement. We first decompose building meshes into mass-aligned convex volumes using MorphCut. Textures are then reprojected onto these volumes, followed by Building Section Skeletons to pair symmetric facades and establish precise geometric correspondences. Finally, generative AI is applied using symmetry-aware constraints to achieve contextually accurate inpainting and correction.

Pilot studies on three Hong Kong buildings demonstrate robust decomposition, faithful texture transfer, and effective defect mitigation, while revealing current limitations of unconstrained generative models in preserving floor counts and structural regularity. The proposed symmetry-guided pipeline notably advances the reliable and semantically coherent reconstruction of textures for complex urban buildings.

AI-Driven 3D reconstruction and quality assessment for Cultural Heritage: first results from the HERITALISE project

Filiberto Chiabrando¹, Andrea Maria Lingua², Alessio Martino¹, Francesca Matrone², Alessandra Spadaro²

¹Laboratory of Geomatics for Cultural Heritage (LabG4CH), Department of Architecture and Design (DAD), Politecnico di Torino, Viale Pier Andrea Mattioli, 39, Torino (TO), Italy; ²Geomatics Lab, Department of Environment, Land and Infrastructure Engineering (DIATI), Politecnico di Torino, Corso Duca degli Abruzzi, 24, Torino (TO), Italy

The accurate digital documentation of Cultural Heritage (CH) assets demands workflows capable of integrating heterogeneous, multiscale datasets while preserving both geometric fidelity and radiometric completeness. This paper presents the first results of the AI-based processing pipeline developed within the HERITALISE project (Horizon Europe, 2025–2028), applied to three multiscale case studies at the Reggia di Venaria Reale (Turin, Italy): an outdoor-indoor UAV photogrammetric survey, a kinematic SLAM acquisition of a contemporary sculpture garden, and a close-range dataset of an 18th-century decorative artefact. 3D Gaussian Splatting (3DGS) is evaluated as a novel view synthesis method across all three scenarios, demonstrating strong photorealistic rendering capabilities, particularly for complex material properties and geometrically challenging interiors, whilst highlighting current limitations for metric surveying applications. A two-stage crack detection workflow, combining tile-based text-prompted segmentation with SAM3 and multiview ray-based reprojection onto the reconstructed mesh, is validated on UAV imagery, achieving an 84.9% ray–mesh intersection rate. Finally, a standardised evaluation framework is proposed, encompassing adaptive, scale-dependent geometric and radiometric metrics organised into reference-based and no-reference assessment scenarios, aggregated into a transparent synthetic quality score with three adaptive quality classes. The proposed methodology contributes toward a reproducible, sensor-agnostic standard for the assessment of AI-generated CH documentation products.

Haul Road Extraction in Open-Pit Mines via Dual-Encoder RGB–DSM Transformer Fusion

Loghman Moradi, Kamran Esmaeili

University of Toronto, Canada

Haul roads are essential to open-pit mines, acting like the mine’s circulatory system. Keeping accurate, up-to-date maps of these roads is critical for maintenance, safety, and efficient material handling, yet automating this task is challenging. Traditional deep learning models that rely only on RGB images often fail in mining environments, where road surfaces resemble bare earth, dusty terrain, or shadowed areas. To address this, we propose a dual-encoder transformer that combines UAV-captured RGB images with DSM data using stage-wise cross-attention, leveraging both visual and topographic information. Two SegFormer encoders process each data type separately, creating detailed feature representations that are fused at each stage. This allows the model to learn specialized information while sharing knowledge between modalities. A lightweight All-MLP decoder produces the final segmentation map. We tested our method on a high-resolution dataset of 12,000 tiles from the Mildred Lake open-pit mine in Fort McMurray, Canada. Our model achieves 80.8% mIoU, 88.7% F1-score, and 73.7% road accuracy, outperforming an RGB-only baseline by 3.3%, 2.4%, and 7.8 points, respectively. Ablation studies demonstrate that including DSM data consistently improves recall and road detection, especially in areas where RGB information alone is ambiguous or terrain is complex.

Benchmarking Local Registration Algorithms on Multi Temporal and Multi Spatial Point Clouds

Tommaso Mainiero, Jad Ghantous, Nives Grasso, Vincenzo Di Pietra

Department of Environment, Land and Infrastructure Engineering, Politecnico di Torino, Italy

This study presents a systematic benchmarking framework to evaluate the performance of local point cloud registration algorithms and their impact on geomorphological change detection. Three widely used methods—Iterative Closest Point (ICP), Point-to-Plane ICP, and Generalized ICP (GICP)—were tested across two alpine case studies in Italy (Rio Cucco catchment and Belvedere Glacier), considering different surface types and initial alignment conditions.

The methods were tested under varying initial alignment and terrain conditions using standardized voxelized patches (0.3 m). Performance was evaluated through median distance, cloud-to-cloud mean distance, and computation time metrics.
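The voxelization and distance metrics mentioned here can be sketched as follows. This is a minimal brute-force illustration under assumed conventions (voxel centroids, nearest-neighbour cloud-to-cloud distance); the function names and sample points are hypothetical, and a real pipeline would use a spatial index rather than an exhaustive search.

```python
import math
from collections import defaultdict
from statistics import mean, median


def voxel_downsample(points, voxel_size=0.3):
    """Replace all points falling in the same voxel by their centroid."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        cells[key].append(p)
    return [tuple(mean(axis) for axis in zip(*pts)) for pts in cells.values()]


def cloud_to_cloud_distances(source, reference):
    """Distance from each source point to its nearest reference point (brute force)."""
    return [min(math.dist(s, r) for r in reference) for s in source]


# Hypothetical patch: two points share a voxel, the third sits in another one.
patch = [(0.05, 0.05, 0.05), (0.15, 0.15, 0.15), (0.95, 0.05, 0.05)]
reference = [(0.1, 0.1, 0.4), (0.95, 0.05, 0.05)]

reduced = voxel_downsample(patch, voxel_size=0.3)
dists = cloud_to_cloud_distances(reduced, reference)
c2c_mean, c2c_median = mean(dists), median(dists)
```

Reporting the median alongside the mean is useful because the median is less sensitive to the outliers that vegetation tends to introduce.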

Results highlight the strong influence of surface morphology on algorithmic stability: rocky areas ensure reliable convergence, while dense vegetation introduces ambiguity and reduced accuracy. GICP provided the best compromise between robustness and efficiency.

The study further highlights that integrating robust outlier rejection significantly improves statistical consistency and reduces LoD95. The proposed approach provides a reproducible framework for optimizing co-registration strategies and improving the accuracy of geomorphological monitoring in high-relief environments.

Human Trajectory Prediction on UAV Images: A Comparative Study

Rafael D. M. da Hora¹, Daniel R. Santos¹, Maurício C. M. Paulo¹, Felipe Ferrari¹, Raul Q. Feitosa², Paulo F. F. Rosa¹

¹Military Institute of Engineering, Brazil; ²Pontifical Catholic University, Brazil

Human trajectory prediction from video is a fundamental research task for many civil and defense applications. Compared to trajectory prediction based on a single frame, human trajectory prediction in videos, especially in the context of unmanned aerial vehicle (UAV) platforms, is challenging due to the time-series analysis required. As frames in a video stream are highly correlated, trajectory detection in UAV images is affected by particular factors such as oblique camera views and platform motion. This study aims to identify the most robust and accurate deep learning model in the context of UAV videos by comparing three distinct categories: classical machine learning, established deep learning architectures, and computationally efficient models based on multi-layer perceptrons (MLPs). We propose an analysis based only on bounding box center coordinates instead of image scenes. The results show that a simple linear architecture provided the best performance, highlighting the importance of these mechanisms in predicting human motion from trajectory data alone.

Multi-technique approach for 3D documentation of rock walls in narrow gorges

Antonio Tomás Mozas-Calvache, José Luis Pérez-García, José Miguel Gómez-López, Diego Vico-García, Jorge Delgado-García

University of Jaén, Spain

This study presents a robust multi-technique methodology for generating complete, high-accuracy 3D documentation of highly constrained natural heritage sites, addressing the limitations of single-technique geomatic approaches. The research focuses on two challenging gorge environments in Southern Spain: Los Cañones de Río Frío and El Caminito del Rey. Both sites feature extreme vertical walls (up to 300 meters) and narrow passages that complicate GNSS-RTK positioning and render individual UAV, TLS, or terrestrial photogrammetry techniques unfeasible due to occlusions and safety/logistical constraints. The proposed workflow centers on data fusion, leveraging LiDAR data for core geometry and photogrammetry for texture and gap-filling. Data acquisition integrated multiple sensors, including UAV LiDAR/Photogrammetry, Terrestrial Laser Scanning (TLS), Mobile Mapping Systems (MMS), and Spherical Photogrammetry (SP). A key methodological innovation involves deriving second-order Ground Control Points (GCPs) from UAV photogrammetry to georeference other data in areas with poor satellite coverage, significantly reducing fieldwork while maintaining accuracy. The highly precise TLS point cloud was used as the geometric base for the final model. The resulting products, including high-density point clouds, 2 cm orthoimages, and 3D models, demonstrate comprehensive coverage and high accuracy (about 4 cm for georeferenced data), enabling 2.5D rockfall simulation and establishing a foundation for a Digital Twin of both gorges.

Augmented and Mixed Reality Scene Alignment Through 3D-to-3D Learning-Based Cross-Source Point Cloud Registration

Juan Sebastian Sardi Barzallo¹, Volker Coors²

¹Stuttgart, Technical University of Applied Sciences; ²Stuttgart, Technical University of Applied Sciences

With the fast development of reality capture technology and the increasing availability and accessibility of devices capable of capturing 3D point clouds, applications in which cross-source Point Cloud Data (PCD) interact are becoming more frequent. Augmented and Mixed Reality (AR/MR) technologies are pivotal for the integration of digital and physical environments, overlaying Digital Twin (DT) models onto real contexts, and are themselves capable of producing real-time 3D point cloud data. Nevertheless, the integration of AR/MR real-time 3D point cloud data with other sources such as LiDAR data is still an open field for research, especially for fundamental tasks such as scene alignment and camera localization. Conventional vision-based methods are vulnerable to environmental variations, making suitable camera localization and scene alignment challenging to achieve. This work proposes an exclusively 3D-to-3D-based methodology for AR/MR scene alignment and camera localization, addressing the challenges of cross-source point cloud registration in scenarios with large size disparity. By combining the learning-based Voxel Representation and Hierarchical Correspondence Filtering (VRHCF) method for cross-source point cloud registration with the TEASER++ algorithm, our approach effectively manages asymmetric heterogeneous point cloud data, achieving promising registration results, especially in extensive indoor settings. The qualitative results suggest improvements over existing studies, despite outlier challenges in outdoor environments that warrant further research. This study highlights the potential of, and the essential need for, advanced methodologies to enable seamless interactions between digital and physical worlds.

Semantic-Guided High-Fidelity Indoor Scene Reconstruction Based on 3D Gaussian Splatting

Mingyue Dong¹, Xianwei Zheng¹, Jiansi Yang¹, Linwei Yue², Jianya Gong¹

¹Wuhan University; ²China University of Geosciences

Indoor 3D scene reconstruction is essential for digital twins and intelligent spatial applications but remains challenging due to severe occlusions, weak textures, and complex geometric structures. This paper presents a semantic-guided high-fidelity indoor reconstruction framework based on 3D Gaussian Splatting (3DGS), which achieves high-precision geometry and photorealistic rendering through semantic-aware optimization. First, a high-quality geometric prior generation scheme is developed by integrating a 2D depth prediction network to enhance noisy depth data captured by mobile devices. The refined depth maps are processed by computing spatial gradients to derive surface normals in world coordinates, providing geometric supervision for the position and orientation of Gaussian ellipsoids. A projection-error-based filtering mechanism ensures consistency across multiple views. Second, a semantic-guided differentiated reconstruction framework is introduced. Using a pretrained segmentation model (SAM), the method distinguishes between large weak-texture areas and fine-detail regions. Normal regularization improves surface smoothness in planar regions, while detail-aware weighting strengthens local geometric fidelity. Additionally, a multi-view semantic consistency strategy jointly optimizes color and geometry across viewpoints, enhancing global coherence and reducing overfitting. Experiments on ScanNet++ and Mushroom datasets demonstrate that the proposed method surpasses state-of-the-art baselines in rendering quality and geometric accuracy. It effectively reconstructs continuous surfaces and detailed structures, showing strong potential for applications in virtual reality, digital twins, and real-time indoor modeling.

Enhanced DUSt3R for Underwater 3D Reconstruction in Shallow Water Environments

Tsuyoshi Shimano, Takashi Fuse

The University of Tokyo, Japan

Shallow-water environments present significant challenges for underwater photogrammetry due to light caustics and the combined effects of absorption and scattering caused by water turbidity. These optical disturbances degrade image quality, disrupt feature matching, and ultimately reduce the reliability of 3D reconstruction using traditional SfM (Structure from Motion) pipelines. In this study, we focus on these two dominant factors and investigate a 3D reconstruction framework inspired by recent feed-forward architectures such as DUSt3R (Dense and Unconstrained Stereo 3D Reconstruction). To support this approach, we develop a synthetic data generation pipeline capable of simulating shallow-water visual conditions. Preliminary experiments indicate a possible trend for integrating physics-aware image formation with DUSt3R-type feed-forward reconstruction. However, several limitations remain: the current model does not yet achieve stable accuracy, real-world underwater validation has not been conducted, and computation costs remain high due to complex training procedures. Future work will focus on refining the network architecture, exploring DUSt3R-derived multi-view and high-fidelity extensions, accelerating computation, and validating the pipeline in real shallow-water environments. Additionally, integrating advanced rendering techniques may further improve the refinement of 3D reconstruction.

Evaluating SfM Techniques for DEM Production from VHR Satellite Imagery in Urban Contexts

Gabriele Lo Grasso, Valentina A. Girelli, Emanuele Mandanici

Alma Mater Studiorum - University of Bologna, Italy

Digital Surface Models (DSMs) provide the fundamental elevation data required for generating 3D city models, which support a wide range of analyses such as solar potential estimation, urban heat island assessment, and infrastructure monitoring.

Advances in very high-resolution satellite stereo imaging, airborne LiDAR, and aerial photogrammetry have made it possible to generate DSMs at fine spatial resolution using different acquisition geometries and multi-view reconstruction techniques. However, these data sources differ substantially in terms of spatial resolution, viewing geometry, and surface visibility, leading to variations in elevation accuracy and morphological completeness. Airborne LiDAR surveys can provide highly detailed and accurate three-dimensional point clouds compared to aerial photogrammetry, but are associated with high acquisition and processing costs, as well as logistical constraints.

This study presents a comparative analysis of the DSMs derived from WV-3 panchromatic stereo imagery and oblique aerial photographs processed with the Structure-from-Motion (SfM) approach, focusing on the capability of SfM to reconstruct the complex urban morphology. The study area, a district of the city of Bologna, is characterized by a heterogeneous urban texture including compact mid-rise residential blocks, industrial facilities, vegetated zones, and open spaces, making it an ideal test site for comparing elevation models derived from different sensors and acquisition geometries.

Canopy Entropy Sensitivity Analysis for Scalable Canopy Structural Complexity Estimation

Bin Wang, Yuqi Lei, Zheng Xu, Wen Xiao

China University of Geosciences (Wuhan), China, People's Republic of

Canopy Entropy (CE) quantifies 3-D forest heterogeneity from LiDAR, but its reliability depends on point density and kernel bandwidth. Using 11 sub-sampled airborne datasets (12–240 pts m⁻²) and bandwidths 0.1–2 m over a 20 ha Jiangxi plot, we show CE is stable (CV < 0.6 %) above 72 pts m⁻², whereas below 50 pts m⁻² it falsely inflates (> +5 %). CE grows logarithmically with bandwidth, saturating beyond 1 m; 0.2 m is optimal at landscape scale. Maintain ≥ 50 pts m⁻² and h ≈ 0.2 m for unbiased canopy-complexity mapping. An Investigation of the Application of GCE for Comparing Cross-Scale Structural Complexity Using Simulated Datasets.

High-Precision Point Cloud Registration Method Based on Planar and Linear Features

Chenxin Yang, Kazuha Kumazawa, Saki Komoriya, Hiroshi Masuda

The University of Electro-Communications, Japan

Accurate registration of point clouds obtained from different viewpoints is essential for constructing consistent and reliable 3D models. Terrestrial laser scanner (TLS) data are typically represented in local coordinate systems centered at individual scanner positions, requiring transformation into a common reference frame. However, achieving high-accuracy registration for large-scale datasets remains challenging. Even small rotational errors in rigid transformations can result in significant positional deviations over long distances. Conventional registration methods, such as the Iterative Closest Point (ICP) algorithm, perform well in dense regions but often produce misalignments in sparse or geometrically uniform areas.
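The sensitivity to rotational error noted above follows from simple geometry: a point at distance d from the rotation centre is displaced by the chord length 2·d·sin(θ/2) when the rotation is off by an angle θ. The sketch below evaluates this; the function name and the numeric values are chosen for this example and are not taken from the study.

```python
import math


def displacement_from_rotation_error(distance_m, error_deg):
    """Positional deviation (m) of a point at distance_m caused by a rotation error."""
    return 2.0 * distance_m * math.sin(math.radians(error_deg) / 2.0)


# A rotation error of 0.01 degrees seen at 10 m and at 100 m from the scanner.
near = displacement_from_rotation_error(10.0, 0.01)
far = displacement_from_rotation_error(100.0, 0.01)
```

For small angles the deviation grows almost linearly with distance, so a hundredth of a degree already amounts to roughly 17 mm at 100 m.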

This study presents a high-precision point cloud registration approach that integrates global geometric features—such as planes and lines—with local point-based constraints. Plane and line features are extracted using RANSAC-based detection and incorporated into an enhanced ICP framework, improving both stability and convergence in large-scale environments.

Experimental evaluations using real TLS datasets acquired from an industrial factory demonstrate that the proposed hybrid ICP method significantly outperforms conventional approaches. The integration of global geometric features effectively reduces local misalignments and improves registration accuracy, particularly in regions with uneven point density or limited structural variation.

RTK-Guided Gaussian Splatting Pipeline for Georeferenced Urban 3D Reconstruction

Cheolhwan Kim¹, Wonjun Choi¹, Youngmok Kwon¹, Jungho Lee², Minhyeok Lee², Hong-Gyoo Sohn¹

¹Dept. of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea; ²Dept. of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea

Automated 3D reconstruction technologies utilizing multi-source spatial data have gained significant attention in recent years. While conventional approaches rely on registration-based multi-sensor integration, recent Gaussian Splatting techniques have shown strong potential for large-scale modeling using only monocular imagery. However, existing 3DGS frameworks operate in relative coordinate systems and lack alignment with absolute geospatial references, limiting their applicability for real-world mapping.

To address these challenges, we propose a georeferenced Gaussian Splatting framework that integrates RTK-GPS camera position measurements directly into the training process. Initial camera parameters and sparse point clouds are estimated using an image-based SfM pipeline and subsequently aligned to a global coordinate frame through a similarity transformation based on RTK-GPS measurements acquired alongside the imagery. During coarse GS training, per-camera translation and rotation corrections are jointly optimized to compensate for geometric errors introduced during global frame alignment. The translation updates are guided toward RTK-GPS-measured positions, while a reprojection constraint based on SfM sparse 3D observations preserves the multi-view geometric consistency established by bundle adjustment.
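The similarity-transformation step can be sketched with the closed-form least-squares solution of Umeyama (1991), which estimates a scale, rotation, and translation between two sets of corresponding points. The abstract does not say which estimator the authors use, so this is an assumed stand-in; the function name and the synthetic camera positions are hypothetical.

```python
import numpy as np

def similarity_align(src, dst):
    """Least-squares s, R, t such that dst ~= s * R @ src + t (Umeyama, 1991)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)            # cross-covariance of dst and src
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                    # guard against a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)    # variance of the source points
    s = float(np.trace(np.diag(S) @ D) / var_s)
    t = mu_d - s * (R @ mu_s)
    return s, R, t

# Hypothetical data: SfM camera centres in an arbitrary frame, and the same
# centres in a "global" frame produced by a known similarity transformation.
rng = np.random.default_rng(0)
sfm_centres = rng.normal(size=(20, 3))
angle = np.radians(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
s_true, t_true = 2.5, np.array([100.0, 200.0, 50.0])
rtk_centres = s_true * (sfm_centres @ R_true.T) + t_true

s_est, R_est, t_est = similarity_align(sfm_centres, rtk_centres)
print("scale:", round(s_est, 6))
```

With noisy RTK positions the residuals of this fit would not vanish, which is consistent with the abstract's use of per-camera corrections during training to absorb the remaining alignment error.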

The proposed method generates 3DGS outputs aligned with an absolute coordinate system with only marginal degradation in rendering metrics such as PSNR, SSIM, and LPIPS. Mesh conversion and surface-distance comparison with laser scanning data further validate the reliability of the reconstructed geometry. This work demonstrates the feasibility of real-world georeferenced modeling using Gaussian Splatting-based scene representation.

Shape Reconstruction from Large Scale Point Clouds Using Planar Adjacency Relations

Yusuke Nagasawa, Hiroshi Masuda

The University of Electro-Communications, Japan

Digital twins of production facilities, represented as 3D virtual environments generated from point cloud data, are increasingly demanded for efficient facility management. Although terrestrial laser scanners (TLS) enable high-density 3D acquisition of such environments, the resulting point clouds are extremely large in data size. In practical applications, lightweight mesh models are therefore required as a substitute for raw point cloud data. However, TLS measurements often contain occlusions and missing regions, making it challenging to reconstruct complete mesh models directly from incomplete point clouds. Many objects installed in production facilities, such as equipment platforms, fences, columns, and ladders, consist mainly of planar surfaces. Efficient plane detection methods have been developed for large-scale point clouds (Masuda, 2015; Takeda, 2024). For objects composed of planes, 3D models can be reconstructed from the detected planes. However, industrial point clouds are extremely large, including many densely sampled planar regions. Furthermore, many existing methods focus on standard components with fixed shapes, such as pipe structures, and are not applicable to objects with more flexible geometries. To overcome these limitations, this study first converts the detected planar regions into simplified mesh representations to reduce data volume. We then construct a planar adjacency graph that preserves spatial relationships and geometric attributes between planes. Finally, we reconstruct the target structure by identifying and assembling appropriate subsets of planes.

In-situ LiDAR-assisted backpack camera system calibration for forest mapping

Raja Manish, Songlin Fei, Ayman Habib

Purdue University, United States of America

Backpack mapping systems equipped with LiDAR sensors and RGB cameras, and an optional GNSS/INS direct georeferencing unit, are increasingly used in forest inventory applications. A key prerequisite to deriving accurate mapping products from these platforms is system calibration to establish the mounting parameters relating the LiDAR and camera sensors to the IMU body frame of the GNSS/INS unit. Conventional system calibration procedures entail specific trajectory and target deployment at the calibration site, followed by a labor-intensive identification of targets in imagery and LiDAR point cloud. Given the significance of multi-modal data alignment for forest inventory, this study explores an alternative approach for camera–LiDAR system calibration.

Bundle Adjustment for Satellite Attitude Jitter

Shun Zhou, Hongbo Pan

Central South University, China, People's Republic of

To address the limitations of existing RFM bias-compensation methods, which have difficulty handling complex attitude jitter and lack fully automated processing, this study introduces a Bundle Adjustment (BA) approach that incorporates adaptively determined spline smoothing parameters. The method constrains the smoothing term of the spline using prior matching accuracy and enables the adaptive estimation of the smoothing parameter within the BA process. Because the procedure requires no manual intervention and the adaptive smoothing term retains a reasonable physical interpretation, the proposed approach is broadly applicable to the correction of attitude jitter in linear pushbroom satellite systems.

A Comparative Study of MVS and NeRF Approaches for Dense 3D Reconstruction of Mediterranean Coral

Paolo Rossi¹, Riccardo Roncella¹, Cristina Castagnetti²

¹University of Parma, Department of Engineering and Architecture, 43124, Parma, Italy; ²University of Modena and Reggio Emilia, Department of Engineering, 41125, Modena, Italy

This work investigates the potential of optimizing underwater image acquisition while preserving reconstruction quality. A comparative evaluation of Multi-View Stereo (MVS) and Neural Radiance Fields (NeRF) is conducted, focusing on their performance in terms of completeness and robustness under conditions of reduced image availability. The study concentrates on underwater scenes involving Mediterranean coral species, where traditional photogrammetric methods often encounter difficulties due to occlusions and low-texture surfaces. The analysis is based on datasets acquired under controlled conditions, allowing for a direct comparison of the dense reconstruction capabilities of both approaches. The impact of decreasing the number of input images on reconstruction completeness and model accuracy is assessed, with results benchmarked against a reference dataset obtained using a triangulation laser scanner.

A progressive framework for 3D scene understanding from multi-view satellite imagery

Xuejun Huang¹, Yi Wan^1,2, Xinyi Liu^1,2, Yongxiang Yao¹, Dong Wei¹, Yongjun Zhang^1,2

¹School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, 430079, Hubei, China; ²Technology Innovation Center for Collaborative Applications of Natural Resources Data in GBA, Ministry of Natural Resources, Guangzhou, 510075, Guangdong, China

3D scene understanding is critical for applications like smart city management and urban planning. However, existing methods often treat 2D semantic understanding and 3D reconstruction as independent tasks, limiting the ability to create a unified 3D semantic representation. This separation hinders the accuracy, interpretability, and scalability of large-scale 3D scene understanding.

In this work, we propose a progressive, three-stage pipeline that seamlessly connects multi-view semantic understanding, self-supervised 3D reconstruction, and end-to-end semantic-level scene understanding. The approach gradually integrates semantic and geometric cues—first establishing reliable semantic priors, then recovering scene geometry without height supervision, and ultimately combining both into a unified 3D representation for more accurate scene understanding.

Beyond geometry: Reflectance-calibrated 3D Gaussians using LiDAR and imagery for photometrically robust Reconstruction

Yaoyu Li¹, Dedong Zhang^2,3

¹Hinton STAI Institute, East China Normal University, Minhang, Shanghai 200241, China; ²Department of Geography and Environmental Management, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada; ³TianfuJiangxi Laboratory, Chengdu, Sichuan, 641419, China

This paper introduces LIG-3DGS, a novel framework for robust 3D reconstruction and novel view synthesis under conditions where standard image-based methods struggle. The core of our approach lies in the deep integration of LiDAR geometry and intensity information with a 3D Gaussian Splatting (3DGS) representation. Our qualitative and quantitative experiments demonstrate that LIG-3DGS significantly outperforms standard 3DGS and geometry-only baseline methods under challenging photometric conditions. By bridging the geometric precision of active sensing with the high-fidelity rendering of neural approaches, this work opens a promising pathway toward all-weather, high-fidelity 3D scene understanding.

Non-destructive extraction of vertical leaf base and inclination angles distribution in field maize

Lei Lei^1,2, Zhenhong Li^1,2, Guijun Yang^1,2,3, Hao Yang³

¹Key Laboratory of Loess, Xi’an 710054, China; ²College of Geological Engineering and Geomatics, Chang'an University, Xi’an 710054, China; ³Information Technology Research Centre, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China

Distributions of leaf base and inclination angles are important crop phenotypic traits, influencing light interception and productivity. LiDAR provides unprecedented detail of the 3D structure of the crop canopy. Recent research mainly focuses on the leaf base and inclination angles of maize at the individual level or at lower planting density. It is difficult to extract the distributions of leaf base and inclination angles of maize in the field due to the interlocked and overlapped nature of leaves. In this study, we have proposed a high-throughput method to extract the distributions of leaf base and inclination angles of maize in the field. Following the separation of the leaf and stem of maize, hollow cylinders with different thicknesses were used to extract the local leaf points from the separated leaf points based on each stem fitted line, and the DBSCAN algorithm and singular value decomposition were used to calculate the leaf base and inclination angles. The distributions of leaf base and inclination angles of maize in the field with different cultivars, planting densities, and growth stages were extracted and analyzed, and these performed well against the validation data. The high-throughput extraction of these distributions in maize fields holds significant importance for studying the optimal maize cultivar in conjunction with radiative transfer models.
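The role of singular value decomposition in the angle computation can be sketched as follows: the first right singular vector of the centred leaf points gives the dominant leaf direction, from which an angle is derived. The angle convention used here (angle between the dominant leaf axis and the horizontal plane) is an assumption for illustration, since the abstract does not define it, and the function names and synthetic points are hypothetical.

```python
import numpy as np

def principal_direction(points):
    """Unit vector of largest variance in a point set, obtained via SVD."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def inclination_deg(direction):
    """Angle in degrees (0..90) between a direction and the horizontal plane.

    Assumed convention; other definitions measure the angle from the zenith.
    """
    d = direction / np.linalg.norm(direction)
    return float(np.degrees(np.arcsin(min(1.0, abs(d[2])))))

# Hypothetical leaf segment: points along a line rising 30 degrees above
# the horizontal, with millimetre-level noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 0.3, 50)
axis = np.array([np.cos(np.radians(30.0)), 0.0, np.sin(np.radians(30.0))])
leaf_pts = t[:, None] * axis + rng.normal(0.0, 0.001, (50, 3))

angle = inclination_deg(principal_direction(leaf_pts))
print(f"estimated inclination: {angle:.1f} deg")
```

In the described workflow, DBSCAN would first separate the points belonging to individual leaves near each stem, and a fit like this would then be applied per cluster.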

Extraction of CCTV Surveillance Coverage Based on UAV Mesh and CCTV Image

Wonjun Choi¹, Youngmok Kwon², Cheolhwan Kim³, Hong-Gyoo Sohn⁴

¹Dept. of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea; ²Dept. of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea; ³Dept. of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea; ⁴Dept. of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea

This study presents a geometric framework for recovering missing CCTV camera parameters and deriving reliable three-dimensional viewshed coverage by matching UAV-based 3D mesh models with real CCTV imagery. Most CCTV metadata only contains approximate latitude and longitude, while essential calibration parameters such as azimuth, tilt angle, focal length, and field of view are unavailable. Without these parameters, visibility analysis in urban environments becomes inaccurate due to unaccounted building occlusions. To address this, a coarse-to-fine pipeline is proposed. In the coarse stage, camera tilt is estimated from the CCTV image using a monocular surface normal estimation model, and camera yaw is determined by matching cylindrical panoramic renderings of the mesh against the CCTV image using a dense feature matching network. In the fine stage, perspective projection images are rendered at 1 m height intervals using the estimated orientation, and each candidate is matched against the CCTV image to identify the optimal camera height. The rendering process simultaneously records world coordinates for every visible pixel, enabling direct extraction of 3D-2D ground control point correspondences from the best-matched candidate. Outlier correspondences are removed through Fundamental Matrix RANSAC, and spatially distributed representative points are selected via agglomerative clustering. Camera parameters are then estimated using an improved Perspective Projection Model with rotation matrix orthogonality constraints and weighted least squares adjustment. The recovered parameters are used to generate three-dimensional viewshed polygons. The method was tested on 41 CCTV cameras on a university campus and validated using reprojection error and ground-truth camera positions.

Volume estimation and accuracy assessment of unauthorised material deposits using airborne photogrammetry and laser scanning for environmental inspection

Markéta Potůčková¹, Alex Šrollerů¹, Martin Marko², Eva Štefanová¹

¹Charles University, Faculty of Science, Department of Applied Geoinformatics and Cartography, Albertov 6, Prague 2, Czechia; ²Czech Environmental Inspectorate, Na Břehu 267/1a, Prague 9, Czechia

Determining the volume of unauthorised stockpiles or material deposits is a common task for environmental inspection authorities. Although UAV photogrammetry and laser scanning are widely adopted in many fields today, their use within environmental inspectorate practice may still be limited in some countries. In addition, archives of aerial imagery and laser scanning data maintained by national mapping agencies offer valuable resources for retrospective analyses of terrain changes caused by unauthorised material deposits; however, their potential has not yet been fully realised.

The objectives of this study are to: (1) present a comparative analysis of UAV photogrammetry and laser scanning in relation to terrestrial GNSS measurements for determining the volume of larger stockpiles and apply a model for volume accuracy assessment; and (2) demonstrate both the potential and limitations of using archived aerial imagery and laser scanning data for retrospective terrain-change analysis, with a focus on estimating the thickness and volume of deposits and their accuracy.
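A common way to estimate deposit volume from two elevation models is to sum the per-cell height differences and multiply by the cell area. The sketch below shows that computation together with a deliberately simplified uncertainty estimate that treats per-cell height errors as independent with equal standard deviation. This is an illustrative assumption and not the accuracy model applied in the study; correlated errors would give a larger uncertainty. All names and values are hypothetical.

```python
import math

def deposit_volume(dem_before, dem_after, cell_size_m):
    """Net volume (m^3) between two co-registered elevation grids, and cell count."""
    cell_area = cell_size_m ** 2
    thickness = [
        after - before
        for row_before, row_after in zip(dem_before, dem_after)
        for before, after in zip(row_before, row_after)
    ]
    return cell_area * sum(thickness), len(thickness)

def volume_sigma(n_cells, cell_size_m, sigma_dh_m):
    """Volume standard deviation assuming independent per-cell height errors."""
    return cell_size_m ** 2 * sigma_dh_m * math.sqrt(n_cells)

# Hypothetical 2 x 2 grids with 2 m cells (elevations in metres).
before = [[10.0, 10.0], [10.0, 10.0]]
after = [[10.5, 11.0], [10.0, 10.5]]

volume, n_cells = deposit_volume(before, after, cell_size_m=2.0)
sigma = volume_sigma(n_cells, cell_size_m=2.0, sigma_dh_m=0.05)
print(f"volume = {volume:.1f} m^3, sigma = {sigma:.2f} m^3")
```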

Both objectives stem from the current needs of environmental inspectorates. Volume estimation can be highly sensitive because of associated penalties; therefore, understanding the accuracy and limitations of the applied methods is crucial. When time constraints are not an issue and dense vegetation poses a challenge (including grass cover that cannot be penetrated by laser signals), terrestrial GNSS or traditional surveying remain the most reliable options. Nevertheless, airborne photogrammetry and laser scanning offer undeniable advantages in terms of operability and retrospective analysis.

Improved ICP Algorithm Constrained by Intensity Gradient for Urban Airborne Array InSAR Point Cloud Registration

Lijun Lu¹, Fangfang Ji², Shucheng Yang¹, Chunquan Cheng¹, Hanchao Zhang¹

¹State Key Laboratory of Spatial Datum, Chinese Academy of Surveying and Mapping; ²Capital Normal University, China, People's Republic of

Airborne array InSAR achieves high-precision three-dimensional reconstruction through multi-baseline interferometric height measurement, holding significant application value in urban spatial structure monitoring and surface deformation analysis. However, the acquired urban InSAR point clouds are often affected by multiple factors, including platform attitude errors, system calibration inaccuracies, and multi-angle imaging geometric discrepancies, leading to noticeable spatial biases among different datasets. To achieve geometric consistency across multi-baseline data, high-accuracy point cloud registration has become a crucial step in InSAR data fusion processing. Therefore, this research proposes an improved ICP algorithm constrained by intensity gradients for registering urban airborne array InSAR point clouds. The algorithm integrates geometric and electromagnetic scattering features. Experimental results demonstrate that the proposed method exhibits superior robustness and registration performance in complex urban scattering environments, providing effective technical support for 3D reconstruction of SAR point clouds.

AiDroneTree: A Novel AI Deep Learning Based Network for Individual Tree Detection Using UAV-Derived Point Cloud in Dense Urban and Forest Landscapes

Sina Jarahizadeh, Bahram Salehi

State University of New York College of Environmental Science and Forestry, Department of Environmental Resources Engineering, 1 Forestry Dr., Syracuse, NY 13210 USA

Individual Tree Detection (ITD) is a primary step for estimating tree attributes such as spatial distribution, geometry, and species, which are used in forest management, urban planning, and carbon accounting. While traditional field-based inventories are accurate, they are costly, labour-intensive, and limited in coverage. High-resolution UAV LiDAR offers a scalable alternative, and deep learning (DL)-based object detection methods further enable automated ITD at large scales. In contrast to RGB imagery, UAV LiDAR can be transformed into multi-band representations that capture rich structural and textural information, which enhances ITD performance. However, previous methods still face challenges from complex forest conditions, including overlapping crowns, and from computational inefficiency when processing high-resolution, multi-band data. To address these issues, we propose AiDroneTree, a novel one-stage DL object-detection framework for multi-band rasterized UAV LiDAR that enables more accurate and efficient tree detection in dense and heterogeneous forests. The AiDroneTree architecture detects and segments individual trees by combining a custom-built backbone and head, optimized for detecting small trees in complex canopy environments, with Convolutional Blocks with Concatenate (CBC), LeakyReLU activations, and tunable layers throughout; it outputs bounding boxes and confidence scores for each tree. The results were evaluated against YOLO on datasets captured from various environments with different tree shapes, sizes, and densities. The quantitative and qualitative results show that AiDroneTree outperforms YOLO in various forest conditions and achieves 91% accuracy, 93% precision, and 92% recall and F1-score.

Integrated MBES-based Assessment of Dam Tailrace Structure and Geomorphology

Sehoon Oh, Jiwan Hong, Geonu Park, Daegeon Woo, Joon Heo

Yonsei University, Korea, Republic of (South Korea)

The dam tailrace is a critical zone for dam safety, as high-energy spillway flows can deteriorate concrete slabs and drive scour along the downstream riverbed. However, this zone is difficult to access, and structural and geomorphic conditions are often assessed independently, limiting integrated understanding of their coupled behavior. Multibeam echo sounding (MBES) helps close this gap by providing high-resolution underwater topography and enabling simultaneous mapping of engineered concrete surfaces and erodible beds within a single survey. When deployed on unmanned surface vehicle (USV) platforms, MBES allows safe and efficient bathymetric mapping in narrow or high-energy downstream channels, supporting more complete characterization of tailrace conditions.

In this study, a USV-mounted MBES was used to acquire high-density underwater measurements across the tailrace of Daecheong Dam, capturing both the concrete stilling basin and the downstream alluvial bed. The resulting point cloud was segmented into two functional zones: (1) the concrete slab zone, where planar-deviation metrics quantified slab misalignment, elevation offsets, and localized deformations; and (2) the downstream zone, where terrain-based depression analysis delineated scour features and characterized their depth, extent, and morphology. By relating structural anomalies observed along the slab surface to the spatial distribution and severity of downstream scour, we perform a coupled slab–scour assessment that links block-level distress to localized erosion patterns near the apron-end transition.

This integrated approach demonstrates how MBES, combined with geospatial analysis, can support comprehensive underwater inspection and contribute to improved operational monitoring and hazard mitigation for large hydraulic structures.

High-detail 3D surveying and digital restoration of historical xylographic stamps: The Ulisse Aldrovandi case

Anna Forte, Maria Alessandra Tini, Valentina Girelli, Gabriele Bitelli, Luca Vittuari

University of Bologna, Dept. of Civil, Chemical, Environmental and Materials Engineering DICAM, Bologna, Italy

This contribution presents a digital workflow for the virtual restoration and functional recovery of a historic xylographic matrix created by the 16th-century naturalist Ulisse Aldrovandi and preserved at Palazzo Poggi, University of Bologna. Although not physically broken, the pearwood block had undergone subtle yet significant geometric deformation over the centuries, preventing it from producing a complete and accurate print.

The project employed high-resolution structured-light scanning to generate a detailed 3D model of the engraved surface, capturing its geometry with sub-millimetric accuracy. From the resulting 31-million-polygon mesh, approximately 7000 points corresponding to the peaks of the engravings were manually extracted and interpolated to model the deformation. A corrective digital transformation was then applied directly to the mesh vertices, restoring the planarity originally required for printing without altering the object itself.

This case study demonstrates the potential of integrating high-resolution 3D surveying and digital modelling to address subtle geometric deterioration in historical artefacts. The method offers a fully non-invasive and reversible approach that can be extended to other wooden matrices or similarly sensitive cultural heritage objects. Future work includes testing additional surveying techniques and evaluating the reproducibility of the proposed workflow across a wider set of materials and conditions.

Multi-class deterioration detection using data-centric approach from UAV-based bridge inspection applications

Ya-Li Lin¹, Jiann-Yeou Rau¹, Chao-Hung Lin¹, Wei-Shen Lai², Chih-Chao Hu²

¹National Cheng Kung University, Chinese Taipei; ²Institute of Transportation, Ministry of Transportation and Communications, Chinese Taipei

Modern AI applications increasingly rely on visual data for perception and decision-making, yet their reliability is fundamentally constrained by data quality and representativeness. Bridge inspection exemplifies this challenge: UAV imagery of bridge surfaces often exhibits complex textures, overlapping deterioration types, and severe class imbalance, limiting the performance of conventional deep models. To address these issues, this study proposes a data-centric approach within an integrated UAV-based bridge inspection framework. High-resolution UAV images are processed through photogrammetric calibration using Structure-from-Motion (SfM) and bundle adjustment, while a Swin-Unet segmentation model is trained with a data-centric sampling strategy that evaluates image patches through coverage, boundary, texture, and edge-entropy indicators to select representative samples. Experiments demonstrate that the proposed method achieves substantial improvements in mean IoU and F1-score compared with random cropping. The resulting multi-class deterioration maps are spatially integrated with 3D bridge models, forming a foundation for digital-twin-based inspection and confirming the effectiveness of data-centric optimization in enhancing the robustness of AI-driven infrastructure assessment.

DamViT: Vision Transformer–Based Robust Segmentation and 3D Mapping of Concrete Dam Damage from UAV Imagery

Jiwan Hong, Sehoon Oh, Joon Heo

Yonsei University, Korea, Republic of (South Korea)

Concrete dams require regular inspection because surface cracking and spalling can threaten durability and safety, yet UAV images of dam faces are often affected by low-light, blur, over-exposure, and stain-like discoloration that confuse automated crack segmentation. This contribution presents DamViT, a Vision Transformer–based framework for robust pixel-wise segmentation and 3D mapping of damage on concrete dams. UAV RGB images are annotated into three classes (background, crack, spalling) and used to train a SegFormer-based network equipped with two lightweight components: a degradation-aware module that estimates a per-pixel degradation map and guides feature extraction under low-quality imaging, and a stain-aware training strategy that explicitly balances stain-rich non-damage patches with damaged regions to reduce false positives on surface stains. The resulting three-class masks are back-projected onto a photogrammetrically reconstructed 3D dam mesh using camera poses and intrinsics, enabling computation of crack length, spalling area, and their spatial distribution in the structural coordinate system. The proposed pipeline links UAV imaging, robust segmentation, and quantitative 3D damage mapping to support dam safety management.

An end-to-end pipeline for 3D building modeling, texturing, and semantic integration from UAV data

HyunSoo Kim¹, DinhMinh Bui¹, Ji Sang Park², Jun Su Kim³, Changjae Kim¹

¹Dept. of Civil and Environmental Engineering, College of Engineering, MyongJi University, Republic of Korea; ²Principal Researcher, Mobility and Navigation Research Section, Electronics and Telecommunication Research Institute, Daejeon, Republic of Korea; ³AI Technology Team, Geostory Co., Republic of Korea

This study proposes an end-to-end automated pipeline for the generation, texturing, and semantic enhancement of 3D building models using UAV-based multi-source data, including imagery, image-derived point clouds, and orthophotos. The pipeline consists of three sequential stages: automatic 3D modeling, post-processing and texturing, and semantic integration. In the first stage, building candidates are automatically extracted from UAV-derived point clouds and orthophotos to generate geometric 3D models. The second stage refines the geometry through manual correction and applies texture mapping using UAV imagery and camera orientation parameters to enhance visual realism. In the third stage, façade images derived from building textures are processed through learning-based operators to detect semantic components such as windows. The detected 2D semantic information is converted into 3D coordinates and integrated into the textured 3D models, forming CityGML-like hierarchical structures within a .json framework. The resulting models contain both geometric and semantic information, offering high compatibility with CityGML and CityJSON standards. The proposed workflow demonstrates the potential for efficient, data-driven, and automated urban model generation that supports digital twin construction and spatial database updating. Future work will focus on incorporating LiDAR-based point clouds to further improve automation and semantic accuracy within the CityGML 3.0 framework.

Comparison of Crack Detection Performance According to Caustic Noise Removal Methods in Shallow-Water ROV Imagery

Daegeon Woo, Jiwan Hong, Changjoon Oh, Geonu Park, Sehoon Oh, Joon Heo

Yonsei University, Korea, Republic of (South Korea)

This contribution investigates how caustic noise—bright, wave-induced light patterns—affects crack detection performance in shallow-water ROV imagery acquired at Daecheong Dam. Although many studies address underwater challenges such as turbidity, color attenuation, and motion blur, the optical distortions caused by caustic flicker have received little attention, despite being one of the most dominant artifacts in the 0–3 m depth range. Using real ROV video frames, we generated paired datasets with and without caustic-removal preprocessing and evaluated their impact on two lightweight CNN-based crack detection models (YOLOv5 and a transfer-learning AlexNet variant). Four filtering strategies were tested, including physics-based temporal median and motion-compensated averaging, as well as learning-based DeepCaustics and an FFT-residual method adapted from RecGS. Experimental results show that caustic-removal preprocessing consistently reduces false positives and improves crack visibility under diverse lighting conditions. The findings demonstrate that caustic noise is a critical but often overlooked source of detection instability in shallow-water inspections. The study emphasizes the importance of integrating simple, unsupervised caustic-mitigation steps into ROV-based monitoring pipelines to enhance the reliability of underwater infrastructure assessment.
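Of the filtering strategies listed, the temporal median is the simplest to illustrate: because caustic highlights flicker, a given pixel is bright in only a minority of frames, and the per-pixel median over a short window suppresses them. The sketch below assumes the frames are already co-registered (a static camera or prior motion compensation); the function name, window length, and synthetic frames are hypothetical and do not reproduce the study's settings.

```python
import numpy as np

def temporal_median(frames):
    """Per-pixel median over a stack of co-registered grayscale frames (T, H, W)."""
    return np.median(np.asarray(frames, dtype=float), axis=0)

# Hypothetical sequence: uniform background of intensity 50, with one bright
# caustic pixel (255) at a different location in each of five frames.
background = np.full((5, 5), 50.0)
frames = []
for k in range(5):
    frame = background.copy()
    frame[k, k] = 255.0  # transient highlight that moves between frames
    frames.append(frame)

clean = temporal_median(frames)
print("max intensity before:", max(f.max() for f in frames), "after:", clean.max())
```

The filtered frame would then be passed to the crack detector in place of the raw frame; the trade-off is that genuine fast-changing scene content is also suppressed.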

Efficient Boundary Refinement for Classification of MMS Point Clouds

Makoto Nakano¹, Keita Hiraoka¹, Genki Takahashi², Hiroshi Masuda¹

¹The University of Electro-Communications, Japan; ²Kokusai Kogyo Co., Ltd., Japan

Mobile Mapping Systems (MMS) provide dense point clouds essential for 3D mapping and infrastructure management, where semantic labeling is required to segment points into meaningful objects. Previous studies have shown that multiscale geometric features effectively capture local context for this task. Building on our previous work using multiscale features with efficient two-stage neighborhood search, we applied Contrastive Boundary Learning (CBL) to enhance classification accuracy near object boundaries. While CBL significantly improved boundary recognition, it also increased computational cost compared to Random Forest–based segmentation, limiting its practicality for large-scale datasets. In this study, we analyze the trade-off between segmentation accuracy and inference time in CBL-based boundary refinement. We further explore strategies to reduce computation while maintaining sufficient accuracy, aiming to achieve an optimal balance for practical MMS point cloud processing.

Reconstruction and Evolution Simulation of Ancient Road Networks in the Yuncheng Region Based on Multi-Modal Data Fusion

Jingjue Jia¹, Mingyi Du¹, Qiang Chen¹, Zhenhua Gao²

¹Beijing University of Civil Engineering and Architecture, China, People's Republic of; ²Shanxi Provincial Research Institute of Archaeology, China, People's Republic of

Ancient transport networks are central to studies of historical geography, regional socio-economic systems, and human mobility patterns. Traditional network reconstruction has relied primarily on the Least-Cost Path (LCP) model; however, the LCP’s “single-optimal” assumption is overly simplistic and cannot capture common historical realities such as the coexistence of multiple routes. Although probabilistic approaches such as Circuit Theory (CT) and behaviorally explicit methods such as Agent-Based Modeling (ABM) have been developed, a systematic, integrated framework that combines these approaches remains underdeveloped. Using the Yuncheng area of Shanxi Province as a case study, this paper systematically compares and integrates three distinct network models by constructing LCP, CT, and ABM networks and quantitatively comparing their differences in path morphology and predictive logic. The resulting multimodal, integrated probabilistic road network synthesizes the strengths of the three approaches and provides precise, high-confidence target areas for archaeological survey.
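For readers unfamiliar with the Least-Cost Path model, the sketch below computes a single optimal route across a small cost raster with Dijkstra's algorithm. It is a generic illustration and not the authors' workflow: the 4-neighbour connectivity, the rule that a move costs the value of the cell being entered, and the toy grid are all assumptions.

```python
import heapq

def least_cost_path(cost, start, goal):
    """Return (total_cost, path) of the cheapest 4-connected route on a grid.

    Assumption: moving into a cell costs that cell's value; the start cell is free.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        raise ValueError("goal is unreachable")
    # Walk the predecessor links back from the goal to recover the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]

# Toy cost surface: the middle row has a high-cost barrier with a gap on the right.
cost_grid = [
    [1, 1, 1],
    [9, 9, 1],
    [1, 1, 1],
]
total, route = least_cost_path(cost_grid, start=(0, 0), goal=(2, 0))
print(total, route)
```

The output is exactly one route, which illustrates the "single-optimal" limitation noted in the abstract: a slightly more expensive alternative through the barrier never appears, whereas Circuit Theory or agent-based approaches can represent several plausible corridors.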

Assessing Stream Morphology Using High Resolution and Thermal UAV Imagery

Anastasia Umstott, Yanli Zhang, Carmen Montana Schalk

Stephen F Austin State University, United States of America

To protect and promote fish resources, fish habitat needs to be assessed and a “standard” for good or poor habitat established for specific fish species. For this study, high-resolution UAV images, including thermal images, were collected with an Anzu Raptor T for selected streams in East Texas. Orthomosaic generation and classification analysis were performed to produce accurate maps representing open water, channel substrate, and riparian vegetation. This approach provides a rapid means to assess streams. Future efforts will target finer geomorphic unit classifications (e.g., pool, riffle, run) across multiple river systems. This information can be critical for freshwater habitat management and restoration.

Road marking condition assessment from drone imagery via detector-guided segmentation and Gaussian mixture damage modeling

Dinh Minh Bui, JuBin Lee, HyunSoo Kim, SoMin Han, ChangJae Kim

Department of Civil and Environmental Engineering, College of Engineering, Myongji University.

Road marking condition assessment is essential for transportation safety and road asset management, yet conventional inspection methods remain labor-intensive and inefficient. This study proposes an automated workflow for assessing road-marking conditions from drone imagery by combining object detection with a detector-guided segmentation strategy. First, road-marking regions are localized through a lightweight detector optimized for aerial viewpoints. The detected regions are then refined using a segmentation module that produces pixel-accurate masks, enabling reliable extraction of surface-level deterioration such as fading, cracking, and structural discontinuities.

The proposed approach was evaluated on drone datasets collected under varying flight altitudes and illumination conditions. Experimental results indicate that detector-guided segmentation significantly improves robustness to background clutter and enhances segmentation accuracy compared to single-stage models. The method also supports quantitative condition scoring, making it suitable for integration into municipal inspection workflows.

This contribution demonstrates the potential of combining detection and segmentation for large-scale, drone-based road-marking assessment, offering a practical solution for automated infrastructure monitoring.

Quantitative Analysis of LiDAR Accuracy for Mapping Applications

Ahmed Elaksher¹, Tarig Ali², Abdullatif Alharthy³

¹NMSU, United States of America; ²American University of Sharjah, UAE; ³Ministry of National Guard, KSA

Airborne laser scanning (LiDAR) technology has demonstrated exceptional capability in rapidly capturing dense point clouds and accurately representing complex surface features. It has been successfully applied across numerous geospatial and engineering disciplines with highly promising outcomes. The accuracy of any derived product inherently depends on the quality of the original LiDAR data and the processing methods employed. Therefore, evaluating data quality is an essential prerequisite for reliable analysis and application.

This study presents a quantitative assessment of LiDAR system performance, focusing on the intrinsic accuracy of the laser measurements themselves—an aspect often underexplored in existing literature. The evaluation was conducted through detailed field surveying using GPS triangulation and leveling techniques. Results reveal both planimetric and vertical accuracy characteristics, with a total elevation discrepancy of approximately 0.12 m and a horizontal RMSE near 0.50 m. The identified discrepancies exhibit two distinct components: a short-period random variation associated with the LiDAR ranging system, and a lower-frequency component influenced by biases in the geopositioning subsystem.
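Accuracy figures of this kind are typically derived by comparing LiDAR coordinates with independently surveyed check points. The sketch below shows one common way to compute horizontal and vertical RMSE from paired points. The function names and the coordinates are hypothetical and purely illustrative; they are not the study's data, and the abstract does not state that its "total elevation discrepancy" was computed this way.

```python
import math

def rmse(values):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def accuracy_summary(lidar_pts, check_pts):
    """Compare paired (x, y, z) points in metres; lists are matched by index.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(lidar_pts, check_pts))
    # Planimetric offset of each LiDAR point from its surveyed check point.
    dh = [math.hypot(lx - cx, ly - cy) for (lx, ly, _), (cx, cy, _) in pairs]
    # Signed elevation residual (LiDAR minus survey).
    dz = [lz - cz for (_, _, lz), (_, _, cz) in pairs]
    return {
        "horizontal_rmse": rmse(dh),
        "vertical_rmse": rmse(dz),
        "mean_dz": sum(dz) / len(dz),  # a non-zero mean suggests a vertical bias
    }

# Synthetic coordinates, invented for this example.
lidar = [(3.0, 4.0, 0.1), (10.0, 10.0, 9.9)]
survey = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
summary = accuracy_summary(lidar, survey)
print(summary)
```

Reporting the mean residual alongside the RMSE helps distinguish a systematic bias from random scatter, which is in the spirit of the two error components described above.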

Image-assisted aerial LiDAR completion with morphology-guided Gaussian splatting

Siyuan Zou¹, Yongjun Zhang², Zhiwei Li¹, Hongbo Pan¹, Xinyi Liu², Haojun Tang¹, Hai Kan³

¹School of Geoscience and Info-Physics, Central South University; ²School of Remote Sensing and Information Engineering, Wuhan University; ³School of Resource and Environmental Sciences, Wuhan University

Airborne LiDAR offers high geometric accuracy and efficient wide-area coverage, and has been widely used in applications such as urban 3D reconstruction, forestry inventory, topographic mapping, and powerline extraction. However, due to near-nadir acquisition geometry and occlusions, vertical structures such as building façades are often under-sampled, resulting in large voids in the point cloud. Traditional geometric hole-filling methods, including Moving Least Squares, Poisson surface reconstruction, and mesh repair, are effective for small gaps, but they often suffer from over-smoothing, structural distortion, and topological discontinuities when applied to large-scale missing regions.

Meanwhile, multi-view imagery can recover continuous surfaces through dense matching or Gaussian Splatting, but the reconstruction quality still depends heavily on the completeness of the initial geometry. When the initial triangulated points or geometric priors are incomplete, façade regions remain prone to fragmentation and noise. This paper proposes an image-assisted LiDAR completion framework that models LiDAR completion as continuous surface reconstruction with explicit Gaussians. Through anisotropic Gaussian initialization and tangent-plane-guided densification, the method preserves façade geometry and improves the completeness and accuracy of LiDAR-image fusion reconstruction.