Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only the sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads if available).

Daily Overview

Location: 716A
175 theatre

Date: Sunday, 05-July-2026

8:30am - 12:00pm

TuT16: Metrics That Make a Difference: How to analyze change and error
Location: 716A

12:00pm - 1:15pm

ThS26: Earth Observation Foundation Models: Scalable, Multimodal AI for Environmental Intelligence
Location: 716A

12:00pm - 12:15pm

From Orthophotos to Insights: AI-Powered Forest Monitoring for Digital Forest Twin

Nina Krueger¹, Sandra Uhlig², Taimur Khan³

¹M.O.S.S. Computer Grafik Systeme GmbH, Germany; ²Landesamt für Geobasisinformation Sachsen (GeoSN); ³Helmholtz Center for Environmental Research (UFZ)

This project, a collaboration between the Landesamt für Geobasisinformation Sachsen (GeoSN) and M.O.S.S. Computer Grafik Systeme GmbH, pioneers the development of a Digital Twin Forest prototype for Saxony. The initiative leverages high-resolution aerial orthophotos (DOP) and advanced AI methods to generate detailed, current forest information. The core methodology centers on the “DeepTrees” workflow, a convolutional neural network (CNN)–based approach developed by the Helmholtz Center for Environmental Research (UFZ). This workflow processes DOP imagery at 10–20 cm resolution to segment individual tree crowns and extract key forest indicators, including crown area, crown radius, and tree density.

The process unfolds in three main stages: (1) preprocessing and model adaptation using transfer learning, (2) inference and postprocessing for accurate tree segmentation, and (3) integration into GeoSN’s data infrastructure. This integration utilizes OGC-compliant services and moGI-based data management, enabling automated processing, configuration, and visualization.

Results from the prototype confirm the feasibility of precise, large-scale tree crown segmentation from aerial imagery. The system also demonstrates the potential to derive temporal and structural forest information from recurring DOP datasets. These outputs can be directly incorporated into operational geospatial systems, supporting climate adaptation, forest management, and policy-making.

In conclusion, the project showcases how explainable, interoperable AI workflows can strengthen national geodata infrastructures and serve as a model for future federated, AI-driven digital forest twins across Germany.
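The forest indicators named in the abstract above (crown area, crown radius, tree density) can be illustrated with a minimal sketch. It assumes that segmentation has already produced one crown area per tree, and it takes the crown radius as the radius of a circle of equal area; the abstract does not state how the DeepTrees workflow derives these values, so the function name `crown_indicators` and its inputs are hypothetical.

```python
import math


def crown_indicators(crown_areas_m2, plot_area_ha):
    """Summarise per-tree crown areas (in m^2) for one plot (area in hectares).

    Assumption: crown radius is the equal-area circle radius, sqrt(A / pi).
    """
    if not crown_areas_m2 or plot_area_ha <= 0:
        raise ValueError("need at least one crown and a positive plot area")
    radii = [math.sqrt(area / math.pi) for area in crown_areas_m2]
    count = len(crown_areas_m2)
    return {
        "tree_count": count,
        "mean_crown_area_m2": sum(crown_areas_m2) / count,
        "mean_crown_radius_m": sum(radii) / count,
        "tree_density_per_ha": count / plot_area_ha,
    }


# Two hypothetical crowns with radii of 2 m and 3 m on a 0.5 ha plot.
example = crown_indicators([math.pi * 4.0, math.pi * 9.0], plot_area_ha=0.5)
```

In practice the crown areas would come from the polygon geometries of the segmented crowns rather than from a hand-written list.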

12:15pm - 12:30pm

Scalable Framework for Peatland Aboveground Biomass Mapping Using Multi-source Satellite Data and Machine Learning

Mohammadali Hemati¹, Masoud Mahdianpari¹,², Hodjat Shiri³, Fariba Mohammadimanesh⁴

¹Department of Electrical and Computer Engineering, Faculty of Engineering and Applied Sciences, Memorial University of Newfoundland; ²C-CORE; ³Civil Engineering Department, Faculty of Engineering and Applied Sciences, Memorial University of Newfoundland; ⁴Canada Centre for Remote Sensing, Natural Resources Canada

This study presents a scalable framework for mapping aboveground biomass and moisture content in peatlands using intensive field sampling, multi-sensor satellite imagery, and advanced machine learning. Field data collected from diverse bog and fen sites in Western Newfoundland are integrated with Sentinel-1/-2 synthetic aperture radar and optical data, complemented by 3 m PlanetScope imagery for site-level detail. Ensemble learning models, particularly XGBoost, yield high biomass mapping accuracy, with regional maps revealing major biogeographical gradients and fine-scale site mosaics. Feature importance analysis highlights the role of red-edge and SAR bands in prediction. The results demonstrate that free satellite archives and machine learning can overcome limitations of costly airborne campaigns, supporting operational carbon monitoring and ecological management in northern peatlands. This approach establishes a foundation for wide-area wetland monitoring and future expansion using emerging remote sensing technologies.

12:30pm - 12:45pm

A self-supervised method for soil moisture estimation using multisensor data over forests

Rouhollah Esmaeilisarteshniz¹, Ramata Magagi¹, Samuel Foucher¹, Aaron Berg², Andreas Colliander³

¹Centre d'applications et de recherches en télédétection (CARTEL), Université de Sherbrooke, Québec, Canada; ²Department of Geography, Environment and Geomatics, University of Guelph, Ontario, Canada; ³Finnish Meteorological Institute, Helsinki 00560, Finland

Surface soil moisture (SM) plays a significant role in environmental and hydrological processes, particularly runoff and evapotranspiration. Within forest ecosystems, changes in SM can lead to significant ecological impacts, including paludification and greater susceptibility to forest fires. Microwave remote sensing facilitates large-scale monitoring of SM. Moreover, machine learning (ML) methods have demonstrated strong potential for capturing the nonlinear relationships between SM and satellite data. In general, supervised ML techniques achieve higher success rates when trained on larger sets of ground measurements. However, obtaining extensive ground measurements of SM over vast areas such as forests is challenging, expensive, and time-consuming. To address this limitation, this study proposes a self-supervised method based on pre-task learning to estimate SM over forested areas using multisensor data. The core idea of the self-supervised approach is to leverage the knowledge gained during pre-task learning from multisensor data and transfer it to the SM estimation task, thereby improving the model’s ability to generalize in SM estimation. The self-supervised learning method achieved an overall coefficient of determination (r²) of 0.74 and an RMSE of 0.04 m³/m³ on the testing dataset. For the individual forest sites, the model obtained r² = 0.75 with RMSE = 0.04 m³/m³ at Millbrook, r² = 0.63 with RMSE = 0.04 m³/m³ at Massachusetts, and r² = 0.74 with RMSE = 0.03 m³/m³ at Saskatchewan. The results highlight the potential of multisensor data for SM estimation in forested areas. Our method, which utilizes self-training on the input data, reduces dependence on ground SM measurements and enhances generalization capability.
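For readers unfamiliar with the two scores reported above, the following minimal sketch shows how RMSE and a coefficient of determination can be computed from paired observations and predictions. The abstract does not say which definition of r² was used; this sketch assumes the form 1 - SS_res / SS_tot, and the sample values are invented for illustration only.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. m^3/m^3)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented soil moisture values (m^3/m^3), not data from the study.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
score_rmse = rmse(observed, predicted)
score_r2 = r_squared(observed, predicted)
```

If the authors instead report the squared Pearson correlation, the value can differ from this definition whenever the predictions are biased.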

12:45pm - 1:00pm

Zero-shot multi-class semantic segmentation of remote sensing images using SAM 2 with prior database information

Paula Lilian Lippmann, Mareike Dorozynski, Franz Rottensteiner, Christian Heipke

Institute of Photogrammetry and GeoInformation - Leibniz University Hannover, Germany

Land cover data need to be updated regularly. Typically, remote sensing images (RSI) play a central role in this process. A first step is RSI semantic segmentation. Today, this task is mainly solved by deep learning. Vision foundation models (VFMs) in particular have gained increasing importance in this context. Having been trained on large datasets, VFMs for segmentation can yield good results on data from various domains without further training. We present a new method for using the VFM Segment Anything Model 2 (SAM 2) for multi-class semantic segmentation of Sentinel-2 images that does not require training data. Our method is based on a prompt engineering approach, using SAM 2 in its pre-trained form. The different prompt types are generated on the basis of existing topographic data. We also propose a post-processing step for merging the output of SAM 2 to obtain a multi-class label image. The results of our experiments show that our method achieves an overall accuracy (OA) of up to 93% at pixel level using mask prompts. Experiments with other Sentinel-2 3-channel composite images do not show significantly different results compared to R-G-B images. Incorporating data from different time steps, intended to be used for map updating, shows good results, but the small number of changed areas indicates limitations. In general, the proposed method is suitable for further research into semantic segmentation tasks with little or no training data, as well as for the process of updating databases.
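The merging step and the pixel-level overall accuracy mentioned in the abstract above can be sketched as follows. The abstract does not describe the authors' merging rule, so this example assumes a simple one: where binary masks overlap, the mask with the higher score wins, and uncovered pixels receive a background label. The function names and the tiny masks are hypothetical.

```python
def merge_masks(masks, height, width, background=0):
    """Merge (class_id, score, binary_mask) triples into one label image.

    Assumed rule: the highest-scoring mask covering a pixel sets its label.
    """
    labels = [[background] * width for _ in range(height)]
    best = [[float("-inf")] * width for _ in range(height)]
    for class_id, score, mask in masks:
        for r in range(height):
            for c in range(width):
                if mask[r][c] and score > best[r][c]:
                    best[r][c] = score
                    labels[r][c] = class_id
    return labels


def overall_accuracy(predicted, reference):
    """Fraction of pixels whose predicted label equals the reference label."""
    total = correct = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            total += 1
            correct += int(p == r)
    return correct / total


# Two hypothetical class masks on a 2 x 3 image; they overlap at pixel (0, 1).
masks = [
    (1, 0.9, [[1, 1, 0], [0, 0, 0]]),
    (2, 0.6, [[0, 1, 1], [0, 1, 1]]),
]
label_image = merge_masks(masks, height=2, width=3)
reference = [[1, 1, 2], [2, 2, 2]]
oa = overall_accuracy(label_image, reference)
```

Here the overlapping pixel goes to class 1 because its mask has the higher score, and the one uncovered pixel stays background, which is the single error against the reference.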

1:30pm - 2:45pm

ThS23A: Towards Large Cultural Heritage Foundation Models: Datasets, Semantic Alignment, and Component-Level Annotation
Location: 716A

1:30pm - 1:45pm

Investigating The Form And Restoration Of The Diji Altar

Wang Jinghan, Qi Ying, Hou Miaole

Beijing University of Civil Engineering and Architecture, China, People's Republic of

The restoration of historic buildings is an important topic in today's society and constitutes the primary subject of this study. The Diji Altar, located along the central axis of Beijing, is not only a significant historical landmark but also an important remnant of China's ancient imperial sacrificial architecture. Although some studies have focused on the Diji Altar, such as its ritual hierarchy and craftsmanship as recorded in historical texts, certain research gaps remain. Due to the damage to the altar structure and insufficient documentation in relevant literature regarding its structural form, platform base specifications, and stylistic evidence, systematic research on restoration techniques remains relatively scarce. There is a need to reconstruct evidence based on architectural principles. Addressing this critical gap is of great significance for understanding the technical achievements and ceremonial principles of official architecture during the Ming and Qing dynasties, and for guiding the restoration and preservation of ancient buildings.

1:45pm - 2:00pm

A Digital Restoration Method for Earth God Altars from Discrete Components to Scene Reconstruction

Sining Li¹, Nan Meng¹,², Tao Zhang¹,³, Lili Jiang¹,⁴, Miaole Hou¹,⁵

¹Beijing University of Civil Engineering and Architecture, China, People's Republic of; ²Ancient Chinese Architecture Museum, China, People's Republic of; ³Beijing Institute of Archaeology, China, People's Republic of; ⁴Beijing Digsur Science & Technology Com. Ltd, China, People's Republic of; ⁵Beijing University of Civil Engineering and Architecture, China, People's Republic of

The digital preservation of open-air sites often faces multiple challenges, such as dispersed components, varied forms, and missing historical records. In response, this study focuses on the Beijing Dizhitan and proposes and implements an innovative workflow that deeply integrates architectural morphology theories, archaeological typological methods, and modern digital technologies. This workflow systematically constructs a complete methodological chain, from the semantic annotation, classification, and virtual assembly of stone components, to the virtual restoration and model reconstruction of the site, ultimately achieving scenario-level restoration and display evaluation. The successful restoration of the Dizhitan demonstrates that this approach not only effectively "revives" dispersed components, placing them in their proper positions in a virtual space, but also pioneers a replicable new paradigm that embeds rigorous academic research throughout the digital process. This provides an entirely new technical approach and perspective for the preservation, study, and interpretation of immovable open-air cultural relics.

2:00pm - 2:15pm

Building a Multimodal Dataset of Rock Art: Integrating Text, Images, and 3D Point Clouds

Dongxu Huo, Chenxu Nie, Miaole Hou

Chang'an University, China, People's Republic of

This paper addresses the limitations of single-modal data in rock art cultural heritage preservation, such as incomplete information and fragmented semantics. It proposes a method for constructing a multimodal dataset that integrates text, images, and 3D point clouds. Text data is structured and semantically annotated using the ArchaeoBERT model; image data is obtained through web scraping, annotation, and augmentation; and point cloud data is captured by laser scanning and then processed with noise reduction and registration techniques. Feature mapping alignment is employed, combining CNN, BERT, and PointNet++ to extract features and generate unified vector representations. A three-level quality control process is applied to make the data accurate and reliable, and information coverage is increased by 47.3%. This dataset achieves comprehensive integration of semantic, visual, and spatial information, providing a multidimensional data foundation and practical reference for the digital preservation, 3D reconstruction, and cross-modal retrieval of rock art.

2:15pm - 2:30pm

Monocular Depth Estimation from UAV images for 3D documentation of architectural heritage: a Depth Anything v2-based approach

Andrea Maria Lingua¹, Filiberto Chiabrando², Francesca Gallitto¹, Stefania Manca¹, Alessio Martino², Francesca Matrone¹, Alessandra Spadaro¹

¹Politecnico di Torino (DIATI), Italy; ²Politecnico di Torino (DAD), Italy

The rapid evolution of Monocular Depth Estimation (MDE) models, and in particular the emergence of recent foundation models such as Depth Anything v2 (Yang et al., 2024; Ranftl et al., 2022), is opening concrete perspectives for the application of artificial intelligence in architectural and cultural heritage surveying.

This research aims to assess the feasibility of employing such models to obtain metric depth estimations from UAV imagery, acquired in both oblique and nadir views, with the broader goal of integrating neural networks into 3D documentation, HBIM, and GIS workflows for built heritage.

The Depth Anything v2 models were originally trained for ground-level scenarios, where the camera typically operates 1–2 m above the ground, with horizon distances extending up to 60–80 m. When applied to aerial imagery, particularly drone-based acquisitions, this results in a substantial domain gap: the network tends to interpret top-down landscapes as distant horizons, thereby compressing the depth scale.

To address this issue, this study develops an experimental calibration and adaptation procedure aimed at transforming the depth maps produced by the model into metrically consistent estimates that are coherent with architectural reality.
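The abstract does not describe the calibration procedure itself. As one illustration of what such an adaptation can look like, the sketch below fits a global scale and shift by least squares between model depth values and known metric depths at a few control points, then applies them to a depth map. The linear relationship, the function names, and the sample numbers are all assumptions made for this example, not the authors' method.

```python
def fit_scale_shift(model_depths, metric_depths):
    """Least-squares fit of metric ~= scale * model + shift (assumed linear)."""
    n = len(model_depths)
    mean_x = sum(model_depths) / n
    mean_y = sum(metric_depths) / n
    var_x = sum((x - mean_x) ** 2 for x in model_depths)
    if var_x == 0:
        raise ValueError("control points must span more than one depth value")
    cov_xy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(model_depths, metric_depths)
    )
    scale = cov_xy / var_x
    shift = mean_y - scale * mean_x
    return scale, shift


def to_metric(depth_map, scale, shift):
    """Apply the fitted scale and shift to every pixel of a 2D depth map."""
    return [[scale * d + shift for d in row] for row in depth_map]


# Hypothetical control points: model output vs. surveyed distance in metres.
scale, shift = fit_scale_shift([0.2, 0.5, 0.8], [19.0, 40.0, 61.0])
metric_map = to_metric([[0.2, 0.5], [0.8, 1.0]], scale, shift)
```

A single global fit like this cannot correct distortions that vary across the image, so a real procedure would likely need more control points and a check of the residuals.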