Conference Agenda

Session

ThS23A: Towards Large Cultural Heritage Foundation Models: Datasets, Semantic Alignment, and Component-Level Annotation

Time:

Sunday, 05-July-2026:

1:30pm - 2:45pm

Location: 716A

175 theatre

Session Topics:

Towards Large Cultural Heritage Foundation Models: Datasets, Semantic Alignment, and Component-Level Annotation (ThS23)

Presentations

1:30pm - 1:45pm

Investigating The Form And Restoration Of The Diji Altar

Wang Jinghan, Qi Ying, Hou Miaole

Beijing University of Civil Engineering and Architecture, China, People's Republic of

The restoration of historic buildings is the broader subject of this study, which focuses on the Diji Altar. Located along the central axis of Beijing, the altar is both a significant historical landmark and an important remnant of China's ancient imperial sacrificial architecture. Previous studies have examined aspects of the Diji Altar such as its ritual hierarchy and the craftsmanship recorded in historical texts, but research gaps remain. Because the altar structure is damaged and the literature insufficiently documents its structural form, platform base specifications, and stylistic evidence, systematic research on restoration techniques remains relatively scarce, and the evidence has to be reconstructed from architectural principles. Addressing this gap is important for understanding the technical achievements and ceremonial principles of official architecture during the Ming and Qing dynasties, and for guiding the restoration and preservation of ancient buildings.

1:45pm - 2:00pm

A Digital Restoration Method for Earth God Altars from Discrete Components to Scene Reconstruction

Sining Li¹, Nan Meng¹,², Tao Zhang¹,³, Lili Jiang¹,⁴, Miaole Hou¹,⁵

¹Beijing University of Civil Engineering and Architecture, China, People's Republic of; ²Ancient Chinese Architecture Museum, China, People's Republic of; ³Beijing Institute of Archaeology, China, People's Republic of; ⁴Beijing Digsur Science & Technology Com. Ltd, China, People's Republic of; ⁵Beijing University of Civil Engineering and Architecture, China, People's Republic of

The digital preservation of open-air sites often faces multiple challenges, such as dispersed components, varied forms, and missing historical records. In response, this study focuses on the Beijing Dizhitan and proposes and implements a workflow that integrates architectural morphology theory, archaeological typology, and modern digital technologies. The workflow forms a complete methodological chain: semantic annotation, classification, and virtual assembly of stone components; virtual restoration and model reconstruction of the site; and, finally, scene-level restoration and display evaluation. The restoration of the Dizhitan shows that this approach can "revive" dispersed components by placing them in their proper positions in virtual space, and that it offers a replicable paradigm in which rigorous academic research is embedded throughout the digital process. It provides a new technical approach and perspective for the preservation, study, and interpretation of immovable open-air cultural relics.

2:00pm - 2:15pm

Building a Multimodal Dataset of Rock Art: Integrating Text, Images, and 3D Point Clouds

Dongxu Huo, Chenxu Nie, Miaole Hou

Chang'an University, China, People's Republic of

This paper addresses the limitations of single-modal data in rock art cultural heritage preservation, such as incomplete information and fragmented semantics. It proposes a method for constructing a multimodal dataset that integrates text, images, and 3D point clouds. Text data is structured and semantically annotated using the ArchaeoBERT model; image data is collected through web scraping and then annotated and augmented; point cloud data is captured by laser scanning and then denoised and registered. For feature mapping alignment, CNN, BERT, and PointNet++ extract features that are combined into unified vector representations. A three-level quality control process is applied to keep the data accurate and reliable, and information coverage increases by 47.3%. The dataset integrates semantic, visual, and spatial information, providing a multidimensional data foundation and practical reference for the digital preservation, 3D reconstruction, and cross-modal retrieval of rock art.
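
As a rough illustration of what mapping three modalities into a unified vector representation can look like, the sketch below projects per-modality feature vectors into one shared space and averages them. It is not the authors' implementation: the random arrays stand in for BERT, CNN, and PointNet++ outputs, the feature sizes (768, 512, 1024) and the shared dimension of 128 are assumptions, and the projection matrices are random rather than learned.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row vector to unit length (guarding against zero norm)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, 1e-12)

def project(features, weight):
    """Linearly map modality-specific features into the shared space."""
    return l2_normalize(features @ weight)

def fuse(modal_features, weights):
    """Project every modality, average the results, and re-normalize."""
    parts = [project(f, w) for f, w in zip(modal_features, weights)]
    return l2_normalize(np.mean(parts, axis=0))

rng = np.random.default_rng(0)
SHARED_DIM = 128                      # assumed size of the unified vector
FEATURE_DIMS = (768, 512, 1024)       # assumed text, image, point cloud sizes
N_ITEMS = 4                           # hypothetical number of rock art items

# Placeholder encoder outputs; a real pipeline would use model features here.
modal_features = [rng.normal(size=(N_ITEMS, d)) for d in FEATURE_DIMS]

# Random projections stand in for learned alignment layers.
weights = [rng.normal(size=(d, SHARED_DIM)) / np.sqrt(d) for d in FEATURE_DIMS]

unified = fuse(modal_features, weights)
print(unified.shape)
```

In practice the projection weights would be trained (for example with a contrastive objective) so that the text, image, and point cloud vectors of the same item land close together; averaging is only one of several possible fusion choices.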

2:15pm - 2:30pm

Monocular Depth Estimation from UAV images for 3D documentation of architectural heritage: a Depth Anything v2-based approach

Andrea Maria Lingua¹, Filiberto Chiabrando², Francesca Gallitto¹, Stefania Manca¹, Alessio Martino², Francesca Matrone¹, Alessandra Spadaro¹

¹Politecnico di Torino (DIATI), Italy; ²Politecnico di Torino (DAD), Italy

The rapid evolution of Monocular Depth Estimation (MDE) models, and in particular the emergence of recent foundation models such as Depth Anything v2 (Yang et al., 2024; Ranftl et al., 2022), is opening concrete prospects for applying artificial intelligence to architectural and cultural heritage surveying.

This research aims to assess the feasibility of employing such models to obtain metric depth estimations from UAV imagery, acquired in both oblique and nadir views, with the broader goal of integrating neural networks into 3D documentation, HBIM, and GIS workflows for built heritage.

The Depth Anything v2 models were originally trained for ground-level scenarios, where the camera typically operates 1–2 m above the ground and horizon distances extend up to 60–80 m. Applying them to aerial imagery, particularly drone-based acquisitions, exposes a substantial domain gap: the network tends to interpret top-down landscapes as distant horizons, thereby compressing the depth scale.

To address this issue, this study develops an experimental calibration and adaptation procedure aimed at transforming the depth maps produced by the model into metrically consistent estimates that are coherent with architectural reality.
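
The abstract does not describe the calibration procedure itself. One common way to turn relative depth into metric depth, shown here purely as an illustrative sketch, is to fit a scale and a shift by least squares against a few points of known depth. The sketch assumes that the model output is related to metric depth by an affine transformation (if the model predicts inverse depth, the same fit would be applied to the reciprocal instead). The synthetic depth map, the control points, and the names `fit_scale_shift`, `TRUE_SCALE`, and `TRUE_SHIFT` are all hypothetical.

```python
import numpy as np

def fit_scale_shift(relative, metric):
    """Least-squares fit of metric ≈ scale * relative + shift."""
    design = np.column_stack([relative, np.ones_like(relative)])
    (scale, shift), *_ = np.linalg.lstsq(design, metric, rcond=None)
    return scale, shift

rng = np.random.default_rng(42)

# Synthetic stand-in for a relative depth map predicted by an MDE model.
relative_map = rng.uniform(0.0, 1.0, size=(48, 64))

# Hypothetical ground truth, used only to generate control-point depths.
TRUE_SCALE, TRUE_SHIFT = 35.0, 12.0
metric_truth = TRUE_SCALE * relative_map + TRUE_SHIFT

# Pick 20 pixels as control points with known metric depth
# (in a survey these could come from GNSS or total-station measurements).
rows = rng.integers(0, relative_map.shape[0], size=20)
cols = rng.integers(0, relative_map.shape[1], size=20)

scale, shift = fit_scale_shift(relative_map[rows, cols], metric_truth[rows, cols])

# Apply the fitted transformation to the whole map and check the error.
metric_map = scale * relative_map + shift
rmse = float(np.sqrt(np.mean((metric_map - metric_truth) ** 2)))
print(f"scale={scale:.3f}, shift={shift:.3f}, rmse={rmse:.2e}")
```

With real UAV data the relationship is rarely this clean, so a robust fit (for example RANSAC) or a per-image or per-region calibration may be needed.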