Conference Agenda

Overview and details of the sessions of this conference.

Daily Overview

Location: 713A
125 theatre

Date: Saturday, 11-July-2026

8:30am - 10:00am

ThS21: The Global-local Exchange Loop: Coupling Earth Observation and Citizen Sciences for LCLU Mapping
Location: 713A

8:30am - 8:45am

OntoLULC-SOTA: An ontology based approach to make systematic reviews for LULC data

Martin Cubaud¹, Ana-Maria Olteanu-Raimond¹, Cidalia C. Fonte²,³, Diogo Duarte²,⁴, Jacinto Estima⁵, Linda See⁶, Nicolas Gonthier¹, Laurence Jolivet¹, Clément Mallet¹, Arnaud Le Bris¹, Vyron Antoniou⁷

¹Univ Gustave Eiffel, Géodata Paris, IGN, LASTIG, F-77454 Marne-la-Vallée, France; ²Institute for Systems Engineering and Computers at Coimbra (INESC Coimbra), 3030-290 Coimbra, Portugal; ³University of Coimbra, Department of Mathematics, Apartado 3008, EC Santa Cruz, 3001-501 Coimbra, Portugal; ⁴Department of Electrical and Computer Engineering, Polo 2, 3030-290 Coimbra, Portugal; ⁵University of Coimbra, CISUC, Department of Informatics Engineering, Rua Sílvio Lima, 3030-290 Coimbra, Portugal; ⁶International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria; ⁷Hellenic Army Geographical Directorate, 15561 Cholargos, Greece

Land Use (LU) and Land Cover (LC) data allow us to understand the physical and human activities associated with a given area of land. LULC is therefore a dynamic and highly researched field. LULC review papers are numerous and provide high-level insights into the proposed approaches, the data used, the case studies, and the strengths and limitations, and they identify new research gaps. Nevertheless, these reviews are not systematic and reproducible. The goal of this work is to propose an ontology that helps the research community conduct systematic and shareable literature reviews and comparable analyses of scientific papers. To achieve this, we formalize their metadata, content, strengths, and weaknesses. In particular, we consider the scientific paper as the central element of our ontology and we define formal semantics for all relevant items (data process, LULC life cycle and scientific paper). We hope to open the path to more efficient synthesis, discovery, and reuse of research outcomes from the literature. To facilitate the instantiation process and make it accessible to a broader range of researchers, we designed a tabular template. We used our template to simulate the process of conducting a literature review on three use cases: building function, global land cover mapping, and multi-class change detection.

8:45am - 9:00am

Manual Annotations meet Fine-Tuned Foundation Models: a Comparison on Tree Crown Segmentation Task

Rewanth Ravindran, Janik Steier, Samer Karam, Dorota Iwaszczuk

Technical University of Darmstadt, Germany

Accurate segmentation of individual tree crowns (ITCs) from remote-sensing imagery is essential for forest monitoring and ecological analysis, yet remains challenging due to overlapping canopies and structural variability. The Segment Anything Model (SAM) shows strong generalization capabilities but requires effective prompting and domain adaptation for remote sensing applications. In this study, we investigate a lightweight fine-tuning strategy using Low-Rank Adaptation (LoRA) to adapt SAM for ITC segmentation on the BAMFORESTS dataset. The impact of different prompting strategies is evaluated, including manually annotated point and bounding box prompts, as well as automatically generated bounding boxes derived from a pre-trained tree detector. SAM is fine-tuned with instance-level ITC masks, enabling prompt-aware segmentation of multiple tree crowns per image. Performance is assessed before and after fine-tuning using standard instance segmentation metrics, including IoU and F1-score. Results show that LoRA-based adaptation improves mask delineation and robustness to prompt variability, with bounding box prompts consistently outperforming point-based inputs. Automatically generated prompts enable a fully automated workflow, although their effectiveness depends on detection quality. Evaluation on an independent validation site with manually annotated ITC labels shows that the fine-tuned LoRA-SAM model achieves performance comparable to manual annotations, while significantly reducing annotation effort. These findings highlight the importance of prompt design in adapting foundation models for remote sensing tasks and demonstrate that parameter-efficient fine-tuning provides a practical pathway toward scalable ITC segmentation.
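As an illustration of the two metrics named in the abstract, the following is a minimal pixel-level sketch of IoU and F1-score for a single predicted crown mask against a reference mask. It is not taken from the study: the function name `mask_iou_and_f1` and the list-of-lists mask representation are assumptions made for this example, and the authors may compute these metrics at instance level rather than per pixel.

```python
def mask_iou_and_f1(pred, truth):
    """Return (IoU, F1) for two binary masks given as equal-sized 2D lists of 0/1.

    Hypothetical helper for illustration only; pixel-level definitions are assumed.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # crown pixel found in both masks
            elif p and not t:
                fp += 1      # predicted crown pixel absent from the reference
            elif t and not p:
                fn += 1      # reference crown pixel missed by the prediction
    union = tp + fp + fn
    if union == 0:
        # Both masks are empty; counted here as a perfect match (a convention, not a rule).
        return 1.0, 1.0
    iou = tp / union
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
iou, f1 = mask_iou_and_f1(pred, truth)
```

In this toy case two pixels overlap, one is a false positive and one is missed, so F1 comes out higher than IoU, as it does for any imperfect match.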

9:00am - 9:15am

Evaluation of the IGN FLAIR-HUB Model Transferability Performance for Land Cover Mapping in Iasi, Romania

Ana-Maria Loghin¹, Loredana-Mariana Crenganis¹, Constantin Stoian², Ana-Maria Olteanu-Raimond², Anatol Garioud², Valeria-Ersilia Oniga¹, Bogdan Rusu¹

¹"Gheorghe Asachi" Technical University of Iasi, Romania; ²Univ. Gustave Eiffel, IGN-ENSG, LaSTIG – Saint-Mande, France

This research rigorously evaluates the transferability of the pre-trained FLAIR-HUB deep learning model, developed by the French National Institute of Geographical and Forest Information (IGN), in terms of spatial generalizability and multi-resolution robustness, when transferred from its native French domains to the complex urban-agricultural landscape of Iasi, Romania.

The core objective of this investigation is to test the model's performance stability across severe multi-resolution domain shifts and temporal scenarios. The model architecture is applied to orthophotos acquired over Iasi in 2019 (at 0.5 m resolution) and 2024 (at 0.2 m and at a very high resolution of 0.084 m), enabling a comprehensive assessment of cross-resolution and temporal robustness.

A novel validation framework is introduced, combining conventional 2D raster-based evaluation with a 3D point-wise assessment using semantically labeled UAV-derived point clouds. The results demonstrate strong performance for dominant classes such as buildings and herbaceous vegetation, with improved accuracy at higher spatial resolution, while stable classes such as buildings and impervious surfaces show a comparatively robust performance, confirming the model’s capability to consistently represent invariant land cover types. However, performance decreases for heterogeneous and vegetation-related classes due to seasonal variability and class complexity. The 3D validation reveals slightly lower but consistent results, highlighting its role as a more rigorous evaluation approach. Overall, the study confirms the potential of transferring pre-trained semantic segmentation models to new geographic contexts, while emphasizing the importance of spatial resolution, temporal consistency, and validation strategy.

9:15am - 9:30am

Towards efficient Giant Tree Inventories: Deep Learning with crowdsourced Training Data

Yu-Hui Wang¹, Chi-Kuei Wang¹, Chung-Cheng Lee¹, Rebecca Chia-Chun Hsu²

¹Dept. of Geomatics, National Cheng Kung University, Chinese Taipei; ²Forest Ecology Division, Taiwan Forestry Research Institute, Chinese Taipei

Airborne Laser Scanning (ALS) data have been used to identify giant trees in Taiwan, yet the current workflow relies on volunteers to visually inspect ALS profile images. This study proposes replacing the volunteer-based verification step by applying deep learning to ALS profile images. Candidate treetop locations were first extracted from a Canopy Height Model (CHM) using a 65 m threshold and local maxima filtering. For each candidate, a representative ALS profile image was generated following an automated angle-selection method based on terrain fitting.
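The candidate extraction step can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function name `candidate_treetops`, the square neighbourhood with a `radius` parameter, and the choice of an inclusive threshold are all assumptions; only the 65 m height threshold and the use of local maxima come from the abstract.

```python
def candidate_treetops(chm, min_height=65.0, radius=1):
    """Return (row, col, height) for CHM cells that reach min_height and are
    the maximum of their (2 * radius + 1)-cell square neighbourhood.

    Hypothetical helper: chm is a 2D list of canopy heights in metres.
    """
    rows, cols = len(chm), len(chm[0])
    candidates = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue  # below the height threshold (inclusive threshold assumed)
            # Gather neighbouring heights, clipping the window at the raster edges.
            neighbourhood = [
                chm[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            # Equal-height cells on a plateau are all kept; a real pipeline
            # would probably merge them into one candidate.
            if h >= max(neighbourhood):
                candidates.append((r, c, h))
    return candidates


chm = [
    [60, 62, 61, 40],
    [63, 70, 66, 41],
    [61, 66, 64, 68],
    [30, 35, 50, 55],
]
tops = candidate_treetops(chm)
```

Cells at 66 m pass the height threshold but sit next to the 70 m peak, so they are not local maxima and are dropped.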

An EfficientNetV2-S model was trained using volunteer-labelled profile images from previous nationwide surveys. After label cleaning, a refined dataset was constructed, and a hybrid resampling strategy was applied to address class imbalance. The final model achieved 99.0% overall accuracy, 98.1% precision, and 100% recall on the independent test set, successfully detecting every true giant tree.

To evaluate generalization, the model was applied to 97,487 candidates from the latest national ALS survey. Predictions exhibited a strongly bimodal confidence distribution, demonstrating stable separation between true and false positives and effectively reducing the manual inspection workload.

This study shows that deep learning can reliably replace crowdsourced verification, enabling scalable and efficient updates of large-scale forest inventories.

9:30am - 9:45am

The Global-Local loop: what is missing in bridging the gap between geospatial data from numerous communities?

Clément Mallet, Ana-Maria Olteanu-Raimond

Univ Gustave Eiffel, IGN, Géodata Paris, LASTIG, France

We face an unprecedented amount of geospatial data, describing directly or indirectly the Earth's surface at multiple spatial, temporal, and semantic scales, and stemming from numerous contributors, from satellites to citizens. The main challenge in all the geospatial-related communities lies in suitably leveraging a combination of some of the sources for either a generic or a thematic application. Certain data fusion schemes are predominantly exploited: they correspond to popular tasks with mainstream data sources, e.g., free archives of Sentinel images coupled with OpenStreetMap data under an open and widespread deep-learning backbone for land-cover mapping purposes. Most of these approaches unfortunately operate under a "master-slave" paradigm, where one source is basically integrated to help process the "main" source, without mutual advantages (e.g., large-scale estimation of a given biophysical variable using in-situ observations) and under a specific community bias.

We argue that numerous key data fusion configurations, and in particular the effort to make the exploitation of multiple data sources symmetric, are insufficiently addressed despite being highly beneficial for generic or thematic applications. Bridges and feedback loops between scales, communities and their respective sources are lacking, which neglects the full potential of such a "global-local loop". In this paper, we propose to establish the most relevant interaction schemes through illustrative use cases. We subsequently discuss under-explored research directions that could take advantage of leveraging available data across multiple scales and communities.

10:30am - 12:00pm

WG III/1G: Remote Sensing Data Processing and Understanding
Location: 713A

10:30am - 10:45am

YOLOv8m-CCFM-GSConv: Research on Lightweight Marine Oil Spill Target Detection Based on Improved YOLOv8m Model

Junjie Lu¹, Qingyang Wang¹,²,³, Bo Song¹, Jianwu Jiang¹,²,³, Bin Yang¹, Chen Jiao¹

¹College of Geomatics and Geoinformation, Guilin University of Technology, Guilin 541004, China; ²Guangxi Ecological Spatiotemporal Big Data Perception Service Laboratory, Guilin 541004, China; ³Guangxi Key Laboratory of Spatial Information and Geomatics, Guilin 541004, China

In target detection for marine oil spills, deep learning methods are gradually replacing traditional remote sensing image recognition approaches. However, complex models designed for higher accuracy may compromise recognition speed and often fail to meet the rapid response requirements of terminal device applications (Chai et al., 2025). Therefore, developing a lightweight detection model that balances high accuracy and real-time performance is crucial for enhancing marine oil spill emergency response capabilities (Liang et al., 2024). Based on the YOLOv8m model, this study introduces the GSConv lightweight convolution (Li et al., 2024) and the CCFM cross-scale feature fusion module (Guo et al., 2025), which significantly improve multi-scale target detection adaptability and recognition accuracy in complex backgrounds while keeping the model lightweight, thereby offering a novel and effective solution for marine oil spill target detection.

10:45am - 11:00am

Detecting moving vehicles on Sentinel-2 imagery using semi-automatic labeling from S2A/S2C tandem phase

Guillaume Buthmann¹, Florentin Poucin¹, Jérémy Anger¹,²

¹Kayrros SAS; ²Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, 91190, Gif-sur-Yvette, France

During the commissioning phase of ESA's Sentinel-2C, tandem images with Sentinel-2A were acquired with a delay of 30 seconds. We present a novel, automated method for labeling moving vehicles in Sentinel-2 images, leveraging the temporal offset between these tandem acquisitions. We propose a filtering process that isolates pixels corresponding to vehicles that moved between the two acquisitions. We generate a training dataset based on this process, removing the need for a large manual labeling phase. The dataset is used to train a standard deep-learning-based vehicle detection model. Experimental results, as well as a validation study using ground-truth data from California, highlight the quality of the proposed labeling method, and show that a vehicle detection model can be successfully trained from quasi-simultaneous acquisitions.

11:00am - 11:15am

LAD-Enhancer: A Lightweight All in One Aerial Detection Enhancer Under Adverse Weather

Yu Wan¹, Jie Li¹, Liupeng Lin², Zaiyan Zhang¹, Qiangqiang Yuan¹, Huanfeng Shen²

¹School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; ²School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China

With the rapid development of aerial imaging technology, aerial target detection has become a research hotspot with broad applications in intelligent transportation, agricultural monitoring, and military surveillance. However, the performance of aerial detection models is often degraded under adverse weather conditions such as fog, sandstorms, and low illumination. In such environments, aerial images typically suffer from reduced contrast and color distortion, which significantly affects the model’s ability to accurately identify targets. To this end, a Lightweight All-in-One Aerial Detection Enhancer Under Adverse Weather (LAD-Enhancer) has been proposed. The designed enhancer processes and restores degraded aerial images, thereby enhancing the detection model’s ability to perceive potential targets. Unlike conventional image restoration models, LAD-Enhancer integrates detection labels as additional supervision during training to ensure that enhancement is detection-oriented rather than purely visual. Furthermore, a three-stage training strategy and a Mixture of Experts (MoE) framework are employed to adaptively classify and process images captured under different degradation conditions. Experimental results demonstrate that, with an increase of fewer than 3K parameters, the proposed LAD-Enhancer significantly improves detection performance under adverse weather conditions while maintaining almost unchanged performance on clear-weather images.

11:15am - 11:30am

A Collaborative Detection Method of Small Unmanned Aerial Vehicle Target via Multi-modal Feature Fusion in Complex Background

Wen Jiang, Keyi Zhang, Yanping Wang, Yun Lin, Fukun Bi

North China University of Technology, Beijing, People's Republic of China

State-of-the-art methods for detecting small unmanned aerial vehicles (UAVs) continue to struggle in complex urban settings due to several persistent challenges, namely frequent target occlusion, high similarity in thermal radiation signatures between UAVs and their surroundings, and the inherently low visual saliency of small UAV targets, all of which contribute to degraded detection performance. To tackle these issues, this paper introduces a novel multi-modal feature fusion collaborative detection (MFFCD) framework grounded in learnable spatial mapping. The architecture consists of three key components: first, a multi-branch parallel feature extraction module (MBPFE) that simultaneously processes infrared, visible, and radar range-azimuth images, complemented by a feature fusion module (FFM) designed to enhance both intra-modal and inter-modal feature interactions; second, an adaptive spatially-aware dynamic detection head module (DDH) that dynamically recalibrates feature weights to strengthen target representation and boost detection accuracy; and third, a feature collaborative enhancement module (FCE) that employs a learnable affine transformation to align and fuse multi-modal features, thereby producing more robust and reliable detection outcomes. Extensive experiments show that the proposed MFFCD framework substantially outperforms existing methods under challenging urban conditions, achieving a 56.89% gain in Mean Average Precision (mAP) for small UAV detection.

11:30am - 11:45am

Infrared-Visible Image Fusion Method Based on Differential Feature Enhancement and Cross-Modal Attention

Huang Zhang¹, Lina Xu¹, Qing Zhou¹, Tiyou Zhou², Siyu Liu¹, Xincai Chang¹, Hao LI¹

¹Hubei Subsurface Multi-scale Imaging Key Laboratory, School of Geophysics and Geomatics, China University of Geosciences, Wuhan, 430074, China; ²State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China

Infrared and visible remote sensing image fusion is crucial for improving scene perception in complex environments, but existing autoencoder-based methods suffer from insufficient information interaction between modalities, inadequate deep feature fusion, and ineffective loss functions in extreme scenarios. To address these issues, this study proposes a Differential Feature Enhancement and Cross-modal Fusion (DFECF) method. The DFECF adopts an end-to-end architecture consisting of dual-stream encoders, cross-modal fusion modules, Transformer global perception modules, and decoders. Specifically, the Differential Enhancement (DE) module extracts differential information between infrared and visible features, combined with spatial and channel attention to enhance feature representation. The cross-modal fusion module adaptively integrates deep features based on channel attention, adjusting feature weights according to scene characteristics. The Transformer module supplements the global receptive field to capture long-range feature dependencies, and a joint loss function is designed to optimize fusion performance. Experimental results on public datasets show that the proposed method outperforms existing state-of-the-art methods in both subjective visual effects and objective evaluation metrics, especially in extreme environments such as strong light and thick smoke. It effectively improves the integrity of scene perception and provides high-quality data support for practical applications such as forest fire prevention, mining area monitoring, and autonomous driving.