If you have any problems, please contact the organizers at isprs2026@icsevents.com.

Conference Agenda

Session

WG II/1A: Image Orientation and Fusion

Time: Thursday, 09-July-2026, 3:30pm - 5:15pm

Location: 714B (175 theatre)

Session Topics: Image Orientation and Fusion (WG II/1)

External Resource: http://www.commission2.isprs.org/wg1

Presentations

3:30pm - 3:45pm

AI-based Camera Pose Estimation on mixed Aerial and Ground Images: A comparative Study

Zichao Zeng, June Moh Goo, Jan Boehm

University College London, United Kingdom

Estimating camera poses jointly from aerial and ground imagery remains difficult because large viewpoint changes reduce overlap, alter appearance, and weaken the geometric assumptions relied on by both classical photogrammetry and recent AI-based reconstruction models. This paper presents a controlled comparison between a classic photogrammetric approach, represented by COLMAP, and a cross-view fine-tuned end-to-end model based on Dust3R. Tests are carried out on a London building scene containing 10 aerial and 29 ground images. Fine-tuned Dust3R reconstructs the full image set, whereas COLMAP successfully registers 24 ground-level images. Because both reconstructions are defined only up to an unknown similarity transform and no ground-truth poses are available, we evaluate the shared subset through 7-DoF similarity transformation analysis rather than direct metric pose errors. After transformation, the translation RMSE of the shared camera centres is 10.0% of the reconstructed scene diagonal in the fine-tuned Dust3R coordinate frame. We further compare pairwise geometric support using a unified fundamental-matrix RANSAC evaluation over 406 image pairs. The AI-based pipeline achieves substantially higher inlier ratios than the photogrammetric pipeline under the same verification settings, indicating more successful cross-view orientation. The study contributes a clearer evaluation protocol for mixed aerial-ground pose estimation without ground truth, together with an empirical analysis of robustness, alignment behaviour, and current limitations of both pipelines.
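The 7-DoF similarity analysis mentioned in this abstract can be illustrated with a closed-form least-squares alignment (Umeyama's method). The sketch below is not the authors' code: the function names, the use of NumPy, and the definition of the scene diagonal as the axis-aligned bounding-box diagonal of a point set are assumptions made for illustration.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # Cross-covariance between the centred point sets
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

def relative_translation_rmse(src, dst, scene_points):
    """RMSE of aligned camera centres as a fraction of the scene diagonal.

    Assumption: the scene diagonal is the bounding-box diagonal of scene_points,
    expressed in the same frame as dst.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    scene_points = np.asarray(scene_points, dtype=float)
    s, R, t = umeyama_similarity(src, dst)
    aligned = s * src @ R.T + t
    rmse = np.sqrt(((aligned - dst) ** 2).sum(axis=1).mean())
    diagonal = np.linalg.norm(scene_points.max(axis=0) - scene_points.min(axis=0))
    return rmse / diagonal
```

Here `src` and `dst` would hold the camera centres of the images registered by both pipelines, one row per image, in the two reconstruction frames.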

3:45pm - 4:00pm

Epipolar Rectification of a Generic Camera

Marc Pierrot Deseilligny, Ewelina Rupnik

Univ Gustave Eiffel, Géodata Paris, IGN, LASTIG

We propose a generic method for epipolar resampling that is not tied to a specific camera model. We demonstrate the effectiveness of the approach on central perspective, pushbroom, and pushbroom panoramic camera models. We also devise an epipolarability index that measures the suitability of an image pair for epipolar rectification, and provide a formal derivation of the ambiguity bound to epipolar resampling. An open-source implementation of the algorithm is available at github.com/micmacIGN/micmac.

4:00pm - 4:15pm

ThermalAssist: Towards Efficient Annotation of Thermal Imagery

Jingwei Zhu¹,², Manoj Biswanath¹,³, Benjamin Busam¹,³

¹Chair of Photogrammetry and Remote Sensing, Technical University of Munich, Germany; ²School of Geospatial and Artificial Intelligence, East China Normal University, China; ³Munich Center for Machine Learning (MCML), Munich, Germany

Thermal infrared (TIR) imaging provides the surface temperature of objects and reveals heat-transfer patterns of buildings, which supports applications such as insulation inspection, energy-leakage detection, and thermal-bridge detection. However, TIR image datasets with reliable annotations for deep learning remain scarce, as the labeling process is time-consuming and tedious, and particularly challenging due to the low-texture and blurred features of TIR images. To address this challenge, we propose ThermalAssist, a geometry- and gradient-aware framework designed to assist thermal anomaly labeling in TIR imagery. By combining sparse manual annotations with dense correspondence via flow-based propagation, the framework efficiently transfers labels across image sequences while preserving semantic consistency and boundary integrity. Experiments on the TBBR dataset demonstrate that ThermalAssist can transfer labels between images, achieving up to 21% higher F1-score and 35% higher precision compared to state-of-the-art tracking-based baselines. It also helps identify missing annotations and boundary inconsistencies for quality checks. This work provides a foundational tool for quality-assured thermal annotation pipelines and represents a key step toward more scalable, reliable, and intelligent labeling of thermal imagery.
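As a rough illustration of what flow-based label propagation means in general, the sketch below transfers a label mask from an annotated image to a neighbouring image using a dense flow field and nearest-neighbour lookup. It is not the ThermalAssist method: the function name, the backward-flow convention (each target pixel points to its source location), and the fill value for pixels without a valid source are assumptions.

```python
import numpy as np

def propagate_labels(labels_src, flow_tgt_to_src, fill=0):
    """Warp a label mask into the target image with a dense backward flow.

    labels_src:      (H, W) integer label mask of the annotated image.
    flow_tgt_to_src: (H, W, 2) flow; [..., 0] is dx and [..., 1] is dy, mapping
                     each target pixel to its location in the source image.
    """
    h, w = labels_src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup keeps labels discrete (no blending at boundaries)
    src_x = np.rint(xs + flow_tgt_to_src[..., 0]).astype(int)
    src_y = np.rint(ys + flow_tgt_to_src[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full((h, w), fill, dtype=labels_src.dtype)
    out[valid] = labels_src[src_y[valid], src_x[valid]]
    return out
```

Pixels whose flow points outside the source image keep the fill value, which is one simple way to flag regions that would still need manual annotation.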

4:15pm - 4:30pm

Evaluation of recent AI-based point matching algorithms applied on aerial images

Pablo d'Angelo, Franz Kurz, Alaa Eddine Ben Zekri, Reza Bahmanyar

German Aerospace Center, Germany

Accurate image matching is essential for the precise orientation of airborne imagery, yet modern feature matchers are rarely evaluated on real aerial data with large temporal, seasonal, and radiometric changes. For this study, we introduce the AerialRefMatch dataset, which comprises 51 challenging aerial images and corresponding true-ortho reference data. We benchmark classical and deep learning–based matching algorithms on AerialRefMatch, considering two scenarios: matching original images and matching approximately orthorectified images generated using GNSS/IMU orientations. For each method, image-based ground control points are derived and used for single-image pose estimation; accuracy is assessed via independent checkpoints. Results show that matching directly on original images is very difficult: fewer than 14% of images can be oriented with pixel-level accuracy. When approximate orthorectification is used, performance improves substantially. JamMa, SIFT, and SuperPoint+LightGlue achieve pixel-level accuracy for up to 30% of images, with JamMa being most robust on difficult cases and SIFT-based variants being more precise on the easier ones. Deep detector-free models such as ELoFTR and RoMa are less accurate but more robust on the original images than other models. Overall, state-of-the-art deep learning-based matchers still struggle with large rotations, scale differences, and semantic differences, lack sub-pixel precision, and benefit strongly from prior image orientation knowledge.

4:30pm - 4:45pm

Faster than Light: An Embedded-Efficient Matching Model with ReLU Linear Attention

Ziang Wang¹, Tao He¹, Wei Cui¹, Yu Duan¹, Kaimin Sun¹, Haoyun Miao²

¹State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan, China; ²North Automatic Control Technology Institute, Taiyuan, China

Deep learning-based image matching faces a critical challenge when deployed on computationally constrained embedded aerial devices. Transformer-based architectures, particularly the scaled dot-product attention mechanism, incur high computational costs that limit inference speed for real-time applications. To address this bottleneck, we propose FastGlue, a sparse feature matching algorithm that adapts the LightGlue architecture through two targeted modifications: replacing the scaled dot-product attention with a ReLU-based linear attention module, and reducing the depth of the graph neural network. These changes reduce computational complexity while maintaining competitive matching performance. Evaluations on the HPatches and MegaDepth-1500 benchmarks show that FastGlue achieves accuracy comparable to LightGlue while reducing inference time from 20.05 ms to 17.05 ms on GPU, and from 840.45 ms to 665.85 ms on an RK3588 embedded CPU. Our work demonstrates that targeted architectural simplifications can yield meaningful efficiency gains for deep learning-based feature matching on resource-constrained platforms.
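The complexity argument behind ReLU-based linear attention can be sketched in a few lines. The abstract does not give FastGlue's exact formulation, so the following is a generic single-head version in which ReLU acts as the kernel feature map. The function names and the `eps` stabiliser are assumptions. Reordering the matrix products avoids the n_q x n_k attention matrix, so the cost grows linearly with the number of keypoints rather than quadratically.

```python
import numpy as np

def relu_attention_quadratic(Q, K, V, eps=1e-6):
    """Reference form: builds the full (n_q, n_k) weight matrix explicitly."""
    weights = np.maximum(Q, 0.0) @ np.maximum(K, 0.0).T
    return (weights @ V) / (weights.sum(axis=1, keepdims=True) + eps)

def relu_linear_attention(Q, K, V, eps=1e-6):
    """Same result via associativity: relu(Q) @ (relu(K).T @ V)."""
    Qp, Kp = np.maximum(Q, 0.0), np.maximum(K, 0.0)
    kv = Kp.T @ V                    # (d, d_v) summary of all keys and values
    k_sum = Kp.sum(axis=0)           # (d,) normaliser shared by all queries
    numerator = Qp @ kv              # (n_q, d_v)
    denominator = Qp @ k_sum + eps   # (n_q,)
    return numerator / denominator[:, None]
```

Both functions compute the same output; only the linear variant keeps its intermediate results independent of the number of keys.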

4:45pm - 5:00pm

SCOP: An Open-Source and Educational JAX-Powered Framework for Generic Photogrammetric Bundle Adjustment

Adrien Gressin

University of Applied Sciences Western Switzerland (HES-SO / HEIG-VD)

We present SCOP, an open-source and educational framework for generic photogrammetric bundle adjustment built in Python and powered by JAX automatic differentiation. SCOP removes the need for manual Jacobian derivation by expressing all projection models as pure mathematical functions with automatically computed exact derivatives.

The framework supports multiple camera geometries (pinhole, fisheye, equirectangular) and optimization methods (Gauss-Newton, Gauss-Newton-Armijo, Levenberg-Marquardt, Gradient Descent). Its modular architecture, separating cameras, images, and observations, allows easy extension to new sensors and constraint types, including GNSS positions, ground control points, and geodetic observations.

A hybrid computation pipeline combines JAX for differentiation with a Rust backend for sparse Schur complement elimination, achieving ~0.5 s per iteration on a real-world dataset with 79k unknowns and 181k observations. Following classical least-squares photogrammetry, SCOP provides rigorous uncertainty estimation through covariance matrices, normalized residuals, and reliability indices. With synthetic data tools and interactive 3D visualization, it enables transparent teaching and reproducible research.
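The Schur complement elimination mentioned above can be illustrated on a small dense system. The sketch below is not SCOP's Rust backend: it assumes normal equations partitioned into a camera block A, a coupling block B, and a block-diagonal point block C made of 3x3 blocks, and all names are hypothetical. A production solver would exploit sparsity rather than form dense matrices.

```python
import numpy as np

def solve_with_schur(A, B, C_blocks, b_cam, b_pt):
    """Solve [[A, B], [B.T, C]] @ [dx_cam, dx_pt] = [b_cam, b_pt].

    C is block-diagonal with one 3x3 block per 3D point, so it can be
    inverted block by block before eliminating the point unknowns.
    """
    n_pts = len(C_blocks)
    C_inv = np.zeros((3 * n_pts, 3 * n_pts))
    for i, block in enumerate(C_blocks):
        C_inv[3 * i:3 * i + 3, 3 * i:3 * i + 3] = np.linalg.inv(block)
    # Reduced camera system: (A - B C^-1 B^T) dx_cam = b_cam - B C^-1 b_pt
    S = A - B @ C_inv @ B.T
    rhs = b_cam - B @ C_inv @ b_pt
    dx_cam = np.linalg.solve(S, rhs)
    # Back-substitution for the point updates
    dx_pt = C_inv @ (b_pt - B.T @ dx_cam)
    return dx_cam, dx_pt
```

The benefit is that the only system solved in full is the camera-sized one, which is usually much smaller than the number of point unknowns.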

5:00pm - 5:15pm

TriCo-Net: Learning Semantically Aware Local Features via Triple Consistency

Longze Zhu¹,², Li Yan¹,², Hong Xie¹,², Hao Wu¹,², Shan Su¹,², Binbing Wang¹, Xiaoteng Yang¹, Junjie Yuan¹, Aoran Li³

¹Wuhan University, The School of Geodesy and Geomatics, Wuhan 430079, Hubei, China; ²Hubei Luojia Laboratory, Wuhan 430079, Hubei, China; ³Henan Normal University, The College of Software, Xinxiang 453000, Henan, China

Local feature matching in complex scenes is hindered by semantic ambiguity, where detectors often latch onto transient or repetitive patterns. We present TriCo-Net, which learns semantically aware and discriminative local features by enforcing a Triple Consistency (TriCo) principle across implicit semantics, scale, and spatial context. During training, an Implicit Semantic Strategy (ISS) distills cues from a segmentation teacher to modulate keypoint reliability and descriptor learning, while introducing no overhead at inference. A Scale-wise Semantic Harmonizer (SSH) aligns and fuses feature-pyramid levels to ensure cross-scale coherence, and a Global Context Propagator (GCP) broadcasts scene-level dependencies to resolve local ambiguities. On Aachen Day–Night v1.1, TriCo-Net achieves strong and consistent gains in visual localization, particularly under night conditions, and exhibits robustness to blur, noise, and large homographies. Ablations show complementary benefits from ISS, SSH, and GCP, with ISS contributing most at tight thresholds and at night. TriCo-Net narrows the day–night performance gap while maintaining mid-range throughput, offering a practical trade-off between robustness and efficiency.