Conference Agenda

Session

WG II/9B: Vision Metrology

Time: Tuesday, 07-July-2026, 1:30pm - 3:00pm

Location: 713B (125 theatre)

Session Topics: Vision Metrology (WG II/9)

External Resource: http://www.commission2.isprs.org/wg9

Presentations

1:30pm - 1:45pm

Quantization-Aware Training for Efficient Object Detection on FPGAs: Case Studies

Xuanshu Luo, Gabor Fogarasi, Alan Syrgak, Paul Walther, Martin Werner

Technical University of Munich, Germany

Deploying object detection models for resource-constrained remote sensing applications necessitates on-board model inference capabilities. While Field Programmable Gate Arrays (FPGAs) offer massive parallelism as energy-efficient hardware platforms, model quantization remains essential to further balance computational efficiency with detection accuracy. Compared to post-training quantization methods that involve multiple-stage development with consistent dependency on domain datasets, quantization-aware training (QAT) integrates quantization constraints into training, providing a simpler pipeline for model compression. However, QAT introduces quantization errors to which smaller objects are more vulnerable. To address this issue, we propose object-scale-aware (OSA) regularization that amplifies quantization error penalties for smaller targets. Our approach is validated through two case studies: bird detection at airports and aerial-view building detection. We perform 8-bit QAT on YOLOX series models using the MVA2023 dataset and the Bavarian Building Dataset for the respective studies. Our method achieves up to 50.2 times inference acceleration with minimal accuracy loss on Xilinx Kria KV260 FPGAs compared to full-precision models. The ablation study and detection examples further demonstrate the effectiveness of OSA regularization in small object detection.
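As a rough illustration of the mechanism behind quantization-aware training, the sketch below simulates 8-bit quantization by rounding values to an integer grid and mapping them back to floats ("fake quantization"), then measures the resulting error. The abstract does not specify the quantization scheme, so the symmetric per-tensor scheme and the function names `fake_quantize` and `quantization_error` are assumptions made for this example. The authors' OSA regularization is not reproduced here.

```python
def fake_quantize(values, num_bits=8):
    """Symmetric uniform quantize-dequantize of a list of floats.

    Assumption: one scale per tensor, chosen from the largest magnitude.
    Returns the dequantized values and the scale that was used.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values), 1.0
    scale = max_abs / qmax
    dequantized = []
    for v in values:
        q = round(v / scale)              # snap to the integer grid
        q = max(-qmax, min(qmax, q))      # clamp to the representable range
        dequantized.append(q * scale)     # map back to floating point
    return dequantized, scale


def quantization_error(values, num_bits=8):
    """Mean squared error introduced by fake quantization."""
    dequantized, _ = fake_quantize(values, num_bits)
    return sum((a - b) ** 2 for a, b in zip(values, dequantized)) / len(values)


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.013]
    print(fake_quantize(weights))
    print(quantization_error(weights, num_bits=8))
    print(quantization_error(weights, num_bits=4))
```

In this scheme each value moves by at most half a quantization step, which is one way to see why small signals (and, by extension, features of small objects) can be affected proportionally more.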

1:45pm - 2:00pm

Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics

Dennis Haitz¹, Athradi Shritish Shetty¹, Michael Weinmann², Markus Ulrich¹

¹Karlsruhe Institute of Technology, Germany; ²Delft University of Technology, Netherlands

We present a broad evaluation of state-of-the-art Visual Place Recognition methods. The evaluation focuses on tasks in which fast image pair retrieval is of high importance, such as image-driven scene registration, SLAM, or Structure-from-Motion correspondence search. The study therefore shifts the focus away from typical Visual Place Recognition and towards scenarios of interest in computer vision and robotics. An evaluation pipeline covering both retrieval and runtime performance is presented. It uses prepared datasets based on widely used benchmarks from different domains, such as indoor SLAM, outdoor object-centric scenes, and autonomous navigation in urban and suburban areas.
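Retrieval performance in this kind of evaluation is commonly summarized with Recall@K. The abstract does not name the metrics used, so the following is only a minimal sketch under that assumption. The function name `recall_at_k` and the dictionary layout are hypothetical.

```python
def recall_at_k(ranked_results, ground_truth, k):
    """Fraction of queries with at least one correct match in the top k.

    ranked_results: dict mapping query id -> list of candidate ids, best first.
    ground_truth:   dict mapping query id -> set of correct candidate ids.
    """
    if not ranked_results:
        return 0.0
    hits = 0
    for query_id, ranking in ranked_results.items():
        relevant = ground_truth.get(query_id, set())
        # A query counts as a hit if any of its top-k candidates is relevant.
        if any(candidate in relevant for candidate in ranking[:k]):
            hits += 1
    return hits / len(ranked_results)


if __name__ == "__main__":
    ranked = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
    truth = {"q1": {"b"}, "q2": {"f"}}
    for k in (1, 2, 3):
        print(k, recall_at_k(ranked, truth, k))
```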

2:00pm - 2:15pm

MVM-IOD: An Industrial Object-Centric Benchmark Dataset for the Evaluation of 3D Reconstruction Methods

Robert Langendörfer, Markus Hillemann, Markus Ulrich

Karlsruhe Institute of Technology, Germany

3D object reconstruction, camera pose estimation, and novel view synthesis in industrial applications are challenging tasks, as errors are costly while the time window for solving these tasks is often limited. The complexity of typical industrial objects further complicates these tasks. Several datasets exist that can be used to evaluate current methods on these tasks; however, most of them do not depict realistic industrial scenarios. We introduce the Machine Vision Metrology Industrial Object Dataset (MVM-IOD), which addresses this lack of datasets. The hardware setup used to acquire the dataset consists of a camera mounted upside down, due to space restrictions, on the end effector of an industrial robot arm. Images of typical industrial objects are captured systematically by moving the camera on a hemisphere around the objects. MVM-IOD contains the camera poses, the acquired RGB images, and the 3D point clouds of 9 objects with 2 background choices, resulting in 18 scenes, which allows the evaluation of all image-based methods that compute a 3D reconstruction, camera poses, and/or novel views. Based on our dataset, we extensively evaluate current state-of-the-art 3D reconstruction and camera pose estimation methods, such as Structure from Motion, Multi-View Stereo, Visual Geometry Grounded Transformer (VGGT), π3, and 2D Gaussian Splatting, and report our findings to create a baseline for future research.

2:15pm - 2:30pm

A Critical Synthesis of Uncertainty Quantification and Foundation Models for Semantic Segmentation

Steven Landgraf, Joceline Hinz, Markus Ulrich

Karlsruhe Institute of Technology, Germany

Foundation models are increasingly achieving what seemed impossible not long ago by enabling unprecedented accuracy and cross-domain generalization. Yet their lack of interpretability, tendency to be overconfident, and sensitivity to real-world domain shifts pose critical challenges for safety- and mission-critical applications. Uncertainty quantification (UQ) offers a principled way to address these issues, but its integration into segmentation foundation models has yet to be explored. In this paper, we present the first systematic evaluation of UQ methods applied to a foundation model for semantic segmentation. We fine-tune a lightweight DPT decoder on top of the pretrained SAM2 encoder to establish a simple yet competitive baseline and benchmark four representative UQ approaches – Monte Carlo Dropout, Deep Sub-Ensemble, Test-Time Augmentation, and Evidential Deep Learning – across Cityscapes, NYUv2, and two challenging out-of-domain settings. Our analysis compares segmentation accuracy, calibration, uncertainty quality, and inference time, revealing clear trade-offs between predictive performance, reliability, and computational cost. These results highlight both the promise and the current limitations of uncertainty-aware foundation models, pointing to the need for future work that jointly optimizes accuracy, robustness, and efficiency for real-world deployment.
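Calibration is often measured with the expected calibration error (ECE), which compares average confidence with empirical accuracy inside confidence bins. The abstract does not state which calibration metric is used, so the sketch below is an assumption. The function name `expected_calibration_error` and the equal-width binning are choices made for this example.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE with equal-width confidence bins over [0, 1].

    confidences: predicted confidence per sample (e.g. per pixel).
    correct:     True where the prediction matches the label.
    """
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin instead of overflowing.
        index = min(int(conf * num_bins), num_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False]))
```

An overconfident model shows up as bins whose average confidence exceeds their accuracy, which increases the ECE.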

2:30pm - 2:45pm

The Impact of CutMix on Reliability and Robustness in Semantic Segmentation

Steven Landgraf, Markus Ulrich

Karlsruhe Institute of Technology, Germany

Ensuring not only high accuracy but also reliable and robust predictions is critical for the deployment of semantic segmentation models in safety-critical applications such as autonomous driving. Despite the widespread use of CutMix – a simple yet powerful data augmentation strategy – its effect on reliability and robustness in dense prediction tasks remains unexplored. Motivated by recent findings that semi-supervised segmentation methods, where CutMix is a core component, can severely degrade reliability, this study isolates and systematically analyzes the influence of CutMix on segmentation accuracy, calibration, and uncertainty quality. We evaluate two representative architectures, the CNN-based DeepLabV3+ and the transformer-based SegFormer, across both in-domain and out-of-domain scenarios. Our results show that CutMix has only a minor impact on segmentation accuracy but consistently improves reliability, particularly under distribution shifts. These improvements indicate that CutMix primarily enhances the trustworthiness of the model’s calibration and uncertainty rather than the raw segmentation prediction itself. This distinction is crucial for safety-critical deployment, where reliable confidence estimates are as important as raw performance.
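For readers unfamiliar with CutMix in a dense prediction setting, the sketch below pastes a rectangular region from one image, together with the matching region of its label map, into another image. It is a simplified illustration and not the study's implementation: images are plain nested lists, the function name `cutmix_segmentation` is hypothetical, and the box is sampled uniformly rather than from the Beta-distributed mixing ratio used in the original CutMix formulation.

```python
import random


def cutmix_segmentation(image_a, label_a, image_b, label_b, box=None, rng=random):
    """Paste a rectangle of image_b/label_b into copies of image_a/label_a.

    All inputs are 2D lists of equal size. box is (top, left, height, width);
    if omitted, a random box is drawn (a simplification, see lead-in).
    """
    height, width = len(image_a), len(image_a[0])
    if box is None:
        box_h = rng.randint(1, height)
        box_w = rng.randint(1, width)
        top = rng.randint(0, height - box_h)
        left = rng.randint(0, width - box_w)
    else:
        top, left, box_h, box_w = box
    # Copy so that the input samples stay untouched.
    mixed_image = [row[:] for row in image_a]
    mixed_label = [row[:] for row in label_a]
    for r in range(top, top + box_h):
        for c in range(left, left + box_w):
            # Pixels and labels are replaced together to keep them aligned.
            mixed_image[r][c] = image_b[r][c]
            mixed_label[r][c] = label_b[r][c]
    return mixed_image, mixed_label


if __name__ == "__main__":
    zeros = [[0] * 4 for _ in range(4)]
    ones = [[1] * 4 for _ in range(4)]
    image, label = cutmix_segmentation(zeros, zeros, ones, ones, box=(1, 1, 2, 2))
    for row in image:
        print(row)
```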

2:45pm - 3:00pm

Uncertainty Quality of VGGT: An Analysis on the DTU Benchmark Dataset

Markus Hillemann, Robert Langendörfer, Steven Landgraf, Markus Ulrich

Karlsruhe Institute of Technology, Germany

Visual Geometry Grounded Transformer (VGGT) has already attracted a great deal of attention in a short period of time, not least due to the Best Paper Award at CVPR 2025. Similar to DUSt3R and MASt3R, VGGT aims to bring about a paradigm shift by replacing established methods like bundle adjustment and feature matching with a simple, unified, feed-forward neural network that predicts camera poses, depth maps, and dense 3D structure directly from multiple images of a scene in a few seconds. A key aspect is its ability to process an arbitrary number of views consistently in a single forward pass without any post-processing or iterative optimization. For photogrammetry, this opens new possibilities for real-time, scalable, and accessible 3D reconstruction. In this context, not only high reconstruction accuracy but also high-quality uncertainty estimates are crucial, as they foster trust and enable robust quality assurance. This paper therefore investigates the quality of VGGT’s uncertainty predictions. The analysis identifies an effective confidence threshold for filtering VGGT’s raw output and demonstrates that enhancing uncertainty quality holds strong potential for improving the accuracy of its 3D reconstructions.
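Filtering a predicted point cloud by a confidence threshold can be sketched as follows. The abstract does not report the threshold value or the data layout, so the value `2.0` in the usage example and the function name `filter_by_confidence` are purely hypothetical. This is not VGGT's API.

```python
def filter_by_confidence(points, confidences, threshold):
    """Keep only the 3D points whose confidence reaches the threshold.

    points:      list of (x, y, z) tuples.
    confidences: one confidence value per point, higher means more certain.
    """
    if len(points) != len(confidences):
        raise ValueError("points and confidences must have the same length")
    return [p for p, c in zip(points, confidences) if c >= threshold]


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.2), (5.0, 5.0, 9.0)]
    conf = [3.1, 2.4, 1.05]
    # Hypothetical threshold, chosen only for this example.
    print(filter_by_confidence(cloud, conf, threshold=2.0))
```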