Conference Agenda
WG II/9B: Vision Metrology
Session Topics: Vision Metrology (WG II/9)
External Resource: http://www.commission2.isprs.org/wg9
Presentations
1:30pm - 1:45pm
Quantization-Aware Training for Efficient Object Detection on FPGAs: Case Studies
Technical University of Munich, Germany
Deploying object detection models for resource-constrained remote sensing applications requires on-board inference capability. Field Programmable Gate Arrays (FPGAs) are energy-efficient hardware platforms that offer massive parallelism, but model quantization is still needed to balance computational efficiency with detection accuracy. Post-training quantization involves multi-stage development and a continual dependency on domain datasets. Quantization-aware training (QAT), by contrast, integrates quantization constraints into training and so provides a simpler pipeline for model compression. However, QAT introduces quantization errors to which smaller objects are more vulnerable. To address this issue, we propose an object-scale-aware (OSA) regularization that amplifies quantization error penalties for smaller targets. We validate our approach through two case studies: bird detection at airports and aerial-view building detection. We perform 8-bit QAT on YOLOX series models using the MVA2023 dataset and the Bavarian Building Dataset, respectively. Compared with full-precision models, our method achieves up to a 50.2-fold inference speedup on Xilinx Kria KV260 FPGAs with minimal accuracy loss. An ablation study and detection examples further demonstrate the effectiveness of OSA regularization for small object detection.
1:45pm - 2:00pm
Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics
¹Karlsruhe Institute of Technology, Germany; ²Delft University of Technology, Netherlands
We present a broad evaluation of state-of-the-art Visual Place Recognition (VPR) methods. The evaluation focuses on tasks where fast image pair retrieval is important, such as image-driven scene registration, SLAM, and correspondence search for Structure-from-Motion. The study therefore moves away from typical Visual Place Recognition and towards scenarios of interest in computer vision and robotics. We also present an evaluation pipeline that measures both retrieval and runtime performance. The evaluation uses datasets prepared from widely used benchmarks in different domains, including indoor SLAM, outdoor object-centric scenes, and autonomous navigation in urban and suburban areas.
2:00pm - 2:15pm
MVM-IOD: An Industrial Object-Centric Benchmark Dataset for the Evaluation of 3D Reconstruction Methods
KIT, Germany
3D object reconstruction, camera pose estimation, and novel view synthesis are challenging tasks in industrial applications. Errors are costly, and the time window for solving these tasks is often limited. The complexity of typical industrial objects makes these tasks harder still. Several datasets exist for evaluating current methods on these tasks, but most of them do not depict realistic industrial scenarios. We introduce the Machine Vision Metrology Industrial Object Dataset (MVM-IOD) to address this gap. The acquisition hardware consists of a camera mounted on the end effector of an industrial robot arm. Because of space restrictions, the camera is mounted upside down. Images of typical industrial objects are captured systematically by moving the camera on a hemisphere around the objects. MVM-IOD contains the camera poses, the acquired RGB images, and the 3D point clouds of 9 objects with 2 background choices, resulting in 18 scenes. This allows the evaluation of all image-based methods that compute 3D reconstructions, camera poses, and/or novel views. Using our dataset, we extensively evaluate current state-of-the-art 3D reconstruction and camera pose estimation methods. These include Structure from Motion, Multi-View Stereo, Visual Geometry Grounded Transformer (VGGT), and π3, as well as 2D Gaussian Splatting. We report our findings as a baseline for future research.
2:15pm - 2:30pm
A Critical Synthesis of Uncertainty Quantification and Foundation Models for Semantic Segmentation
Karlsruhe Institute of Technology, Germany
Foundation models increasingly achieve what seemed impossible not long ago, enabling unprecedented accuracy and cross-domain generalization. Yet they lack interpretability, tend toward overconfidence, and are sensitive to real-world domain shifts. These weaknesses pose critical challenges for safety- and mission-critical applications. Uncertainty quantification (UQ) offers a principled way to address these issues, but its integration into segmentation foundation models has yet to be explored. In this paper, we present the first systematic evaluation of UQ methods applied to a foundation model for semantic segmentation. We fine-tune a lightweight DPT decoder on top of the pretrained SAM2 encoder to establish a simple yet competitive baseline. We then benchmark four representative UQ approaches (Monte Carlo Dropout, Deep Sub-Ensemble, Test-Time Augmentation, and Evidential Deep Learning) on Cityscapes, NYUv2, and two challenging out-of-domain settings. Our analysis compares segmentation accuracy, calibration, uncertainty quality, and inference time. It reveals clear trade-offs between predictive performance, reliability, and computational cost. These results highlight both the promise and the current limitations of uncertainty-aware foundation models. They point to the need for future work that jointly optimizes accuracy, robustness, and efficiency for real-world deployment.
2:30pm - 2:45pm
The Impact of CutMix on Reliability and Robustness in Semantic Segmentation
Karlsruhe Institute of Technology, Germany
Semantic segmentation models deployed in safety-critical applications such as autonomous driving must produce predictions that are reliable and robust, not only accurate. CutMix is a simple yet powerful and widely used data augmentation strategy. Despite this, its effect on reliability and robustness in dense prediction tasks remains unexplored. Recent findings show that semi-supervised segmentation methods, in which CutMix is a core component, can severely degrade reliability. Motivated by this, the study isolates and systematically analyzes the influence of CutMix on segmentation accuracy, calibration, and uncertainty quality. We evaluate two representative architectures, the CNN-based DeepLabV3+ and the transformer-based SegFormer, in both in-domain and out-of-domain scenarios. Our results show that CutMix has only a minor impact on segmentation accuracy but consistently improves reliability, particularly under distribution shifts. These improvements indicate that CutMix primarily makes the model’s calibration and uncertainty estimates more trustworthy, rather than improving the raw segmentation prediction itself. This distinction is crucial for safety-critical deployment, where reliable confidence estimates are as important as raw performance.
2:45pm - 3:00pm
Uncertainty Quality of VGGT: An Analysis on the DTU Benchmark Dataset
Karlsruhe Institute of Technology, Germany
The Visual Geometry Grounded Transformer (VGGT) has attracted a great deal of attention in a short time, not least because it received the Best Paper Award at CVPR 2025. Like DUSt3R and MASt3R, VGGT aims to bring about a paradigm shift by replacing established methods such as bundle adjustment and feature matching. In their place it uses a simple, unified, feed-forward neural network that predicts camera poses, depth maps, and dense 3D structure directly from multiple images of a scene within a few seconds. A key aspect is its ability to process an arbitrary number of views consistently in a single forward pass, without any post-processing or iterative optimization. For photogrammetry, this opens new possibilities for real-time, scalable, and accessible 3D reconstruction. In this context, high reconstruction accuracy is not enough. High-quality uncertainty estimates are also crucial, because they foster trust and enable robust quality assurance. This paper therefore investigates the quality of VGGT’s uncertainty predictions. The analysis identifies an effective confidence threshold for filtering VGGT’s raw output. It also demonstrates that improving uncertainty quality has strong potential to improve the accuracy of VGGT's 3D reconstructions.
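Illustration for "Quantization-Aware Training for Efficient Object Detection on FPGAs". 8-bit QAT usually simulates integer rounding in the forward pass with a quantize-dequantize ("fake quantization") step. The first function is a minimal plain-Python sketch of symmetric per-tensor fake quantization. The abstract does not say how the object-scale-aware (OSA) penalty is computed. `osa_weight`, `osa_penalty`, the 32 × 32 pixel reference area, and the exponent `gamma` are therefore hypothetical. They only illustrate the idea of "amplifying quantization error penalties for smaller targets" and are not the authors' formulation.

```python
def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor quantize-dequantize, as simulated in the QAT forward pass."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for signed 8-bit integers
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round onto the integer grid, clamp to the signed range, then map back to floats.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]


def osa_weight(box_area, ref_area=32 * 32, gamma=0.5):
    """HYPOTHETICAL scale-aware weight: 1 for large boxes, > 1 for boxes below ref_area."""
    return max(1.0, (ref_area / box_area) ** gamma)


def osa_penalty(full_precision, quantized, box_areas):
    """HYPOTHETICAL penalty: per-object squared quantization error, scaled by osa_weight."""
    terms = [osa_weight(area) * (f - q) ** 2
             for f, q, area in zip(full_precision, quantized, box_areas)]
    return sum(terms) / len(terms)
```

In a real QAT setup, the non-differentiable rounding is usually bypassed in the backward pass with a straight-through estimator. The penalty would then be added to the detection loss. Neither step is shown in this forward-only sketch.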
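Illustration for "Evaluation of Visual Place Recognition Methods for Image Pair Retrieval in 3D Vision and Robotics". VPR methods typically describe each image with a global descriptor vector. Candidate image pairs can then be retrieved by nearest-neighbour search on those vectors. This sketch assumes exhaustive cosine-similarity search, a simple recall@k check against known overlapping pairs, and wall-clock timing. These are one common way to measure retrieval and runtime performance, not necessarily the pipeline presented in the talk. The descriptor values in the test are made up.

```python
import math
import time


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def retrieve_pairs(descriptors, k=5):
    """For each image index, return the k most similar other images by cosine similarity."""
    normed = [l2_normalize(d) for d in descriptors]
    pairs = {}
    for i, query in enumerate(normed):
        # The dot product of two unit vectors is their cosine similarity.
        scored = [(sum(a * b for a, b in zip(query, d)), j)
                  for j, d in enumerate(normed) if j != i]
        scored.sort(reverse=True)
        pairs[i] = [j for _, j in scored[:k]]
    return pairs


def recall_at_k(pairs, ground_truth):
    """Fraction of queries with at least one truly overlapping image among those retrieved."""
    hits = sum(1 for i, retrieved in pairs.items()
               if set(retrieved) & ground_truth.get(i, set()))
    return hits / len(pairs)


def timed_retrieval(descriptors, k):
    """Run retrieve_pairs and also return its wall-clock runtime in seconds."""
    start = time.perf_counter()
    pairs = retrieve_pairs(descriptors, k)
    return pairs, time.perf_counter() - start
```

Exhaustive search scales quadratically with the number of images. Large collections would normally use an approximate nearest-neighbour index instead, which trades some recall for runtime.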
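Illustration for "A Critical Synthesis of Uncertainty Quantification and Foundation Models for Semantic Segmentation". Three of the four UQ approaches named in the abstract produce several softmax predictions per pixel:

- Monte Carlo Dropout, via stochastic forward passes;
- Deep Sub-Ensemble, via ensemble heads;
- Test-Time Augmentation, via augmented inputs mapped back to the original geometry.

Those samples are commonly summarized by predictive entropy and mutual information. This minimal sketch computes both for one pixel. The abstract does not state which uncertainty measures the talk uses, so treat this as an assumption. Evidential Deep Learning instead predicts Dirichlet parameters in a single pass and is not covered here.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_based_uncertainty(samples):
    """Summarize T softmax vectors predicted for the same pixel.

    Returns the mean prediction, the predictive entropy (total uncertainty) and the
    mutual information (disagreement between samples, often read as epistemic).
    """
    t = len(samples)
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / t for c in range(n_classes)]
    predictive_entropy = entropy(mean)
    expected_entropy = sum(entropy(s) for s in samples) / t  # average per-sample entropy
    mutual_info = predictive_entropy - expected_entropy
    return mean, predictive_entropy, mutual_info
```

Samples that agree perfectly give a mutual information of zero even when each of them is uncertain. Confident but contradictory samples give a high mutual information.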
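Illustration for "The Impact of CutMix on Reliability and Robustness in Semantic Segmentation". CutMix replaces a random rectangle of one training sample with the same rectangle from another sample. For segmentation, the label maps are cut and pasted with the same box, so every pixel keeps a valid class label. This minimal sketch follows the box-sampling scheme of the original CutMix formulation: a mixing ratio is drawn from Beta(alpha, alpha), and the box area is about (1 - lambda) of the image. The talk's training configuration is not given in the abstract. Nested lists instead of tensors and `alpha=1.0` are choices made for illustration.

```python
import math
import random


def cutmix_segmentation(img_a, lbl_a, img_b, lbl_b, rng, alpha=1.0):
    """Paste a random box from (img_b, lbl_b) into copies of (img_a, lbl_a).

    Images and label maps are H x W nested lists of equal size. Returns the mixed
    image, the mixed label map and the box (y0, y1, x0, x1) taken from sample b.
    """
    h, w = len(img_a), len(img_a[0])
    lam = rng.betavariate(alpha, alpha)          # target share of pixels kept from sample a
    cut_h = int(h * math.sqrt(1.0 - lam))        # box area is about (1 - lam) * h * w
    cut_w = int(w * math.sqrt(1.0 - lam))
    cy, cx = rng.randrange(h), rng.randrange(w)  # random box centre; box is clipped at borders
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    img = [row[:] for row in img_a]
    lbl = [row[:] for row in lbl_a]
    for y in range(y0, y1):
        for x in range(x0, x1):
            # Image and labels use the same box, keeping pixels and classes aligned.
            img[y][x] = img_b[y][x]
            lbl[y][x] = lbl_b[y][x]
    return img, lbl, (y0, y1, x0, x1)
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible. This is convenient when comparing calibration with and without CutMix.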
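Illustration for "Uncertainty Quality of VGGT". Filtering a raw reconstruction by predicted confidence amounts to keeping the points whose confidence reaches a threshold. A sparsification curve is a common way to check whether the confidences are informative. It records how the mean error of the retained points changes as the least confident points are removed. This sketch makes several assumptions:

- per-point confidence values and per-point errors are available, the errors coming from comparison with reference scans such as those of DTU;
- the threshold value identified in the talk is not stated in the abstract, so it appears here only as a parameter;
- the sparsification curve is a generic evaluation tool, not necessarily the paper's protocol.

```python
def filter_by_confidence(points, confidences, threshold):
    """Keep only the points whose predicted confidence reaches the threshold."""
    return [p for p, c in zip(points, confidences) if c >= threshold]


def threshold_for_fraction(confidences, keep_fraction):
    """Confidence value that keeps roughly the most confident keep_fraction of points."""
    ranked = sorted(confidences, reverse=True)
    k = max(1, int(round(keep_fraction * len(ranked))))
    return ranked[k - 1]


def sparsification_curve(errors, confidences, fractions):
    """Mean error of the retained points when only the most confident fraction is kept.

    With informative confidences, the mean error should shrink as the fraction shrinks.
    """
    order = sorted(range(len(errors)), key=lambda i: confidences[i], reverse=True)
    curve = []
    for f in fractions:
        k = max(1, int(round(f * len(order))))
        curve.append(sum(errors[i] for i in order[:k]) / k)
    return curve
```

Comparing this curve with the "oracle" curve obtained by sorting on the true errors is one way to quantify how much accuracy better-calibrated confidences could recover.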

