Conference Agenda

Session

WG II/3B: 3D Scene Reconstruction for Modeling & Mapping

Time: Monday, 06-July-2026, 3:30pm - 5:15pm

Location: 715B (125 theatre)

Session Topics:

3D Scene Reconstruction for Modeling & Mapping (WG II/3)

External Resource: http://www.commission2.isprs.org/wg3

Presentations

3:30pm - 3:45pm

3D Gaussian splatting for large-scale 3D reconstruction: an evaluation and quality analysis

Jiangxue Yu¹, Yueling Liao¹, San Jiang²,³, Xing Zhang²,³, Zhijun Wang⁴, Qingquan Li²,³

¹School of Computer Science, China University of Geosciences, Wuhan 430074, China; ²Guangdong Key Laboratory of Urban Informatics, Shenzhen University, Guangdong Shenzhen, 518060, China; ³MNR Key Laboratory for Geo-Environmental Monitoring of Great Bay Area, Shenzhen University, Guangdong Shenzhen, 518060, China; ⁴Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Guangdong Shenzhen, 518060, China

Large-scale 3D reconstruction has emerged as a key research topic in the fields of photogrammetry and computer vision. 3D Gaussian Splatting (3DGS) has become a mainstream approach due to its efficient rendering, but it confronts critical challenges in large-scale scenarios: excessive memory overhead and inadequate geometric accuracy. Meanwhile, the traditional Structure from Motion and Multi-view Stereo (SfM-MVS) framework, despite its cumbersome process, continues to exhibit robust performance. Notably, a systematic evaluation comparing these two paradigms in large-scale scenes remains absent. To address this, we develop a unified verification framework to evaluate the texture rendering quality and geometric reconstruction precision of several recent methods using real-world datasets. The results indicate that SfM-MVS methods still maintain an advantage in the completeness and accuracy of geometric reconstruction. In contrast, 3DGS methods have achieved breakthroughs in local accuracy or rendering-geometry synergy, yet their global consistency requires further improvement.
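The abstract does not say how accuracy and completeness are computed. As a minimal sketch, assuming the common point-cloud definitions (accuracy as the distance from reconstructed points to the reference, completeness as the distance from reference points to the reconstruction), the two measures could look like this. The function names and the threshold `tau` are hypothetical and not taken from the paper.

```python
import numpy as np

def nearest_distances(src, dst):
    """Distance from each point in src to its nearest neighbour in dst.

    Brute force, O(len(src) * len(dst)); fine for a small illustration only.
    """
    diff = src[:, None, :] - dst[None, :, :]          # (N, M, 3) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def accuracy_completeness(recon, ref, tau):
    """Assumed definitions, not the authors' protocol.

    accuracy:     reconstruction -> reference distances
    completeness: reference -> reconstruction distances
    tau:          hypothetical inlier threshold in scene units
    """
    acc = nearest_distances(recon, ref)
    comp = nearest_distances(ref, recon)
    return {
        "accuracy_mean": float(acc.mean()),
        "completeness_mean": float(comp.mean()),
        "precision": float((acc <= tau).mean()),   # share of accurate reconstructed points
        "recall": float((comp <= tau).mean()),     # share of reference points that are covered
    }

ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
recon = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.1]])
metrics = accuracy_completeness(recon, ref, tau=0.2)
```

In this toy case the reconstruction is accurate where it exists but misses the third reference point, so precision is high while recall drops. This is the kind of gap between accuracy and completeness that the evaluation above is concerned with.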

3:45pm - 4:00pm

RobustGauss: Robust 3D Gaussian splatting for distractor-free 3D scene reconstruction

Haibing Liu¹, Shihan Chen¹, Huchen Li¹, Wubiao Huang¹, Shuai Zhang¹, Fei Deng¹,²

¹School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; ²Hubei Luojia Laboratory, Wuhan 430079, China

3DGS-based methods often render transient distractors in 3D scenes as significant floating artifacts. Existing methods for removing transient distractors suffer from either under-identification, which leaves residual distractors that degrade reconstruction quality, or over-identification, which discards scene information and prevents the reconstruction of fine details. To address these challenges, we propose RobustGauss. We first rely solely on the cosine similarity of DINOv2 features to robustly predict uncertainty masks and accurately identify the main regions of transient distractors and their corresponding shadows. Due to the limited resolution of DINOv2 features, we use high-resolution image residuals to refine the edges of the initial uncertainty masks, thereby accurately identifying all transient distractors and minimizing their impact on 3D scene reconstruction. Experiments on two challenging datasets demonstrate that our method achieves state-of-the-art performance.
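To illustrate the first step, here is a minimal sketch of a mask derived from per-patch cosine similarity. It assumes that features of a rendered view are compared with features of the captured image, which the abstract does not state explicitly. The feature arrays stand in for DINOv2 outputs (no model is called), and the function names and the threshold are hypothetical.

```python
import numpy as np

def cosine_similarity_map(feat_a, feat_b, eps=1e-8):
    """Per-patch cosine similarity of two (H, W, C) feature maps."""
    dot = (feat_a * feat_b).sum(axis=-1)
    norm = np.linalg.norm(feat_a, axis=-1) * np.linalg.norm(feat_b, axis=-1)
    return dot / (norm + eps)   # eps guards against all-zero feature vectors

def uncertainty_mask(feat_rendered, feat_observed, threshold=0.5):
    """Mark patches whose features disagree as likely transient content.

    Assumption: low similarity between rendered and observed features
    indicates a distractor; the threshold value is illustrative only.
    """
    sim = cosine_similarity_map(feat_rendered, feat_observed)
    return sim < threshold

# Toy 2x2 patch grid with 3-dimensional features.
feat_rendered = np.zeros((2, 2, 3))
feat_rendered[..., 0] = 1.0                  # every patch points along axis 0
feat_observed = feat_rendered.copy()
feat_observed[0, 1] = [0.0, 1.0, 0.0]        # one patch differs (orthogonal feature)
mask = uncertainty_mask(feat_rendered, feat_observed)
```

The mask has the coarse resolution of the patch grid, which is why the abstract describes a second step that refines mask edges with high-resolution image residuals. That refinement is not sketched here.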

4:00pm - 4:15pm

BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

Yuci Han¹, Charles Toth¹, John Anderson², William Shuart², Alper Yilmaz¹

¹The Ohio State University, United States of America; ²USACE ERDC GRL

N/A

4:15pm - 4:30pm

EMVSNet: Evidential multi-view stereo reconstruction for sampling-free depth and uncertainty estimation

Christian Grannemann, Max Mehltretter

Leibniz University Hannover, Germany

We present EMVSNet, a sampling-free Multi-View Stereo (MVS) method that, to the best of our knowledge, is the first to integrate Evidential Deep Learning into MVS. Given a set of overlapping images, our method predicts a depth value together with its associated uncertainty per pixel of a reference image, incorporating uncertainty from aleatoric and epistemic sources. Specifically, we use an existing convolutional neural network architecture designed for MVS as backbone and extend it to regress evidential parameters per pixel, describing the probability distribution over the depth corresponding to this pixel. In contrast to existing MVS methods that often neglect epistemic uncertainty or obtain it via sampling at inference, our evidential formulation does not require sampling, but enables single-pass inference. We evaluate the uncertainty estimation capabilities of our method using two publicly available datasets and compare the depth predictions against a deterministic variant. The experimental results demonstrate that EMVSNet achieves competitive depth accuracy while, at the same time, providing uncertainty estimates that enable us to reliably rank depth estimates according to their risk of being incorrect and to automatically identify out-of-distribution data. Our model shows only slightly increased inference time compared to a deterministic baseline while giving uncertainty estimates comparable to those of a computationally expensive sampling-based approach, marking a first step towards real-time capable uncertainty estimation for image-based 3D reconstruction.
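The abstract does not name the evidential distribution. As a sketch under the assumption that the network regresses Normal-Inverse-Gamma parameters per pixel, as in standard deep evidential regression, the depth estimate and both uncertainty terms follow in closed form from a single forward pass. The parameter names (`gamma`, `nu`, `alpha`, `beta`) and the function are illustrative and are not taken from the paper.

```python
import numpy as np

def nig_uncertainties(gamma, nu, alpha, beta):
    """Closed-form moments of a Normal-Inverse-Gamma evidential output.

    Assumed parameterisation (standard deep evidential regression):
      predicted depth        E[mu]      = gamma
      aleatoric uncertainty  E[sigma^2] = beta / (alpha - 1)
      epistemic uncertainty  Var[mu]    = beta / (nu * (alpha - 1))
    Requires nu > 0, alpha > 1, beta > 0.
    """
    gamma, nu, alpha, beta = (
        np.asarray(x, dtype=float) for x in (gamma, nu, alpha, beta)
    )
    if np.any(nu <= 0) or np.any(alpha <= 1) or np.any(beta <= 0):
        raise ValueError("need nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return gamma, aleatoric, epistemic

# Two pixels: same depth and noise level, but the second has less evidence (smaller nu).
depth, aleatoric, epistemic = nig_uncertainties(
    gamma=[2.0, 2.0], nu=[2.0, 0.5], alpha=[3.0, 3.0], beta=[4.0, 4.0]
)
# Rank pixels from most to least trustworthy by total predictive variance.
ranking = np.argsort(aleatoric + epistemic)
```

Because the uncertainties are computed directly from the regressed parameters, no repeated sampling is needed at inference, which matches the single-pass property described above.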

4:30pm - 4:45pm

Adaptive Scaling with Geometric and Visual Continuity of completed 3D objects

Jelle Vermandere, Maarten Bassier, Maarten Vergauwen

KU Leuven, Belgium

Object completion networks typically produce static Signed Distance Fields (SDFs) that faithfully reconstruct geometry but cannot be rescaled or deformed without introducing structural distortions. This limitation restricts their use in applications requiring flexible object manipulation, such as indoor redesign, simulation, and digital content creation. We introduce a part-aware scaling framework that transforms these static completed SDFs into editable, structurally coherent objects. Starting from SDFs and Texture Fields generated by state-of-the-art completion models, our method performs automatic part segmentation, defines user-controlled scaling zones, and applies smooth interpolation of SDFs, color, and part indices to enable proportional and artifact-free deformation. We further incorporate a repetition-based strategy to handle large-scale deformations while preserving repeating geometric patterns. Experiments on Matterport3D and ShapeNet objects show that our method overcomes the inherent rigidity of completed SDFs and is visually more appealing than global and naive selective scaling, particularly for complex shapes and repetitive structures.
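To make the idea of a scaling zone concrete, the following sketch stretches a single zone along one axis while leaving the rest of the object rigid, by mapping each query coordinate back to the original SDF. This is a simplified, hypothetical illustration (one axis, one zone, piecewise-linear mapping) and not the authors' part-aware method. The names `inverse_zone_map` and `scaled_sdf` are assumptions.

```python
import numpy as np

def inverse_zone_map(x_new, a, b, s):
    """Map coordinates of the scaled object back to the original object.

    The zone [a, b] is stretched by factor s; content before a is unchanged
    and content after b is shifted rigidly by the added length.
    """
    if s <= 0:
        raise ValueError("scale factor must be positive")
    x_new = np.asarray(x_new, dtype=float)
    b_new = a + s * (b - a)                      # zone end after scaling
    return np.where(
        x_new < a,
        x_new,                                   # rigid part before the zone
        np.where(
            x_new <= b_new,
            a + (x_new - a) / s,                 # stretched zone
            x_new - (b_new - b),                 # rigid part after the zone
        ),
    )

def scaled_sdf(sdf, x_new, a, b, s):
    """Sample the original SDF at remapped coordinates.

    Note: inside the stretched zone the values are no longer exact
    distances, only a field with the correct zero crossing.
    """
    return sdf(inverse_zone_map(x_new, a, b, s))

# 1D object occupying [0, 3]; stretch the zone [1, 2] to twice its length.
segment_sdf = lambda x: np.abs(x - 1.5) - 1.5
queries = np.array([0.5, 2.0, 4.0])
original_coords = inverse_zone_map(queries, a=1.0, b=2.0, s=2.0)
values = scaled_sdf(segment_sdf, queries, a=1.0, b=2.0, s=2.0)
```

The object boundary moves from 3 to 4 while the parts outside the zone keep their original proportions. A global scale would instead stretch every part equally.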

4:45pm - 5:00pm

MambaPanoptic: a Vision Mamba-based Structured State Space Framework for panoptic Segmentation

Qing Cheng¹,², Damiano Bertolini¹,³, Wei Zhang⁴, Dong Wang⁵, Niclas Zeller⁶, Daniel Cremers¹,²

¹Technical University of Munich, Germany; ²Munich Center for Machine Learning; ³Polytechnic University of Milan; ⁴University of Stuttgart; ⁵Wuhan University; ⁶Karlsruhe University of Applied Sciences

Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.
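For readers unfamiliar with the PQ metric reported above, here is a minimal sketch of the standard Panoptic Quality definition: matched segments with IoU above 0.5 count as true positives, and PQ is the sum of their IoUs divided by |TP| + 0.5|FP| + 0.5|FN|. The function name and the toy numbers are illustrative and are not results from the paper.

```python
def panoptic_quality(tp_ious, num_fp, num_fn):
    """Panoptic Quality for one class.

    tp_ious: IoU values of matched prediction/ground-truth segment pairs
             (a pair counts as a match only if its IoU exceeds 0.5)
    num_fp:  unmatched predicted segments
    num_fn:  unmatched ground-truth segments
    """
    if any(iou <= 0.5 for iou in tp_ious):
        raise ValueError("true-positive matches must have IoU > 0.5")
    denom = len(tp_ious) + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0          # class absent from both prediction and ground truth
    return sum(tp_ious) / denom

# Toy example: two matched segments, one spurious and one missed segment.
pq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
```

PQ penalizes both poor segment overlap and unmatched segments in a single number, and the benchmark score is the average of this value over all thing and stuff classes.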

5:00pm - 5:15pm

GeoPrior-Diff: Using Stable Diffusion as a geometric Prior for single-view 3D Point Cloud Reconstruction

Youssef Korny¹, Sunghwan Yoo¹, Mohammad Moein Sheikholeslami¹, Daniel Panangian², Ksenia Bittner², Andreas Wichmann³, Gunho Sohn¹

¹Dept. of Earth and Space Science and Engineering, York University, Canada; ²Remote Sensing Technology Institute, German Aerospace Center (DLR), Germany; ³Institute for Applied Photogrammetry and Geoinformatics (IAPG), Jade University of Applied Sciences, Germany

Single-view 3D reconstruction from monocular aerial imagery presents a fundamental challenge in remote sensing due to the inherent scale ambiguity and the complex geometry of urban environments. Traditional regression-based methods often struggle to recover high-frequency structural details, leading to over-smoothed or noisy outputs. To address this, we introduce GeoPrior-Diff, a novel two-stage framework that leverages the generative capabilities of Latent Diffusion Models to reconstruct high-fidelity 3D point clouds.

Unlike direct generation approaches, our method explicitly bridges the domain gap between 2D texture and 3D structure by utilizing an intermediate geometric prior. In the first stage, we predict an oblique normal map from the input satellite imagery, capturing essential surface orientation and structural boundaries. In the second stage, this normal map serves as a strong conditioning signal for a probabilistic diffusion model, guiding the denoising process to synthesize accurate 3D point clouds. Preliminary results demonstrate that decoupling geometric estimation from point generation significantly enhances structural consistency and reduces artifacts compared to baseline methods. This work highlights the potential of using generative priors for robust 3D urban modeling from limited data.