Conference Agenda
WG II/3B: 3D Scene Reconstruction for Modeling & Mapping
Session Topics: 3D Scene Reconstruction for Modeling & Mapping (WG II/3)
External Resource: http://www.commission2.isprs.org/wg3
Presentations
3:30pm - 3:45pm
3D gaussian splatting for large-scale 3D reconstruction: an evaluation and quality analysis
1School of Computer Science, China University of Geosciences, Wuhan 430074, China; 2Guangdong Key Laboratory of Urban Informatics, Shenzhen University, Guangdong Shenzhen, 518060, China; 3MNR Key Laboratory for Geo-Environmental Monitoring of Great Bay Area, Shenzhen University, Guangdong Shenzhen, 518060, China; 4Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Guangdong Shenzhen, 518060, China
Large-scale 3D reconstruction has emerged as a key research topic in the fields of photogrammetry and computer vision. 3D Gaussian Splatting (3DGS) has become a mainstream approach due to its efficient rendering, but it confronts critical challenges in large-scale scenarios: excessive memory overhead and inadequate geometric accuracy. Meanwhile, the traditional Structure from Motion and Multi-view Stereo (SfM-MVS) framework, despite its cumbersome process, continues to exhibit robust performance. Notably, a systematic evaluation comparing these two paradigms in large-scale scenes remains absent. To address this, we develop a unified verification framework to evaluate the texture rendering quality and geometric reconstruction precision of several recent methods using real-world datasets. The results indicate that SfM-MVS methods still maintain an advantage in the completeness and accuracy of geometric reconstruction. In contrast, 3DGS methods have achieved breakthroughs in local accuracy or rendering-geometry synergy, yet their global consistency requires further improvement.

3:45pm - 4:00pm
RobustGauss: Robust 3D gaussian splatting for distractor-free 3D scene reconstruction
1School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; 2Hubei Luojia Laboratory, Wuhan 430079, China
3DGS-based methods often render transient distractors in 3D scenes as significant floating artifacts. Existing methods for removing transient distractors either under-identify them, leaving residual distractors that degrade reconstruction quality, or over-identify them, discarding scene information and preventing the reconstruction of fine details. To address these challenges, we propose RobustGauss. We first rely solely on the cosine similarity of DINOv2 features to robustly predict uncertainty masks and accurately identify the main regions of transient disturbances and their corresponding shadows. Due to the limited resolution of DINOv2 features, we use high-resolution image residuals to refine the edges of the initial uncertainty masks, thereby accurately identifying all transient distractors and minimizing their impact on 3D scene reconstruction. Experiments on two challenging datasets demonstrate that our method achieves state-of-the-art performance.

4:00pm - 4:15pm
BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model
1The Ohio State University, United States of America; 2USACE ERDC GRL
N/A

4:15pm - 4:30pm
EMVSNet: Evidential multi-view stereo reconstruction for sampling-free depth and uncertainty estimation
Leibniz University Hannover, Germany
We present EMVSNet, a sampling-free Multi-View Stereo (MVS) method that, to the best of our knowledge, is the first to integrate Evidential Deep Learning into MVS. Given a set of overlapping images, our method predicts a depth value together with its associated uncertainty per pixel of a reference image, incorporating uncertainty from aleatoric and epistemic sources. Specifically, we use an existing convolutional neural network architecture designed for MVS as backbone and extend it to regress evidential parameters per pixel, describing the probability distribution over the depth corresponding to this pixel. In contrast to existing MVS methods that often neglect epistemic uncertainty or obtain it via sampling at inference, our evidential formulation does not require sampling, but enables single-pass inference. We evaluate the uncertainty estimation capabilities of our method using two publicly available datasets and compare the depth predictions against a deterministic variant. The experimental results demonstrate that EMVSNet achieves competitive depth accuracy while, at the same time, providing uncertainty estimates that enable us to reliably rank depth estimates according to their risk of being incorrect and to automatically identify out-of-distribution data. Our model shows only slightly increased inference time compared to a deterministic baseline while giving comparable uncertainty estimates to a computationally expensive sampling-based approach, marking a first step towards real-time capable uncertainty estimation for image-based 3D reconstruction.

4:30pm - 4:45pm
Adaptive Scaling with Geometric and Visual Continuity of completed 3D objects
KU Leuven, Belgium
Object completion networks typically produce static Signed Distance Fields (SDFs) that faithfully reconstruct geometry but cannot be rescaled or deformed without introducing structural distortions. This limitation restricts their use in applications requiring flexible object manipulation, such as indoor redesign, simulation, and digital content creation. We introduce a part-aware scaling framework that transforms these static completed SDFs into editable, structurally coherent objects. Starting from SDFs and Texture Fields generated by state-of-the-art completion models, our method performs automatic part segmentation, defines user-controlled scaling zones, and applies smooth interpolation of SDFs, color, and part indices to enable proportional and artifact-free deformation. We further incorporate a repetition-based strategy to handle large-scale deformations while preserving repeating geometric patterns. Experiments on Matterport3D and ShapeNet objects show that our method overcomes the inherent rigidity of completed SDFs and is visually more appealing than global and naive selective scaling, particularly for complex shapes and repetitive structures.

4:45pm - 5:00pm
MambaPanoptic: a Vision Mamba-based Structured State Space Framework for panoptic Segmentation
1Technical University of Munich, Germany; 2Munich Center for Machine Learning; 3Polytechnic University of Milan; 4University of Stuttgart; 5Wuhan University; 6Karlsruhe University of Applied Sciences
Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.

5:00pm - 5:15pm
GeoPrior-Diff: Using Stable Diffusion as a geometric Prior for single-view 3D Point Cloud Reconstruction
1Dept. of Earth and Space Science and Engineering, York University, Canada; 2Remote Sensing Technology Institute, German Aerospace Center (DLR), Germany; 3Institute for Applied Photogrammetry and Geoinformatics (IAPG), Jade University of Applied Sciences, Germany
Single-view 3D reconstruction from monocular aerial imagery presents a fundamental challenge in remote sensing due to the inherent scale ambiguity and the complex geometry of urban environments. Traditional regression-based methods often struggle to recover high-frequency structural details, leading to over-smoothed or noisy outputs. To address this, we introduce GeoPrior-Diff, a novel two-stage framework that leverages the generative capabilities of Latent Diffusion Models to reconstruct high-fidelity 3D point clouds. Unlike direct generation approaches, our method explicitly bridges the domain gap between 2D texture and 3D structure by utilizing an intermediate geometric prior. In the first stage, we predict an oblique normal map from the input satellite imagery, capturing essential surface orientation and structural boundaries. In the second stage, this normal map serves as a strong conditioning signal for a probabilistic diffusion model, guiding the denoising process to synthesize accurate 3D point clouds. Preliminary results demonstrate that decoupling geometric estimation from point generation significantly enhances structural consistency and reduces artifacts compared to baseline methods. This work highlights the potential of using generative priors for robust 3D urban modeling from limited data.

