Conference Agenda
WG II/3C: 3D Scene Reconstruction for Modeling & Mapping
Session Topics: 3D Scene Reconstruction for Modeling & Mapping (WG II/3)
External Resource: http://www.commission2.isprs.org/wg3
Presentations
1:30pm - 1:45pm
CityLangSplat: Integrating CityGML Semantics into 3D Language Gaussian Splatting for Urban Scene Understanding
¹Technical University of Munich; ²Munich Center for Machine Learning; ³Karlsruhe Institute of Technology; ⁴University of Cambridge
Combining visual semantics with language representations has made 3D interpretation more flexible and intuitive. Recent advances in Gaussian Splatting extend this to efficient 3D language fields supporting open-vocabulary queries. However, existing approaches show limited generalization in large urban scenes, especially for detailed building segmentation. Semantic 3D city models such as CityGML, by contrast, provide hierarchical and geometry-aligned structural semantics that complement appearance-driven visual cues. We introduce CityLangSplat, which integrates CityGML semantics into 3D Language Gaussian Splatting for urban environments. CityLangSplat rasterizes CityGML into pixel-aligned semantic maps, extracts vision-language features from SAM-derived segments and CityGML regions, and compresses both sources into a shared latent space via a lightweight autoencoder. 3D Gaussians are then optimized with a coverage-aware loss that balances accurate, building-focused CityGML supervision with broader SAM supervision, enabling geometry-aligned open-vocabulary reasoning in urban scenes. Experiments on the TUM2TWIN and ZAHA datasets show consistent gains over LangSplat, with relative improvements of 22.9% in 2D and 15.1% in 3D evaluation while preserving real-time rendering. CityLangSplat provides a practical framework for combining semantic city models with language-embedded 3D Gaussian Splatting for geometry-aligned urban scene interpretation. Code will be released at https://github.com/zqlin0521/CityLangSplat.
1:45pm - 2:00pm
RoofVIP benchmark dataset: 2D roof planar polygons and very high-resolution digital orthophoto pairs
German Aerospace Center (DLR), Germany
Accurate building roof modeling is fundamental to urban analytics, digital twins, and 3D city reconstruction; however, progress in deep learning–based reconstruction is constrained by the limited availability of diverse, high-resolution datasets with detailed geometric annotations. This study introduces the RoofVIP dataset, a large-scale benchmark featuring very high-resolution RGB orthophotos paired with 2D roof vectors that capture diverse urban morphologies across Munich, Germany. Following Level of Detail (LoD) 2.0 principles, RoofVIP encompasses a wide range of roof geometries and architectural complexities, enabling evaluation of both segmentation- and vectorization-based reconstruction methods. Two paradigms are examined: a two-step segmentation-based approach (Cascade Mask R-CNN, Mask R-CNN, SOLOv2, YOLACT) and a one-step direct vector prediction approach (HEAT, PolyRoof). ImageNet-pretrained region-based models, particularly Mask R-CNN and Cascade Mask R-CNN, achieve the highest segmentation accuracy, effectively delineating complex roof boundaries while revealing limitations on small or irregular structures. Geometry-based models show complementary strengths, with HEAT emphasizing topological regularity and PolyRoof focusing on geometric precision. Although performance is lower than on simpler datasets such as HEAT and Roof Intuitive, RoofVIP exposes challenges related to geometric diversity and scale variation, serving as a rigorous benchmark. The dataset includes predefined training, validation, and test splits, enabling consistent benchmarking across methods. By providing a challenging and diverse geometric landscape, RoofVIP aims to advance geometry-aware deep learning approaches and support scalable, high-fidelity 3D urban modeling. The dataset is publicly available through the project page at https://chaikalamrullah.github.io/RoofVIP/.
2:00pm - 2:15pm
Evaluating 3D Scene Representations for Aerial Photogrammetry across Diverse Cityscapes
¹School of Geodesy and Geomatics, Wuhan University, Wuhan, China; ²Technology and Engineering Center for Space Utilization, University of Chinese Academy of Sciences, Beijing, China; ³Hubei Luojia Laboratory, Wuhan, China
The proliferation of continuous Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) has shifted the paradigm of 3D aerial reconstruction from relying solely on geometric stereo matching to inverse rendering optimization. However, while these emerging rendering-based frameworks excel at synthesizing photo-realistic novel views, their capability to extract accurate surfaces in complex aerial scenarios remains unclear compared to traditional methods. To establish a clearer understanding, this study presents a comprehensive evaluation of five representative frameworks spanning traditional Structure from Motion (SfM), pure Signed Distance Field (SDF) representations, unstructured 3D Gaussians, hybrid voxel-Gaussians, and strictly explicit sparse voxels. By standardizing the computational environment, the inputs, and the mesh-extraction pipeline on both real-world airborne LiDAR datasets and synthetic cityscapes, we assess their performance regarding visual fidelity, geometric accuracy, and resource efficiency. The experimental results reveal that while traditional MVS produces the highest overall geometric precision by strictly enforcing multi-view rigid geometry, it is prone to failures in texture-less regions. Among rendering-based representations, a fundamental trade-off exists: highly flexible, unstructured 3DGS achieves the highest visual scores but degrades the underlying geometric surfaces; conversely, explicitly structured techniques are clearly superior at regularizing topological coherence and suppressing floating artifacts. Furthermore, we observe that integrating structured voxels avoids the severe memory bottlenecks associated with extracting geometry from unorganized primitives. These empirical findings emphasize that for large-scale aerial photogrammetry, integrating explicit spatial structure into differentiable rendering pipelines is imperative for achieving scalable operation and closing the geometric accuracy gap with traditional methods.
2:15pm - 2:30pm
Development of a 3D City Model-Based System for Pre-Flight Evaluation and Optimization of Aerial Image Acquisition Plans
Kokusai Kogyo Co., Ltd., Japan
In dense urban environments, aerial image acquisition often suffers from occlusions and redundant data due to the lack of quantitative evaluation tools at the flight-planning stage. To address this issue, this study develops a flight-planning support system that enables pre-acquisition visibility analysis for both terrain and building surfaces using existing 3D city models. The system performs ray-casting simulations based on user-defined flight parameters to quantify and visualize occluded and visible regions before flight, allowing planners to evaluate data quality and optimize image acquisition efficiency. Experiments were conducted using real flight plans with two representative aerial cameras: the Leica CityMapper-2 for multi-directional texture mapping and the Vexcel UltraCam Eagle 4.1 for nadir-based topographic mapping. The results show that the system effectively visualizes occlusions on roofs and walls, predicts building lean in nadir imagery, and assesses the influence of overlap ratios on ground visibility. These analyses enable users to design more cost-effective and geometrically consistent flight plans by identifying redundant overlaps and ensuring sufficient coverage for DSM and true-orthophoto generation. The proposed framework provides a quantitative and objective approach to improving the transparency and reliability of aerial survey planning, and it offers a foundation for integrating visibility simulation with subsequent photogrammetric workflows such as surface reconstruction and texture mapping.
2:30pm - 2:45pm
Image LiDAR based change detection and updating for urban 3D reconstruction
Univ Gustave Eiffel, Géodata Paris, IGN, LASTIG, F-77454 Marne-la-Vallée, France
There is a high demand for accurate and up-to-date territorial digital twins for human activities, but their production and updating costs remain prohibitive for many applications. Their generation relies on acquiring LiDAR and/or image data over the territory of interest. Each modality has its advantages: LiDAR is more accurate but more costly, while images are noisier but less costly and more easily accessible. Combining these two technologies to produce and update digital twins is thus a promising avenue. In this paper, we propose a pipeline based on 3D change detection to update a LiDAR point cloud using newer aerial imagery. First, triangle meshes are generated from LiDAR data and image-based dense matching. Then, 3D ray tracing is used to detect changes. After removing the changed parts, the point clouds are fused to update the scene. The proposed method is demonstrated on two datasets in France. The code will be released as open source on GitHub: https://github.com/whuwuteng/ChangeUpdateJN.
2:45pm - 3:00pm
SF-Recon: Simplification-Free Lightweight Building Reconstruction via 3D Gaussian Splatting
School of Geodesy and Geomatics, Wuhan University, P.R. China
Lightweight building surface models are crucial for digital cities, navigation, and fast geospatial analytics, yet conventional multi-view geometry pipelines remain cumbersome and quality-sensitive due to their reliance on dense reconstruction, meshing, and subsequent simplification. This work presents SF-Recon, a method that directly reconstructs lightweight building surfaces from multi-view images without post-hoc mesh simplification. We first train an initial 3D Gaussian Splatting (3DGS) field to obtain a view-consistent representation. Building structure is then distilled by a normal-gradient–guided Gaussian optimization that selects primitives aligned with roof and wall boundaries, followed by multi-view edge-consistency pruning to enhance structural sharpness and suppress non-structural artifacts without external supervision. Finally, a multi-view depth–constrained Delaunay triangulation converts the structured Gaussian field into a lightweight, structurally faithful building mesh. Experimental results on a proposed SF dataset demonstrate that SF-Recon can directly reconstruct lightweight building models from multi-view imagery, achieving substantially fewer faces and vertices while maintaining computational efficiency.
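The SF-Recon abstract names multi-view edge-consistency pruning but gives no details. As a rough illustration only, the sketch below shows one way such a test could be phrased: a 3D primitive center is kept when it projects onto edge pixels in a large enough share of the views that see it. This is not the authors' implementation. The names `edge_support` and `prune_by_edge_consistency`, the thresholds `min_ratio` and `min_views`, and the toy orthographic views are all hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[int, int]  # (row, col)
Projector = Callable[[Point3D], Optional[Pixel]]
# Hypothetical view: a projection function plus a binary edge map (rows of 0/1).
View = Tuple[Projector, Sequence[Sequence[int]]]


def edge_support(point: Point3D, views: Sequence[View]) -> Tuple[int, int]:
    """Return (hits, visible): views where the point lands on an edge pixel,
    and views where it projects inside the image at all."""
    hits = 0
    visible = 0
    for project, edge_map in views:
        pixel = project(point)
        if pixel is None:
            continue
        row, col = pixel
        if not (0 <= row < len(edge_map) and 0 <= col < len(edge_map[0])):
            continue  # projects outside this image
        visible += 1
        if edge_map[row][col]:
            hits += 1
    return hits, visible


def prune_by_edge_consistency(
    points: Sequence[Point3D],
    views: Sequence[View],
    min_ratio: float = 0.5,  # assumed threshold, not from the abstract
    min_views: int = 2,      # assumed threshold, not from the abstract
) -> List[Point3D]:
    """Keep points supported by edges in at least min_ratio of the views that see them."""
    kept = []
    for point in points:
        hits, visible = edge_support(point, views)
        if visible >= min_views and hits / visible >= min_ratio:
            kept.append(point)
    return kept


# Toy setup: two orthographic views sharing one 4x4 edge map (border = edge).
EDGE_MAP = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]


def top_view(p: Point3D) -> Pixel:
    # Looks down the z axis: row follows y, column follows x.
    return (math.floor(p[1]), math.floor(p[0]))


def front_view(p: Point3D) -> Pixel:
    # Looks along the y axis: row follows z, column follows x.
    return (math.floor(p[2]), math.floor(p[0]))


views = [(top_view, EDGE_MAP), (front_view, EDGE_MAP)]
points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 2.0), (9.0, 9.0, 9.0)]

if __name__ == "__main__":
    # Only the first point lies on an edge in both views.
    print(prune_by_edge_consistency(points, views, min_ratio=0.75))
```

In this toy run the third point lies on an edge in only one of its two views, so it survives at `min_ratio=0.5` but is pruned at `0.75`. A real pipeline would also need an occlusion test before counting a view as visible, which this sketch omits.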

