Conference Agenda

Overview and details of the sessions of this conference. Please select a date or location to show only sessions on that day or at that location. Please select a single session for a detailed view (with abstracts and downloads if available).

Daily Overview

Location: 715B
125 theatre

Date: Monday, 06-July-2026

8:30am - 10:00am

WG II/3A: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

8:30am - 8:45am

GT-LOD3: LOD3 Semantic 3D Building Reconstruction Benchmark Dataset

Han Sae Kim¹, Olaf Wysocki², Ludwig Hoegner³, Ksenia Bittner⁴, Joshua Carpenter⁵, Friedrich Fraundorfer^4,6, Arpan Kusari⁷, Max Mehltretter⁸, Franz Rottensteiner⁸, Anna Schadl⁹, Martin Weinmann¹⁰, Jinha Jung¹

¹Lyles School of Civil and Construction Engineering, Purdue University, West Lafayette, IN, USA; ²CV4DT, Department of Engineering, University of Cambridge, Cambridge, United Kingdom; ³Faculty of Civil Engineering, Hochschule München University of Applied Sciences, Munich, Germany; ⁴Remote Sensing Technology Institute, German Aerospace Center (DLR), Oberpfaffenhofen, Germany; ⁵Department of Civil Engineering, The University of Akron, Akron, OH, USA; ⁶Institute of Visual Computing, Graz University of Technology, Graz, Austria; ⁷University of Michigan Transportation Research Institute, University of Michigan, Ann Arbor, MI, USA; ⁸Institute of Photogrammetry and GeoInformation, Leibniz University Hannover, Hannover, Germany; ⁹Faculty of Geoinformatics, Hochschule München University of Applied Sciences, Munich, Germany; ¹⁰Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

This contribution introduces GT-LOD3, a new benchmark dataset designed to advance semantic Level of Detail 3 (LOD3) building reconstruction from UAS-based photogrammetric point clouds. Existing benchmarks primarily focus on mesh- or point-level semantic labelling, façade segmentation, or LOD2-level modelling, but high-quality, geometry-accurate LOD3 ground truth paired with real-world photogrammetric observations is still limited. GT-LOD3 fills this gap by offering paired UAS point clouds and manually modeled LOD3 reference data in CityGML format, enabling research on window-level facade reasoning, geometric regularization, and instance-level shape recovery.

The benchmark currently consists of two subsets featuring different architectural styles and environmental conditions: (1) an urban block in Gold Coast (Lakewood, Ohio, USA), and (2) the Technical University of Munich (TUM) campus. The accompanying LOD3 reference models contain explicit window geometry, enabling detailed evaluation of both detection performance and polygon-level geometric accuracy.

We further provide a baseline reconstruction pipeline that combines point-cloud semantic segmentation, facade-aligned 2D projection, window region extraction, and geometric back-projection into CityGML. We also present an evaluation protocol that includes pixel-level metrics (IoU, precision, recall, F1) and instance-level detection metrics based on optimal assignment via the Hungarian algorithm.
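The two kinds of metrics named in the protocol above can be illustrated with a minimal sketch. It is not the authors' evaluation code. The mask layout, the box representation of windows, the function names, and the `min_iou` threshold are all assumptions made for illustration. The one-to-one matching is found here by exhaustive search over permutations, which returns the same optimum as the Hungarian algorithm but is only practical for a handful of instances.

```python
from itertools import permutations


def pixel_metrics(pred, truth):
    """Binary pixel-level metrics for two equally sized 0/1 masks (lists of rows)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(w, 0.0) * max(h, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def match_instances(pred_boxes, truth_boxes, min_iou=0.5):
    """One-to-one matching that maximizes total IoU.

    Exhaustive search stands in for the Hungarian algorithm (same optimum,
    exponential cost). min_iou is a hypothetical acceptance threshold.
    Returns a list of (pred_index, truth_index) pairs.
    """
    swap = len(pred_boxes) > len(truth_boxes)
    small, large = (truth_boxes, pred_boxes) if swap else (pred_boxes, truth_boxes)
    best_total, best_pairs = -1.0, []
    for perm in permutations(range(len(large)), len(small)):
        pairs = [(i, j, box_iou(small[i], large[j])) for i, j in enumerate(perm)]
        total = sum(iou for _, _, iou in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    accepted = [(i, j) for i, j, iou in best_pairs if iou >= min_iou]
    return [(j, i) for i, j in accepted] if swap else accepted
```

Matched pairs count as true positives, unmatched predictions as false positives, and unmatched reference windows as false negatives. For realistic instance counts a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation loop.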

8:45am - 9:00am

LoD2-Former: Multi-Modal Transformer-Based 3D Building Wireframe Reconstruction

Youssef Abdelhedi¹, Daniel Panangian¹, Chaikal Amrullah¹, Houda Chaabouni-Chouayakh², Ksenia Bittner¹

¹Remote Sensing Technology Institute, German Aerospace Center (DLR), Wessling, Germany; ²Sm@rts Laboratory, Digital Research Center of Sfax, Tunisia

Building wireframe reconstruction from LiDAR faces challenges due to sparse and incomplete point cloud data. We present LoD2-Former, a multi-modal Transformer architecture that fuses aerial LiDAR and optical imagery for end-to-end 3D roof wireframe reconstruction. Unlike existing point-cloud-only methods, our dual-backbone approach with bidirectional cross-modal attention leverages complementary geometric and visual information. Experiments on two datasets show consistent improvements in edge detection metrics, with edge F1-scores increasing from 0.874 to 0.899 on Tallinn and 0.968 to 0.974 on Roof-Intuitive, while substantially boosting corner recall (0.630 to 0.729) in complete-data settings. We also contribute a curated multi-modal subset of Building3D with aligned LiDAR and aerial imagery to facilitate future research.

9:00am - 9:15am

Point2WSS: Reconstructing LoD2 Buildings from Aerial LiDAR Data using Multimodal Learning and Weighted Straight Skeleton

Pierre-Loïc Queffélec^1,2, Nicolas Trouvé¹, Stéphane Roussel¹, Teng Wu², Bruno Vallet²

¹DEMR, ONERA, Université Paris Saclay, F-91123 Palaiseau, France; ²Univ Gustave Eiffel, ENSG, IGN, LASTIG, F-77420 Champs-sur-Marne, France

In this paper, a method exploiting aerial LiDAR point clouds to build realistic building meshes suitable for electromagnetic simulation is proposed. One of the main challenges lies in reconstructing regularized building meshes with low polygonal density. Optimization-based methods, commonly used for building reconstruction from point clouds, are highly data-driven, making the quality of results dependent on the quality of input data. Aerial LiDAR scans can be incomplete or sparse, for instance due to occlusion. A novel LoD2 building reconstruction method based on deep learning is proposed, assuming that deep learning methods are more robust to incomplete or sparse data than optimization-based methods. A parametric building model is introduced, based on the Weighted Straight Skeleton algorithm, which generates realistic roofs from a building footprint and an associated set of slopes, and subsequently extrudes the roof to the specified building height. This parametric approach guarantees that a given set of parameters (height, footprint and slopes) produces a regularized building mesh with low polygonal density. A multimodal model, named Point2WSS, was trained to recover the variable number of continuous parameters of a building from its corresponding point cloud. This approach enables the generation of realistic building meshes suitable for electromagnetic simulation, provided that the predicted parameters accurately approximate real-world values.

9:15am - 9:30am

Wide-area Scene Reconstruction with polyhedral Buildings featuring recognized Regularities

Jochen Meidow

Fraunhofer IOSB, Germany

The modeling of buildings suffers from a dichotomy between generic and specific representations: the lack of domain knowledge in flexible models that can represent many shapes, and the restricted geometry of pre-specified parametric building primitives. To fill this gap, we propose using general boundary representations enriched with automatically recognized and enforced geometric constraints derived from human-made regularities. The proposed reasoning process relies on the statistics of the planar point groups extracted from airborne-captured point clouds. Hence, a chosen significance level is the only process parameter. To enforce the creation of sound solids, we apply manifold constraints for the generation of the boundary representations. The feasibility and usability of the approach are demonstrated by evaluating an airborne-captured laser scan containing approximately 7,600 buildings over an area of 50 km² featuring both inner-city and rural landscapes.

9:30am - 9:45am

The P3 Dataset: Pixels, Points and Polygons for Multimodal Building Vectorization

Raphael Sulzer^1,2, Liuyun Duan², Nicolas Girard², Florent Lafarge¹

¹Université Côte d’Azur, INRIA – Sophia-Antipolis, France; ²LuxCarta Technology, Mouans-Sartoux, France

We present P3, a large-scale multimodal dataset for building vectorization, including aerial LiDAR point clouds, aerial images, and vectorized 2D building outlines, collected across three continents. P3 contains over 10 billion LiDAR points with decimeter-level accuracy and RGB images at a ground sampling distance of 25 centimeters. While many existing datasets focus on the image modality, P3 offers a complementary perspective by incorporating dense 3D information. We demonstrate that LiDAR point clouds serve as a robust modality for predicting building polygons, both in hybrid and end-to-end learning frameworks. Moreover, fusing LiDAR and imagery further improves accuracy and geometric quality of predicted polygons. The P3 dataset is publicly available, along with code and pretrained weights of three state-of-the-art models for building polygon prediction at https://github.com/raphaelsulzer/PixelsPointsPolygons .

9:45am - 10:00am

Building height estimation from stereo satellite images using contour vector registration

Yaxuan Duan, Wei Qin, Xin Huang, Pei Mi, Yang Yu, Pengjie Tao

School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China

Accurate building height estimation plays a crucial role in large-scale 3D urban reconstruction. However, conventional stereo matching approaches often suffer from mismatches around building edges, leading to unreliable height retrieval in dense urban areas. To address this issue, this paper presents a novel method for building height estimation based on contour vector registration integrated with the vertical line locus technique. The proposed framework first automatically matches building contour vectors extracted from stereo high-resolution satellite images. Then, for each paired contour, a range of candidate heights is searched using a rational function model to project the reference contour from the image space to object space and then reproject it onto the conjugate image. The elevation that maximizes the overlap ratio between projected and paired contours is identified as the optimal roof elevation. Building height is subsequently derived by subtracting the ground elevation from the estimated roof elevation. Experiments conducted on SuperView-1 (SV-1) satellite stereo images over Jiuyuan District, Baotou, Inner Mongolia, China, demonstrate the effectiveness of the proposed method. The resulting building height estimates achieve a root mean square error of 0.84 m compared to manual measurements, showing strong agreement (r = 0.9993). The proposed contour-based stereo registration approach provides a robust and efficient solution for building height extraction from high-resolution satellite data, supporting precise urban 3D modeling and large-scale spatial analysis.
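The candidate-height search described above can be sketched as follows. This is a toy illustration under strong assumptions, not the authors' implementation: contours are reduced to axis-aligned boxes, and the rational function model projection chain is replaced by a hypothetical linear shift along x (`parallax_per_metre`). All function names and numbers are invented for the example.

```python
def overlap_ratio(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(w, 0.0) * max(h, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def project_to_conjugate(contour, elevation, parallax_per_metre):
    """Hypothetical stand-in for the RFM-based projection and reprojection.

    Assumption: the contour shifts along x in proportion to the elevation.
    """
    shift = parallax_per_metre * elevation
    return (contour[0] + shift, contour[1], contour[2] + shift, contour[3])


def estimate_roof_elevation(ref_contour, paired_contour, candidates, parallax_per_metre):
    """Return the candidate elevation whose projection best overlaps the paired contour."""
    return max(
        candidates,
        key=lambda z: overlap_ratio(
            project_to_conjugate(ref_contour, z, parallax_per_metre), paired_contour
        ),
    )


def building_height(roof_elevation, ground_elevation):
    """Building height is the roof elevation minus the ground elevation."""
    return roof_elevation - ground_elevation


# Made-up example: search elevations from 1030 m to 1090 m in 0.5 m steps.
candidates = [1030.0 + 0.5 * i for i in range(121)]
roof = estimate_roof_elevation(
    (10.0, 10.0, 30.0, 25.0), (540.0, 10.0, 560.0, 25.0), candidates, 0.5
)
height = building_height(roof, 1030.0)
```

In a real setting the projection step would use the sensor's rational polynomial coefficients and the overlap would be computed on the actual contour polygons.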

1:30pm - 3:00pm

WG IV/1B: Spatial Data Representation and Interoperability
Location: 715B

1:30pm - 1:45pm

Zonology: An Ontology-Based Framework for Harmonizing Zoning Semantics Across Multi-Jurisdictional Greater Toronto Area (GTA) Planning Systems

Arian Mesbahian¹, Amin Sarang², Arash Shahi³, Mojgan Jadidi¹

¹Department of Civil Engineering, Lassonde School of Engineering, York University, Canada; ²DevNext Inc., Canada; ³AECO Innovation Lab Inc., Canada

Urban development in the Greater Toronto Area faces significant challenges because zoning abbreviations and terminology vary widely between municipalities. Labels such as “R2” in Toronto and “R2 S” in Markham appear similar yet represent different permissions and development standards, creating confusion and slowing planning workflows in a region with growing housing pressures. The problem addressed in this research is the absence of a unified, machine-readable framework that standardizes zoning terminology across municipalities, which limits automated compliance checking, GIS integration, and cross-municipal comparison. The objective of this work is to create Zonology, an ontology-based framework that harmonizes zoning abbreviations, permitted land uses, and development standards, beginning with the City of Toronto and the City of Markham. The methodology follows the Linked Open Terms approach, using the Web Ontology Language to encode zoning by-laws, land use categories, development standards, and spatial relationships. The model is evaluated through reasoning tasks, competency questions, and semantic alignment tests to ensure clarity, consistency, and interoperability. The results show that Zonology successfully aligns more than sixty zoning categories and over one hundred fifty land use permissions, enabling consistent semantic interpretation and cross-municipal queries. Overall, the ontology improves regulatory clarity, strengthens data-driven planning, and provides a scalable foundation for harmonized zoning governance across the Greater Toronto Area.

1:45pm - 2:00pm

GeoGraphJSON: A lightweight semantic data model integrating spatial geometry and graph connectivity for AI-driven spatial reasoning

Muhamad Alrajhi¹, Christian Heipke², Mohammed Afroz Khan¹

¹RASIKH Institute for Education and Training, Riyadh; ²Leibniz Universität Hannover

Urban systems are increasingly complex, interconnected, and dynamic, yet most geospatial data models continue to represent them as static geometric layers with limited support for explicit relationships and semantics. This restricts advanced spatial reasoning, network analysis, and AI-driven applications.

This paper introduces GeoGraphJSON, a lightweight semantic data model that extends GeoJSON by integrating spatial geometry with graph-based connectivity. The framework represents spatial entities as nodes and explicitly encodes relationships as typed edges, enabling unified representation of geometry, topology, and semantics within a single interoperable structure. A hierarchical Unique Identifier (UID) system ensures consistent lineage and cross-layer integration across administrative, transportation, and urban asset datasets.

The approach is validated using a large-scale urban dataset from Riyadh, comprising over 10,000 nodes and 13,000 edges. Graph-based analysis demonstrates realistic spatial patterns, including right-skewed degree distribution, strong network connectivity, and identifiable community structures. These results highlight the ability of GeoGraphJSON to capture hierarchical organization and functional relationships while supporting efficient analytical workflows.

By bridging geometry-centric GIS models and graph-based approaches, GeoGraphJSON provides a scalable foundation for urban analytics, digital twins, and GeoAI applications, enabling geospatial systems to evolve from static representations toward intelligent, relationship-aware spatial models.
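To make the idea of nodes, typed edges, and hierarchical UIDs concrete, here is a minimal sketch of what such a document might look like. The abstract does not give the actual schema, so everything below is hypothetical: the top-level `edges` member, the `source`/`target`/`type` keys, the dot-separated UID format, and the sample identifiers are assumptions chosen only to illustrate the concept on top of a standard GeoJSON FeatureCollection.

```python
import json
from collections import Counter

# Hypothetical document: a GeoJSON FeatureCollection plus an "edges" member.
doc = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "RUH.D01",
         "geometry": {"type": "Point", "coordinates": [46.70, 24.70]},
         "properties": {"kind": "district"}},
        {"type": "Feature", "id": "RUH.D01.R001",
         "geometry": {"type": "LineString", "coordinates": [[46.70, 24.70], [46.71, 24.70]]},
         "properties": {"kind": "road"}},
        {"type": "Feature", "id": "RUH.D01.R002",
         "geometry": {"type": "LineString", "coordinates": [[46.71, 24.70], [46.71, 24.71]]},
         "properties": {"kind": "road"}},
        {"type": "Feature", "id": "RUH.D01.A001",
         "geometry": {"type": "Point", "coordinates": [46.705, 24.70]},
         "properties": {"kind": "asset"}},
    ],
    "edges": [
        {"source": "RUH.D01", "target": "RUH.D01.R001", "type": "contains"},
        {"source": "RUH.D01", "target": "RUH.D01.R002", "type": "contains"},
        {"source": "RUH.D01", "target": "RUH.D01.A001", "type": "contains"},
        {"source": "RUH.D01.R001", "target": "RUH.D01.R002", "type": "connects_to"},
        {"source": "RUH.D01.A001", "target": "RUH.D01.R001", "type": "located_on"},
    ],
}


def node_degrees(document):
    """Count how many edges touch each node (undirected degree)."""
    degrees = Counter({feature["id"]: 0 for feature in document["features"]})
    for edge in document["edges"]:
        degrees[edge["source"]] += 1
        degrees[edge["target"]] += 1
    return dict(degrees)


def parent_uid(uid):
    """Derive the parent from a dot-separated hierarchical UID, or None at the root."""
    head, sep, _ = uid.rpartition(".")
    return head if sep else None


# The structure stays plain JSON, so it round-trips through any JSON tool.
serialized = json.dumps(doc)
```

A degree table like the one returned by `node_degrees` is the starting point for the degree-distribution analysis mentioned in the abstract.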

2:00pm - 2:15pm

Urban Morphological Clustering of Cairo and Makkah: A Comparative Analysis Using Spatial Networks

Ahmad M. Senousi^1,2, Wael Ahmed^1,2, Adel Elshazly¹, Moustafa Baraka³, Walid Darwish^1,2

¹Geomatics Engineering Lab, Public Works Department, Cairo University, Giza 12613, Egypt; ²NAMAA for Engineering Consultations, Dokki, Giza 12612, Egypt; ³Civil Engineering Program, German University in Cairo 11835, Egypt

Urban morphology quantitatively reveals how distinct historical and functional drivers shape city form. This study employs a computational morphometric approach using the Momepy library to analyze and compare the urban structures of Cairo, Egypt, and Makkah, Saudi Arabia. These cities represent paradigmatic cases: Cairo exemplifies long-term, organic layering, while Makkah demonstrates rapid, purpose-driven transformation for religious pilgrimage. We calculated key metrics (including tessellation area, convexity, elongation, equivalent rectangular index, and edge betweenness centrality) for building footprints and street networks sourced from OpenStreetMap. Results show Cairo possesses a heterogeneous, polycentric fabric with complex plot shapes and a distributed street network, reflecting its layered history. Conversely, Makkah exhibits a more monocentric, consolidated form with standardized building geometries and a hierarchical street network channeling movement toward its core. The findings demonstrate that quantitative morphology effectively captures how Cairo's organic evolution and Makkah's centralized planning produce fundamentally different, yet equally revealing, urban spatial structures, offering a replicable framework for cross-city analysis in the region.
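As an illustration of one of the shape metrics listed above, the sketch below computes convexity, taken here as the ratio of a footprint's area to the area of its convex hull. It is a pure-Python re-implementation for explanation only and does not call Momepy; the function names and the sample footprints are assumptions.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its ordered (x, y) vertices (shoelace formula)."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convexity(vertices):
    """Footprint area divided by convex hull area; 1.0 for convex shapes."""
    hull_area = polygon_area(convex_hull(vertices))
    return polygon_area(vertices) / hull_area if hull_area else 0.0


# Made-up footprints: a unit square and an L-shaped building.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
```

A compact, rectangular footprint scores 1.0, while the L-shape scores lower because its hull covers the missing corner. Values well below 1 are typical of irregular, organically grown plots.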

2:15pm - 2:30pm

An Assessment of Spatiotemporal Dynamics of Urban Illumination and Socioeconomic Patterns in Delhi Using VIIRS Nighttime Light Data

Manisha Kumari¹, Aditya Kumar Thakur²

¹Tilka Manjhi Bhagalpur University, India; ²Indian Institute of Technology Roorkee, India

Urban illumination, as captured through Nighttime Light (NTL) data, serves as a powerful indicator of human activity, infrastructure development, and socioeconomic progress in rapidly growing cities. However, previous studies on Delhi have largely focused on temporal NTL trends without integrating multi-year statistical and spatial analyses to reveal underlying urban and socioeconomic dynamics. This study investigates the spatiotemporal dynamics of urban illumination and development over Delhi using VIIRS Day/Night Band (DNB) NTL data for the years 2015, 2020, and 2025. NTL intensity was used as a proxy for urbanization and socioeconomic activity. Monthly composite datasets for January of each year were processed, clipped to the Delhi administrative boundary, and analyzed using statistical, temporal, and correlation-based methods. The results revealed a slight decline in mean NTL intensity from 26.34 in 2015 to 24.95 in 2025, indicating a stabilization in overall light emissions that may be due to the adoption of energy-efficient technologies. However, the maximum and range values increased markedly (166.85 to 228.04), signifying intensified illumination in high-activity commercial and infrastructural zones. Temporal change analysis showed balanced positive and negative illumination shifts, with over 50% of pixels exhibiting moderate growth during 2020–2025. Strong Pearson and Spearman correlations (r = 0.83–0.92; ρ = 0.91–0.95) confirmed the temporal consistency of illumination distribution. The socioeconomic assessment highlighted spatial disparities in light intensity that may correspond to varying economic activity levels. Overall, the study demonstrates that VIIRS-derived NTL data provide an effective and robust approach for monitoring urban growth, socioeconomic variability, and sustainable lighting transitions in metropolitan environments.
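The two correlation measures reported above can be computed between per-pixel radiance values of two years as in the following sketch. The sample values are made up for illustration and are not taken from the study; the function names are likewise assumptions. Spearman's ρ is obtained as the Pearson correlation of average ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up radiance values for the same five pixels in two different years.
radiance_a = [2.0, 10.0, 25.0, 40.0, 90.0]
radiance_b = [3.0, 9.0, 30.0, 38.0, 160.0]
r = pearson(radiance_a, radiance_b)
rho = spearman(radiance_a, radiance_b)
```

In this toy case the pixel ordering is identical in both years, so ρ is 1 while r is slightly lower because the brightest pixel grew disproportionately. That mirrors the pattern described in the abstract of a stable spatial distribution with intensifying hot spots.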

2:30pm - 2:45pm

Artificial Intelligence for territorial interpretation: from image clustering to perceptual mapping

Fabio Bianconi, Marco Filippucci, Chiara Mommi

University of Perugia, Italy

The research investigates artificial intelligence as a device for the automatic interpretation of landscape, reframing representation not as a neutral reproduction but as a cognitive operation in which perception, description, and evaluation converge. Moving from the assumption that landscape is not an objective given but a culturally and perceptually constructed form, the study proposes a fully data-driven methodology based on geolocated images. Through a systematic grid sampling, street-level panoramic views are collected and processed within a multimodal pipeline integrating visual analysis, language models, and multi-agent evaluation. Images are first translated into textual descriptions and semantically clustered, allowing territorial classes to emerge from the data rather than from predefined taxonomies. In parallel, a simulated cognitive framework, structured through four agent profiles, produces evaluative scores and textual judgments, later analysed through sentiment detection. The integration of these layers generates a georeferenced dataset from which a perceptual cartography of the territory is constructed. Applied to the urban context of San Giustino (Italy), the method reveals a continuous gradient from dense urban cores to rural landscapes, while exposing differentiated perceptual readings across observer profiles. Within this framework, artificial intelligence does not replace human interpretation; it operates as an epistemic extension, transforming the landscape into a distributed field of comparable perceptions, where representation becomes a computable form of shared knowledge.

2:45pm - 3:00pm

Towards the Development of a Metadata-driven Usability Awareness Prototype for Interoperable GIS Operation Design

Jung-Hong Hong, How-Han Chang, Ting-Yi Lee, Ting-Yu Chang

Dept. of Geomatics, National Cheng Kung University, Chinese Taipei

This study focuses on bridging usability information between data providers and data users through standardized metadata. By further integrating standardized metadata with geographic information system operation design, the operations gain automated, usability-aware capabilities, allowing usability information based on data specifications and quality considerations to be incorporated into relevant processes, thereby avoiding erroneous decisions. The research references international standards such as ISO 19115 and ISO 19157 to meet the requirements of open geographic information technologies.

3:30pm - 5:15pm

WG II/3B: 3D Scene Reconstruction for Modeling & Mapping
Location: 715B

3:30pm - 3:45pm

3D Gaussian splatting for large-scale 3D reconstruction: an evaluation and quality analysis

Jiangxue Yu¹, Yueling Liao¹, San Jiang^2,3, Xing Zhang^2,3, Zhijun Wang⁴, Qingquan Li^2,3

¹School of Computer Science, China University of Geosciences, Wuhan 430074, China; ²Guangdong Key Laboratory of Urban Informatics, Shenzhen University, Guangdong Shenzhen, 518060, China; ³MNR Key Laboratory for Geo-Environmental Monitoring of Great Bay Area, Shenzhen University, Guangdong Shenzhen, 518060, China; ⁴Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Guangdong Shenzhen, 518060, China

Large-scale 3D reconstruction has emerged as a key research topic in the fields of photogrammetry and computer vision. 3D Gaussian Splatting (3DGS) has become a mainstream approach due to its efficient rendering, but it confronts critical challenges in large-scale scenarios: excessive memory overhead and inadequate geometric accuracy. Meanwhile, the traditional Structure from Motion and Multi-view Stereo (SfM-MVS) framework, despite its cumbersome process, continues to exhibit robust performance. Notably, a systematic evaluation comparing these two paradigms in large-scale scenes remains absent. To address this, we develop a unified verification framework to evaluate the texture rendering quality and geometric reconstruction precision of several recent methods using real-world datasets. The results indicate that SfM-MVS methods still maintain an advantage in the completeness and accuracy of geometric reconstruction. In contrast, 3DGS methods have achieved breakthroughs in local accuracy or rendering-geometry synergy, yet their global consistency requires further improvement.

3:45pm - 4:00pm

RobustGauss: Robust 3D Gaussian splatting for distractor-free 3D scene reconstruction

Haibing Liu¹, Shihan Chen¹, Huchen Li¹, Wubiao Huang¹, Shuai Zhang¹, Fei Deng^1,2

¹School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; ²Hubei Luojia Laboratory, Wuhan 430079, China

3DGS-based methods often render transient distractors in 3D scenes as significant floating artifacts. Existing works for removing transient distractors suffer from under-identification or over-identification, resulting in residual transient distractors affecting reconstruction quality or loss of scene information, preventing the reconstruction of fine details. To address these challenges, we propose RobustGauss. We first rely solely on the cosine similarity of DINOv2 features to robustly predict uncertainty masks and accurately identify the main regions of transient disturbances and their corresponding shadows. Due to the limited resolution of DINOv2 features, we use high-resolution image residuals to refine the edges of the initial uncertainty masks, thereby accurately identifying all transient distractors and minimizing their impact on 3D scene reconstruction. Experiments on two challenging datasets demonstrate that our method achieves state-of-the-art performance.
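The role of cosine similarity in building an uncertainty mask can be sketched as follows. The abstract does not say which features are compared or how the mask is derived, so this example assumes that per-patch features of a rendered view are compared with those of the captured image and that patches with low similarity are flagged. The tiny 3-dimensional vectors stand in for real DINOv2 features, and the threshold value is hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def uncertainty_mask(rendered_feats, captured_feats, threshold=0.5):
    """Flag patches whose rendered and captured features disagree.

    Both inputs are grids (lists of rows) of feature vectors. The threshold
    is an assumed value, not one reported in the abstract.
    """
    return [
        [cosine_similarity(r, c) < threshold for r, c in zip(rendered_row, captured_row)]
        for rendered_row, captured_row in zip(rendered_feats, captured_feats)
    ]


# Toy 1x2 patch grid: the first patch agrees, the second does not.
rendered = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
captured = [[(0.9, 0.1, 0.0), (1.0, 0.0, 0.0)]]
mask = uncertainty_mask(rendered, captured)
```

Because patch features are much coarser than the image, a mask like this only localizes distractors roughly, which is consistent with the abstract's additional refinement step based on high-resolution image residuals.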

4:00pm - 4:15pm

BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

Yuci Han¹, Charles Toth¹, John Anderson², William Shuart², Alper Yilmaz¹

¹The Ohio State University, United States of America; ²USACE ERDC GRL

N/A

4:15pm - 4:30pm

EMVSNet: Evidential multi-view stereo reconstruction for sampling-free depth and uncertainty estimation

Christian Grannemann, Max Mehltretter

Leibniz University Hannover, Germany

We present EMVSNet, a sampling-free Multi-View Stereo (MVS) method that, to the best of our knowledge, is the first to integrate Evidential Deep Learning into MVS. Given a set of overlapping images, our method predicts a depth value together with its associated uncertainty per pixel of a reference image, incorporating uncertainty from aleatoric and epistemic sources. Specifically, we use an existing convolutional neural network architecture designed for MVS as backbone and extend it to regress evidential parameters per pixel, describing the probability distribution over the depth corresponding to this pixel. In contrast to existing MVS methods that often neglect epistemic uncertainty or obtain it via sampling at inference, our evidential formulation does not require sampling, but enables single-pass inference. We evaluate the uncertainty estimation capabilities of our method using two publicly available datasets and compare the depth predictions against a deterministic variant. The experimental results demonstrate that EMVSNet achieves competitive depth accuracy while, at the same time, providing uncertainty estimates that enable us to reliably rank depth estimates according to their risk of being incorrect and to automatically identify out-of-distribution data. Our model shows only slightly increased inference time compared to a deterministic baseline while giving uncertainty estimates comparable to those of a computationally expensive sampling-based approach, marking a first step towards real-time capable uncertainty estimation for image-based 3D reconstruction.
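How a single forward pass can yield both kinds of uncertainty is easiest to see with a concrete parameterization. The abstract does not state which evidential distribution EMVSNet regresses, so the sketch below assumes the Normal-Inverse-Gamma form commonly used in deep evidential regression, with per-pixel parameters (gamma, nu, alpha, beta). The function name and the sample numbers are illustrative only.

```python
def evidential_summary(gamma, nu, alpha, beta):
    """Depth and uncertainties from assumed Normal-Inverse-Gamma parameters.

    depth      = E[mu]      = gamma
    aleatoric  = E[sigma^2] = beta / (alpha - 1)
    epistemic  = Var[mu]    = beta / (nu * (alpha - 1))
    Requires nu > 0, alpha > 1 and beta > 0.
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("expected nu > 0, alpha > 1 and beta > 0")
    aleatoric = beta / (alpha - 1)
    epistemic = beta / (nu * (alpha - 1))
    return {"depth": gamma, "aleatoric": aleatoric, "epistemic": epistemic}


def rank_by_risk(pixels):
    """Sort per-pixel summaries from most to least certain (total variance)."""
    return sorted(pixels, key=lambda p: p["aleatoric"] + p["epistemic"])


# Made-up parameters for two pixels; the second has little supporting evidence.
confident = evidential_summary(gamma=12.0, nu=2.0, alpha=3.0, beta=1.0)
doubtful = evidential_summary(gamma=40.0, nu=0.1, alpha=1.5, beta=1.0)
ranking = rank_by_risk([doubtful, confident])
```

No sampling is involved: all quantities are closed-form functions of the regressed parameters, which is what makes single-pass inference possible. A low `nu` inflates the epistemic term, the kind of signal that can be used to flag out-of-distribution inputs.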

4:30pm - 4:45pm

Adaptive Scaling with Geometric and Visual Continuity of completed 3D objects

Jelle Vermandere, Maarten Bassier, Maarten Vergauwen

KU Leuven, Belgium

Object completion networks typically produce static Signed Distance Fields (SDFs) that faithfully reconstruct geometry but cannot be rescaled or deformed without introducing structural distortions. This limitation restricts their use in applications requiring flexible object manipulation, such as indoor redesign, simulation, and digital content creation. We introduce a part-aware scaling framework that transforms these static completed SDFs into editable, structurally coherent objects. Starting from SDFs and Texture Fields generated by state-of-the-art completion models, our method performs automatic part segmentation, defines user-controlled scaling zones, and applies smooth interpolation of SDFs, color, and part indices to enable proportional and artifact-free deformation. We further incorporate a repetition-based strategy to handle large-scale deformations while preserving repeating geometric patterns. Experiments on Matterport3D and ShapeNet objects show that our method overcomes the inherent rigidity of completed SDFs and is visually more appealing than global and naive selective scaling, particularly for complex shapes and repetitive structures.

4:45pm - 5:00pm

MambaPanoptic: a Vision Mamba-based Structured State Space Framework for panoptic Segmentation

Qing Cheng^1,2, Damiano Bertolini^1,3, Wei Zhang⁴, Dong Wang⁵, Niclas Zeller⁶, Daniel Cremers^1,2

¹Technical University of Munich, Germany; ²Munich Center for Machine Learning; ³Polytechnic University of Milan; ⁴University of Stuttgart; ⁵Wuhan University; ⁶Karlsruhe University of Applied Sciences

Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.

5:00pm - 5:15pm

GeoPrior-Diff: Using Stable Diffusion as a geometric Prior for single-view 3D Point Cloud Reconstruction

Youssef Korny¹, Sunghwan Yoo¹, Mohammad Moein Sheikholeslami¹, Daniel Panangian², Ksenia Bittner², Andreas Wichmann³, Gunho Sohn¹

¹Dept. of Earth and Space Science and Engineering, York University, Canada; ²Remote Sensing Technology Institute, German Aerospace Center (DLR), Germany; ³Institute for Applied Photogrammetry and Geoinformatics (IAPG), Jade University of Applied Sciences, Germany

Single-view 3D reconstruction from monocular aerial imagery presents a fundamental challenge in remote sensing due to the inherent scale ambiguity and the complex geometry of urban environments. Traditional regression-based methods often struggle to recover high-frequency structural details, leading to over-smoothed or noisy outputs. To address this, we introduce GeoPrior-Diff, a novel two-stage framework that leverages the generative capabilities of Latent Diffusion Models to reconstruct high-fidelity 3D point clouds.

Unlike direct generation approaches, our method explicitly bridges the domain gap between 2D texture and 3D structure by utilizing an intermediate geometric prior. In the first stage, we predict an oblique normal map from the input satellite imagery, capturing essential surface orientation and structural boundaries. In the second stage, this normal map serves as a strong conditioning signal for a probabilistic diffusion model, guiding the denoising process to synthesize accurate 3D point clouds. Preliminary results demonstrate that decoupling geometric estimation from point generation significantly enhances structural consistency and reduces artifacts compared to baseline methods. This work highlights the potential of using generative priors for robust 3D urban modeling from limited data.