Conference Agenda
WG II/4C: AI/ML for Geospatial Data
Session Topics: AI/ML for Geospatial Data (WG II/4)
External Resource: http://www.commission2.isprs.org/wg4
Presentations
3:30pm - 3:45pm
DeepChoice: Learning View Weighting for Image-Guided 3D Semantic Segmentation
1 University of Applied Sciences Western Switzerland (HES-SO / HEIG-VD); 2 ESO lab, EPFL, Switzerland
Multi-view image-to-point label transfer is an effective strategy for 3D semantic segmentation, but its performance largely depends on how predictions from multiple image observations are fused for each 3D point. Most existing pipelines rely on hard voting or handcrafted weighting rules, which do not explicitly learn the reliability of each view under varying geometric and image-quality conditions. In this paper, we introduce DeepChoice, a lightweight view-weighting module for image-guided 3D semantic segmentation. For each visible observation of a 3D point, DeepChoice exploits a compact set of visibility cues, including incidence angle, range, contrast, sharpness, signal-to-noise ratio, and saturation, to predict normalized per-view weights used to aggregate 2D semantic class probabilities into final 3D point-wise predictions. The method is sensor-agnostic, requires no meshing, and can be integrated as a replacement for standard multi-view fusion rules. Experiments on the full GridNet-HD benchmark show that DeepChoice improves over hard voting by 3.85 mIoU points and over mean-probability fusion by 1.26 points, while reducing the gap with the AnyView oracle upper bound. The largest gains are observed on thin and difficult classes such as conductors, pylons, and insulators. Furthermore, a complementary evaluation on the Images PointClouds Cultural Heritage dataset shows that the proposed weighting strategy remains beneficial under a very different acquisition context and scene structure, yielding a 1.55 mIoU point improvement over hard voting. These results show that learning how to weight views is a simple yet effective way to strengthen image-guided 3D semantic segmentation pipelines. Code is publicly available at: https://huggingface.co/heig-vd-geo/DeepChoice.
3:45pm - 4:00pm
Semantic Segmentation of Textured Non-manifold 3D Meshes using Transformers
Leibniz University Hannover, Germany
Textured 3D meshes jointly encode geometry, topology, and appearance, yet their irregular structure poses significant challenges for deep-learning-based semantic segmentation. While a few recent methods operate directly on meshes without imposing geometric constraints, they typically overlook the rich textural information also provided by such meshes. We introduce a texture-aware transformer that learns directly from raw pixels associated with each mesh face, coupled with a new hierarchical learning scheme for multi-scale feature aggregation. A texture branch summarizes all face-level pixels into a learnable token, which is fused with geometrical descriptors and processed by a stack of Two-Stage Transformer Blocks (TSTB), which allow for both a local and a global information flow. We evaluate our model on the Semantic Urban Meshes (SUM) benchmark and a newly curated cultural-heritage dataset comprising textured roof tiles with triangle-level annotations of damage types. Our method achieves 81.9% mF1 and 94.3% OA on SUM, and 49.7% mF1 and 72.8% OA on the new dataset, substantially outperforming existing approaches.
4:00pm - 4:15pm
Pothole Classification using Point Cloud Data: a Comparison between Machine Learning and Deep Learning
Norwegian University of Science and Technology, Norway
Automatic pothole detection is important for improving road maintenance and transportation safety. While image-based pothole detection often struggles under poor lighting and weather conditions, point cloud data provides a robust alternative by capturing detailed surface geometry. Machine learning has demonstrated strong performance in point cloud classification. While traditional machine learning is simpler and relies on handcrafted features, deep learning models are more powerful, as they learn complex, high-dimensional patterns directly from the input data. While most existing work relies on deep learning models, which are time-consuming to train and require extensive labelled datasets, potholes can be well described by geometric features, making pothole detection well-suited for feature engineering. This paper compares traditional machine learning and deep learning approaches for pothole classification using point cloud data, to evaluate whether the added complexity and data demands of deep learning models are justified, or if traditional machine learning techniques are sufficient for accurate classification. A dataset with labelled pothole instances is created to train both models. The machine learning approach uses manually engineered geometric features as input to an ensemble classifier, while the deep learning model is trained on sampled data. Experimental results show that the machine learning approach outperformed the deep learning model. These results suggest that for this particular task, where informative domain-specific features can be manually engineered, the machine learning approach offers a more practical and efficient solution for real-world deployment, where labelled data may be limited.
4:15pm - 4:30pm
From Canopy to Crown: High-Fidelity Tree Facade Synthesis from Nadir LiDAR data
1 University of Fraser Valley; 2 University of Toronto; 3 York University
Synthesizing realistic façade views of individual trees from nadir-view remote sensing data would transform large-scale forest analysis, yet remains unsolved due to data scarcity and task ambiguity. We present the first conditional diffusion model to generate structurally plausible façade views of individual tree crowns from single nadir-view LiDAR rasters, leveraging the FOR-species20K benchmark dataset. Our approach integrates nadir projections with tree species and height within a U-Net-based denoising diffusion framework. Experiments demonstrate that nadir imagery alone is insufficient, but conditioning on species and height enables synthesis of visually realistic, species-specific façade views. The fully conditioned model achieves substantial gains in perceptual (LPIPS: 0.184) and structural (SSIM: 0.576) similarity, outperforming nadir-only baselines by more than twofold. Our results establish that ancillary attributes critically constrain the solution space, enabling diffusion models to infer plausible structures from ambiguous nadir input. This work demonstrates a scalable path to enriching nadir-based forest inventories with synthesized structural detail, reducing the need for resource-intensive ground surveys.
4:30pm - 4:45pm
Evaluation of Metric Monocular Depth Estimation Models Under Adverse Weather Conditions in Driving Scenarios
University of Calgary, Canada
Metric monocular depth estimation has become increasingly important and is often used as a redundancy mechanism in autonomous driving, where accurate scene understanding is essential for safe decision-making. In this work, we evaluate three recently proposed models that represent the state of the art (Depth Anything, PackNet-SfM, and UniDepth) using zero-shot testing on the DrivingStereo dataset across diverse weather conditions, and benchmark their performance. Our analysis considers not only metric depth accuracy metrics but also each model’s ability to generalize under challenging environmental variations. While UniDepth achieves notable improvements over Depth Anything and PackNet-SfM, our results show that substantial progress is still needed for robust real-world deployment. To further assess its practical suitability for autonomous driving applications, we conduct a detailed examination of UniDepth’s strengths, limitations, and failure modes.
4:45pm - 5:00pm
Out-of-Distribution Detection for Real-World Honey Bee Monitoring Using Simulated Permanent Laser Scanning
1 3DGeo Research Group, Institute of Geography, Heidelberg University; 2 Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University
We present the first Open-Set Recognition (OSR) workflow for environmental monitoring with Permanent Laser Scanning (PLS) setups, using a Deep Neural Network (DNN) trained solely on simulated data. Such monitoring systems were previously trained only with real-world data and under the closed-set assumption, because they are commonly designed to observe a specific and predefined phenomenon (e.g., beach erosion, rockfall activity, vegetation change, animal behavior). The use of real-world data requires manual labeling, which is tedious given the large number of point clouds. For this reason, we use Virtual Laser Scanning of Dynamic Scenes (VLS-4D) in a PLS setup to investigate how knowledge from synthetic data can be applied to real-world PLS monitoring systems in open-set settings. We introduce a novel framework that enables OSR for animal monitoring (e.g., honey bees) using PLS data. The DNN is fine-tuned exclusively on a simulated LiDAR point cloud time series of flying honey bees, and integrates OSR to handle unknown classes during real-world deployment (e.g., butterflies, leaves, wren, and hare). By leveraging deviations in feature embeddings of the DNN, our method reliably distinguishes the known honey bee class from previously unseen classes, supporting robust monitoring under persistent distribution shifts. This approach reduces the dependence on extensive manual annotation of real-world point clouds, while maintaining reliable classification performance. It also highlights the potential of synthetic training data and OSR for environmental monitoring with PLS systems.
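Illustrative note on the last abstract: the idea of flagging unknown classes through deviations in feature embeddings can be sketched as a distance test against the known class. The abstract does not say how deviations are measured, so the Euclidean distance to a class centroid, the quantile-based threshold, the function names, and the toy 2D embeddings below are all assumptions made for illustration, not the authors' method.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_open_set_detector(known_embeddings, quantile=0.95):
    """Return (center, threshold) estimated from known-class embeddings.

    The threshold is the chosen quantile of the training distances to the
    centroid. This rule is a hypothetical choice, not taken from the abstract.
    """
    center = centroid(known_embeddings)
    dists = sorted(distance(e, center) for e in known_embeddings)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[idx]

def is_known(embedding, center, threshold):
    """True if the embedding lies within the known-class distance threshold."""
    return distance(embedding, center) <= threshold

# Toy embeddings standing in for DNN features of simulated honey bees.
known = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [1.0, 0.2]]
center, threshold = fit_open_set_detector(known)

near_sample = [1.0, 0.05]   # close to the known cluster
far_sample = [-3.0, 4.0]    # far away, e.g. an unseen class
print(is_known(near_sample, center, threshold))
print(is_known(far_sample, center, threshold))
```

In this toy setup the sample near the cluster is accepted as known and the distant one is rejected. A real system would take the embeddings from the trained network and would likely need a more careful distance measure and threshold calibration.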

