Conference Agenda

Session

WG II/2A: Point Cloud Generation and Processing

Time: Monday, 06-July-2026, 1:30pm - 3:00pm

Location: 713A

125 theatre

Session Topics:

Point Cloud Generation and Processing (WG II/2)

External Resource: http://www.commission2.isprs.org/wg2

Presentations

1:30pm - 1:45pm

LGSSM: Local-to-global state space model for serialized point cloud semantic segmentation

Hao Wu, Li Yan, Huchen Li, Qimeng Li, Longze Zhu, Junjie Yuan, Hong Xie

School of Geodesy and Geomatics, Hubei Luojia Laboratory, Wuhan University

Point clouds have become essential data for describing real-world objects. Accurate and efficient 3D semantic segmentation plays a crucial role in environment understanding and scene reconstruction. However, current segmentation methods still face challenges from unordered data, high computational complexity, limited scene perception, and insufficient generalization. To address these issues, we propose a local-to-global semantic segmentation method based on a state-space model (LGSSM). Specifically, the proposed method uses three-dimensional serialization encoding to serialize point clouds along the x, y, and z directions, addressing the inherent disorder of point clouds and enhancing spatial representation. Then, the local state space model extracts fine-grained local geometric structure and the global state space model captures the overall scene representation, improving the modeling of both short- and long-range dependencies. Finally, the serialized context aggregation module fuses contextual features to promote spatial semantic consistency. Extensive experiments conducted on ScanNet, ScanNet200, and S3DIS demonstrate that our model achieves state-of-the-art segmentation accuracy compared with other existing methods.
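As a rough illustration of what serializing a point cloud along the x, y, and z directions can mean, the sketch below orders points by quantized grid coordinates with a different leading axis each time. The function name `serialize`, the voxel size, and the lexicographic ordering are assumptions made for this example; the abstract does not specify the encoding LGSSM actually uses.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def serialize(points: Sequence[Point], axis: int, voxel: float) -> List[int]:
    """Return point indices ordered along `axis` (0 = x, 1 = y, 2 = z).

    Hypothetical scheme: quantize each point to an integer grid cell and
    sort lexicographically with the chosen axis as the leading key.
    """
    # Axis priority with the chosen axis first, e.g. axis=1 -> [1, 2, 0].
    order = [(axis + k) % 3 for k in range(3)]

    def key(i: int) -> Tuple[int, int, int]:
        cell = [int(points[i][a] // voxel) for a in range(3)]
        return (cell[order[0]], cell[order[1]], cell[order[2]])

    return sorted(range(len(points)), key=key)


# Three toy points; each axis yields a different 1D sequence.
points = [(0.9, 0.1, 0.5), (0.1, 0.8, 0.2), (0.5, 0.4, 0.9)]
orders = {name: serialize(points, a, voxel=0.25) for a, name in enumerate("xyz")}
```

Each resulting index list turns the unordered set into a sequence that a sequence model can consume; using several orderings gives the model more than one notion of neighbourhood.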

1:45pm - 2:00pm

Hierarchical Gaussian Partitioning for Semantic Segmentation of Airborne LiDAR Scenes

Moussa Bendjilali¹,², Nicola Luminari¹, Pierre Alliez²

¹Alteia, France; ²Inria Sophia-Antipolis, France

In this paper, we present a novel approach to semantic segmentation of airborne LiDAR point clouds that integrates a hierarchical Gaussian Mixture Model (hGMM) within the Superpoint Transformer (SPT) framework. The hGMM constructs a coarse-to-fine representation of the scene by recursively fitting Gaussian components to spatially coherent subsets of the point cloud, yielding a hierarchical decomposition that serves as a structured token set for the segmentation objective. While Gaussian Mixture Models (GMMs) can fit virtually any distribution, we constrain their use to structured suburban scenes, where their parametric form is naturally suited to represent planar and ellipsoidal geometries, hence allowing parsimonious mixtures. Experimental results on the DALES benchmark demonstrate that our method achieves competitive performance with respect to state-of-the-art approaches, with notable improvements on classes such as ground and buildings. Results on indoor S3DIS confirm the method's intended specificity to outdoor environments. These findings validate hGMM as a principled and effective alternative to heuristic partitioning techniques, integrating stochastic modelling with transformer-based semantic reasoning in large-scale 3D environments.

2:00pm - 2:15pm

MCPF-Net: Multi-stage LiDAR-Image Collaborative Perception Fusion Network for Point Cloud Semantic Segmentation in Urban Scenes

Huchen Li¹, Wubiao Huang¹, Haibing Liu¹, Shihan Chen¹, Bin Liu², Fei Deng¹,³

¹School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; ²Hinton STAI Institute, East China Normal University, Minhang, Shanghai 200241, China; ³Hubei Luojia Laboratory, Wuhan 430079, China

Point cloud semantic segmentation through multi-modal fusion provides a fundamental basis for surface observation and visual perception tasks. LiDAR provides precise geometric structural information, while optical images offer rich semantic and textural details. However, existing fusion methods still suffer from limited cross-modal perception and insufficient information complementarity. To address these challenges, we propose a multi-stage LiDAR-image collaborative perception fusion network (MCPFNet) for point cloud semantic segmentation in urban scenes. In the mid-stage, the network introduces a geometric-aware fusion module (GAFM) and a semantic-aware fusion module (SAFM) to achieve bi-directional injection of structural and semantic features between LiDAR and image modalities. In the later stage, an adaptive feature fusion module (AFFM) is designed to refine semantic representations through gated weighting and bi-directional attention mechanisms. Extensive experiments demonstrated that MCPFNet achieved the best mIoU scores of 74.51%, 72.10%, and 95.15% on the ISPRS Vaihingen, FRACTAL, and N3C datasets, respectively, validating its superior performance in multi-modal semantic segmentation.
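The general idea behind gated weighting of two feature streams can be sketched as follows. This is a generic sigmoid-gate blend, not the AFFM itself: the function name `gated_fusion` is hypothetical, and the gate logits are passed in directly here, whereas a real network would produce them with a learned layer.

```python
import math
from typing import List


def gated_fusion(
    lidar: List[float], image: List[float], gate_logits: List[float]
) -> List[float]:
    """Blend two feature vectors channel by channel with a sigmoid gate."""
    fused = []
    for x, y, z in zip(lidar, image, gate_logits):
        g = 1.0 / (1.0 + math.exp(-z))  # gate value in (0, 1)
        # g near 1 favours the LiDAR channel, g near 0 the image channel.
        fused.append(g * x + (1.0 - g) * y)
    return fused


# With zero logits the gate is 0.5, so each channel is a plain average.
fused = gated_fusion([1.0, 0.0], [0.0, 2.0], [0.0, 0.0])
```

Because the gate is computed per channel, the blend can lean on geometry for some features and on image semantics for others.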

2:15pm - 2:30pm

Cross-Sensor Robustness and Spatial Generalization for 3D Railway Point Cloud Semantic Segmentation

Arshia Ghasemlou, Mario Soilán, Jesús Balado, Belén Riveiro

CINTECX, GeoTECH group, Universidade de Vigo

This contribution investigates the cross-sensor and spatial generalization of deep learning methods for 3D semantic segmentation in railway environments. Although current models achieve high accuracy on large benchmark datasets, their robustness under real-world acquisition variability remains insufficiently understood. To address this gap, three state-of-the-art architectures—Point Transformer v3, Swin3D, and MinkUNet—were trained on the SemanticRail3D dataset and evaluated on a newly acquired 120-m railway section captured with three heterogeneous LiDAR systems: a Faro Focus S150+ terrestrial laser scanner, a CHCNAV RS10 handheld device, and a GeoSLAM ZEB Go SLAM-based scanner.

The case-study point clouds were carefully registered, normalized, voxelized, and manually annotated to provide consistent ground truth across sensors. A standardized preprocessing and test-time augmentation pipeline was applied to ensure compatibility with the training domain. Generalization performance was analysed through per-class IoU, cross-model agreement, and sensor-dependent degradation patterns. Results show significant variability across acquisition platforms, with denser, low-noise scans enabling better transferability, while sparser SLAM-based point clouds remain challenging for thin or small components such as overhead wires.

To mitigate cross-sensor variability, an IoU-weighted ensemble strategy was introduced, leveraging complementary model strengths without requiring retraining. This ensemble consistently improved or matched the performance of individual models on the case-study datasets.
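One plausible reading of an IoU-weighted ensemble is sketched below: each model's class probability for a point is scaled by that model's per-class IoU measured on held-out data, and the class with the largest weighted sum wins. The function name, the model names, and all numbers are hypothetical; the abstract does not state how the weights are applied.

```python
from typing import Dict, List


def iou_weighted_vote(
    probs: Dict[str, List[float]],
    class_iou: Dict[str, List[float]],
) -> int:
    """Fuse per-class probabilities from several models for one point."""
    n_classes = len(next(iter(probs.values())))
    fused = [0.0] * n_classes
    for model, p in probs.items():
        for c in range(n_classes):
            # Scale each model's score by its held-out IoU for class c.
            fused[c] += class_iou[model][c] * p[c]
    return max(range(n_classes), key=lambda c: fused[c])


# Hypothetical values for one point and three classes.
probs = {
    "model_a": [0.2, 0.5, 0.3],
    "model_b": [0.1, 0.3, 0.6],
}
class_iou = {
    "model_a": [0.9, 0.9, 0.2],
    "model_b": [0.8, 0.7, 0.8],
}
label = iou_weighted_vote(probs, class_iou)
```

Since only inference outputs and precomputed IoU values are combined, a scheme of this kind needs no retraining, which matches the property the abstract highlights.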

Overall, the study demonstrates the importance of evaluating semantic segmentation models under realistic multi-sensor conditions and provides a practical benchmark and methodology for assessing domain-shift effects in railway point clouds.

2:30pm - 2:45pm

Revisiting NeRF for Street Scene Point Cloud Semantic Segmentation in the Era of 3DGS

Yuzhou Zhou

University of Oxford, United Kingdom

Accurate semantic segmentation of urban point clouds is fundamental for autonomous driving and city mapping. Recent advances in neural scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have significantly improved photorealistic reconstruction quality. However, 3DGS is primarily designed for small-scale, object-centric scenes with dense viewpoints, and its optimization becomes sub-optimal in large-scale street scenes with trajectory-constrained observations, leading to semantic errors and distorted geometry.

In this work, we revisit NeRF-based scene representation in the era of 3DGS to address these challenges. Our method integrates the explicit and efficient modeling strategy of 3DGS with the surface-constrained sampling nature of NeRF. Specifically, we employ Deformable Neural Mesh Primitives (DNMPs) to jointly encode geometry and semantics, enabling efficient ray–mesh intersection sampling and neural field interpolation. This formulation achieves 3D-annotation-free point cloud semantic segmentation by leveraging rendered image supervision.

Experiments on the KITTI-360 dataset demonstrate that our approach surpasses the Street Gaussians baseline in overall mIoU and across most semantic categories. The improvement mainly stems from reducing semantic errors caused by limited viewpoints during 3D Gaussian optimization, providing a robust and scalable solution for street scene semantic understanding.

2:45pm - 3:00pm

Extraction of Pole-like Road Objects from MMS Point Clouds Using Deep Learning and Geometric-Topological Feature Fusion

Shu Su, Masataka Shirai, Hiroyuki Yokota

AERO TOYOTA CORPORATION, Japan

This paper presents a fusion framework for the automatic extraction of pole-like road objects—such as traffic lights, road signs, streetlights, and utility poles—from Mobile Mapping System (MMS) point clouds. The proposed method integrates KPConv-based semantic segmentation with geometric–topological reasoning to achieve structural completion and false-positive suppression without retraining or additional annotated data. The framework was trained on 8 km of manually labeled MMS data from the Kinki region, Japan, and evaluated on large-scale unseen data from Hokkaido (≈ 26 km, 2.53 billion points) and the Paris–Lille-3D benchmark (France) acquired with a different LiDAR sensor. The proposed approach significantly outperformed the KPConv baseline. On the Hokkaido dataset, the F₁-score improved from 0.8263 to 0.8689 (+0.0426), successfully reconstructing lamp tops, signal arms, and previously unseen snow delineator posts (snow poles). On the Paris–Lille-3D benchmark, recall increased by 15.5 points, yielding an overall F₁-score gain of +0.0802. The 26 km Hokkaido dataset was processed in less than 13 hours on a single NVIDIA Quadro RTX 8000. These results demonstrate that the proposed deep learning–geometry–topology fusion achieves robust, scalable, and efficient performance across diverse geographic and sensor domains, supporting nationwide road-asset mapping and digital-twin generation.