Conference Agenda
ICWG I/IV: Robotics for Mapping and Machine Intelligence
Session Topics: Robotics for Mapping and Machine Intelligence (ICWG I/IV)
External Resource: http://www.commission1.isprs.org/icwg-1-4
Presentations
12:00pm - 12:15pm
A category-specific prompt strategy for semantic 3D indoor mapping using RGB-D camera
¹Remote Sensing and Image Analysis, Department of Civil and Environmental Engineering, Technical University of Darmstadt, Germany; ²Geodetic Measurement Systems and Sensor Technology, Department of Civil and Environmental Engineering, Technical University of Darmstadt, Germany
Semantic 3D indoor mapping often depends on supervised learning and large annotated datasets, limiting scalability across diverse environments. This work introduces a category-specific prompt strategy for semantic 3D mapping using RGB-D cameras, integrating RGB-D SLAM with the Segment Anything Model 2 (SAM2) to enable annotation-efficient reconstruction. Keyframes and trajectories extracted from SLAM provide spatial references, while SAM2 performs zero-shot segmentation guided by a Category-Specific Prompt Strategy (CPSS), which segments structural and functional elements (e.g., floors, doors, staircases) by category to reduce prompt interference and manual effort. The segmented keyframes are then fused with depth and pose data to produce instance-level semantic point clouds. Experiments on custom RGB-D sequences and selected ScanNet scenes demonstrate centimeter-level geometric accuracy and strong semantic consistency, with mIoU values up to 0.89 on the custom dataset and 0.98 on ScanNet. The resulting semantic point clouds are clean, structured, and require minimal post-processing, showing that the proposed strategy provides an efficient and scalable solution for semantic 3D indoor mapping without retraining or environment-specific supervision.
12:15pm - 12:30pm
3L-Planner: Lightweight LiDAR mapping and real-time local planning for ground robot autonomous navigation
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, China
Mobile robots are widely used in unmanned surveying, warehouse logistics, and emergency response. However, achieving safe, reliable, and efficient autonomous navigation in unknown environments remains challenging, where accurate environment representation and feasible trajectory planning are crucial. This paper presents an autonomous navigation method integrating lightweight LiDAR mapping with real-time local planning for ground robots. At the perception level, an incremental single-frame point cloud update is used to accumulate and project locally traversable space, producing a lightweight obstacle map that preserves geometric accuracy while reducing planning computation. At the planning level, A* is employed to generate reference control points, and uniform B-spline curves are used to optimize the trajectory while enforcing kinematic feasibility and smoothness. At the control level, nonlinear model predictive control (NMPC) ensures accurate trajectory tracking by producing control commands that satisfy velocity and acceleration constraints. The framework also supports low-cost evaluation in simulation. Experiments in simulated forests, simulated indoor corridors, and real-world gardens and hallways show average navigation speeds of 2.24 m/s, 0.76 m/s, 0.43 m/s, and 0.38 m/s, respectively. Results demonstrate that the proposed method generates smooth, feasible, and safe trajectories and completes autonomous navigation and mapping tasks across diverse environments.
12:30pm - 12:45pm
CMCL-PR: Cross-Modal Camera-to-LiDAR Place Recognition with Cross-Attention Contrastive Learning
Wuhan University, People's Republic of China
Place recognition is a crucial task for both robots and autonomous vehicles, facilitating positioning and loop closure within pre-built global maps. Although single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition, which matches images from low-cost cameras against global point cloud databases, remains a significant challenge. In this paper, we propose CMCL-PR, a lightweight contrastive learning-based cross-modal place recognition framework that retrieves the matching place for a single query image from a global offline point cloud map. We introduce a perspective-projection-based field-of-view (FoV) transformation module that converts point clouds into an image-like modality. We then design a dual-branch intra-modal encoder built on a shared Transformer, which extracts and aligns image and point cloud features separately, unifying the feature distributions of the two modalities. In addition, we construct a cross-attention module guided by inter-modal consistency, which exploits scene context information from each modality to generate discriminative cross-modal descriptors. Finally, cross-modal features are enhanced during contrastive learning, and a multi-term loss function is constructed comprising a cross-modal contrastive loss, an intra-modal consistency loss, and a matching supervision loss. We assess the effectiveness and generalizability of our method on three publicly available datasets: KITTI, KITTI-360, and Oxford RobotCar. The project page and code will be released at https://github.com/qp-li/CMCL-PR.
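The CMCL-PR abstract lists a cross-modal contrastive loss among its training terms but does not give its form. As an illustration only, the sketch below assumes a standard symmetric InfoNCE objective over a batch of paired image and point cloud descriptors, where row i of each matrix describes the same place and every other row acts as a negative. The function names and the temperature of 0.07 are hypothetical choices, not the authors' settings, and the intra-modal consistency and matching supervision terms are not shown.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def cross_modal_info_nce(img_desc, pc_desc, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE (assumed form, not taken from the paper).

    Row i of img_desc and row i of pc_desc form the positive pair;
    all other rows in the batch serve as negatives.
    """
    img = l2_normalize(np.asarray(img_desc, dtype=np.float64))
    pc = l2_normalize(np.asarray(pc_desc, dtype=np.float64))
    logits = img @ pc.T / temperature                  # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_i2p = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> point cloud retrieval
    loss_p2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # point cloud -> image retrieval
    return float(0.5 * (loss_i2p + loss_p2i))
```

If every descriptor in the batch is identical, all logits are equal and the loss reduces to log(B) for batch size B; well-aligned positive pairs drive it toward zero.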

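The first talk (semantic 3D indoor mapping with SAM2) fuses segmented keyframes with depth and SLAM poses to produce instance-level semantic point clouds. A minimal sketch of that fusion step for a single keyframe follows, assuming an undistorted pinhole camera, a camera-to-world pose T_wc, and a label image in which 0 marks unsegmented pixels. The function name, the depth_scale and max_depth parameters, and the label convention are hypothetical; merging instances across keyframes is not shown.

```python
import numpy as np


def backproject_labels(depth, labels, K, T_wc, depth_scale=1.0, max_depth=10.0):
    """Lift every valid, labelled pixel of one RGB-D keyframe into world coordinates.

    depth  : (H, W) raw depth image; multiplied by depth_scale to obtain metres (assumed)
    labels : (H, W) integer ids from the 2D segmentation, 0 = unlabelled (assumed)
    K      : (3, 3) pinhole intrinsics, no lens distortion (assumed)
    T_wc   : (4, 4) camera-to-world pose, e.g. from the SLAM trajectory
    Returns (N, 3) world points and their (N,) labels.
    """
    h, w = depth.shape
    z = depth.astype(np.float64) * depth_scale
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # u = column index, v = row index
    valid = (z > 0) & (z < max_depth) & (labels > 0)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u[valid] - cx) * z[valid] / fx              # pinhole back-projection
    y = (v[valid] - cy) * z[valid] / fy
    pts_cam = np.stack([x, y, z[valid], np.ones_like(x)], axis=1)  # homogeneous (N, 4)
    pts_world = (T_wc @ pts_cam.T).T[:, :3]          # transform into the world frame
    return pts_world, labels[valid]
```

Running this per keyframe and concatenating the outputs yields one labelled cloud; deduplicating points and merging the same instance seen from several views is a separate step.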

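The 3L-Planner abstract describes A*-generated control points refined into a uniform B-spline under kinematic constraints. The sketch below assumes a cubic degree and shows only the evaluation side: sampling the curve from its control points, plus a conservative velocity check based on the convex-hull property (every velocity on the curve lies in the convex hull of the derivative control points (P[i+1] - P[i]) / dt). The optimization itself, the knot interval dt, and the function names are assumptions, not the authors' implementation.

```python
import numpy as np

# Uniform cubic B-spline basis matrix: p(t) = [1, t, t^2, t^3] @ M @ [P0, P1, P2, P3]
M = np.array([[ 1.0,  4.0,  1.0, 0.0],
              [-3.0,  0.0,  3.0, 0.0],
              [ 3.0, -6.0,  3.0, 0.0],
              [-1.0,  3.0, -3.0, 1.0]]) / 6.0


def sample_uniform_bspline(ctrl_pts, samples_per_segment=10):
    """Sample a uniform cubic B-spline; each window of 4 control points defines one segment."""
    ctrl = np.asarray(ctrl_pts, dtype=float)
    if len(ctrl) < 4:
        raise ValueError("a cubic B-spline needs at least 4 control points")
    ts = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)
    T = np.stack([np.ones_like(ts), ts, ts**2, ts**3], axis=1)  # (S, 4) power basis
    segments = [T @ M @ ctrl[i:i + 4] for i in range(len(ctrl) - 3)]
    end_point = np.array([[1.0, 1.0, 1.0, 1.0]]) @ M @ ctrl[-4:]  # t = 1 of the last segment
    return np.vstack(segments + [end_point])


def is_velocity_feasible(ctrl_pts, dt, v_max):
    """Conservative speed check using the derivative control points (convex-hull property)."""
    ctrl = np.asarray(ctrl_pts, dtype=float)
    vel_ctrl = np.diff(ctrl, axis=0) / dt            # control points of the velocity spline
    return bool(np.all(np.linalg.norm(vel_ctrl, axis=1) <= v_max))
```

Because the check inspects only control points, it is sufficient but not necessary: a spline that fails it may still respect v_max, which is why planners of this kind usually penalize violations inside the optimization rather than reject curves outright.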