Conference Agenda
Overview and details of the sessions of this conference.
Agenda Overview
WG III/1A: Remote Sensing Data Processing and Understanding
Session Topics: Remote Sensing Data Processing and Understanding (WG III/1)
External Resource: http://www.commission3.isprs.org/wg1
Presentations
8:30am - 8:45am
Cube Kernel: A Novel Approach to Enable Local Gradient Flow Across Channels in CNNs
University of Glasgow, United Kingdom
Understanding inter-band and cross-channel relationships is essential for human color perception and object recognition. Yet local gradients in standard convolutions are tied to fixed input–output channel pairs, so channels are fused by a dense, fully coupled weight tensor: each output channel aggregates all input channels uniformly at every spatial location. This incurs heavy computation and exploits neither structured sparsity nor selective local channel mixing. To overcome this limitation, we introduce Cube Kernel, a novel convolutional operator that incorporates structured cross-channel groups into the local gradient. This design strengthens cross-channel feature fusion, improves optimization efficiency, and reduces computational overhead. Extensive building extraction experiments validate its effectiveness: Cube Kernel consistently outperforms standard convolutions and Involution when integrated into UNet, and replacing a single layer in DeepLabV3+, Swin-UNet, or UNet yields consistent performance gains. Beyond serving as a lightweight plug-in module, Cube Kernel also scales effectively as a fundamental building block. A Cube-enhanced ConvNeXt variant, ConvNeXt-Cube, achieves state-of-the-art performance among all compared models (0.9095 IoU / 0.9535 F1 on WBD and 0.9133 IoU / 0.9547 F1 on WHU), demonstrating strong stackability and architectural potential. These results highlight a largely overlooked space in CNN design: enhancing cross-channel interaction at the gradient level. Cube Kernel offers a scalable and efficient alternative to deepening networks for channel mixing, laying a foundation for future advances in convolutional architecture design.
8:45am - 9:00am
Land Surface Dynamics Modeling and Prediction with Dual Latent-Space Representations
¹Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; ²School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China; ³Yazhou Bay Innovation Institute, Hainan Tropical Ocean University, Sanya, China; ⁴The University of Hong Kong, Hong Kong, China
Modeling land surface dynamics from satellite observations is crucial for revealing change patterns and predicting future states, yet effective modeling methods remain limited. For complex systems such as reaction-diffusion systems, two approaches have proven particularly effective. (i) Direct modeling in the high-dimensional observation space with deep networks (e.g., Wang et al., 2022). These methods are often autoregressive, so errors accumulate during rolling extrapolation. (ii) Modeling in a reduced-dimensional latent space (e.g., Chen et al., 2022), where the dimension is first reduced and the evolution is then learned. Some works estimate the intrinsic dimension (ID) and model in the ID latent space. This improves long-term stability, but reliance on latent representations may reduce accuracy. This route is promising if two issues are addressed: (1) effectively modeling multi-scale spatiotemporal data with long sampling intervals; and (2) combining ID-space modeling with other latent dimensions to balance accuracy and stability. This paper proposes a Dual Latent-Space Representation-based Land Surface Dynamic Model (DLS-LSDM). The core contributions are: (1) a stacked-convolution and multi-scale linear-attention autoencoder that produces a base latent, together with ID estimation to derive an ID latent; (2) a long-horizon scheme that combines the ID and base latents to achieve both stability and high accuracy; and (3) a comprehensive evaluation on ten years of MODIS NDVI data across multiple climate zones, demonstrating the model's superiority.
9:00am - 9:15am
Revealing Feature Contribution Mechanisms for Interpretable CNN-Transformer Remote Sensing Classification
¹Wuhan University; ²China University of Geosciences; ³Nanjing University of Information Science and Technology
Deep learning models have become the backbone of intelligent remote sensing image classification, enabling high-precision recognition of land cover, geospatial objects, and scene categories. However, their inherent "black-box" nature, in which decision logic is embedded in complex parameter spaces, poses critical barriers to deployment in high-stakes domains such as military reconnaissance, disaster monitoring, and environmental governance. These fields demand transparent reasoning to validate model reliability, yet traditional interpretability methods suffer from two key limitations when applied to remote sensing data: (1) they are primarily designed for natural images and fail to account for remote sensing-specific characteristics; and (2) they focus on local feature attribution or saliency mapping but lack quantitative analysis of how core image features (shape, texture, spectrum) contribute to global classification decisions, especially across different network architectures. To address these problems, this study proposes a comprehensive feature contribution analysis framework tailored to remote sensing images, with three core objectives: (1) decoupling and extracting shape, texture, and spectrum features from remote sensing images in a physically meaningful manner; (2) quantifying the contribution of each feature type to classification decisions; and (3) revealing differences in feature processing mechanisms between CNN and Transformer architectures.
9:15am - 9:30am
EfficientViM-CD: An Efficient Remote Sensing Change Detection Network Based on Hidden State Mixer
¹State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, People's Republic of China; ²School of Information Science and Engineering, Wuchang Shouyi University
High-resolution optical remote sensing change detection (CD) is of great significance for urban evolution monitoring, disaster assessment, and land management. Traditional deep models often face computational, memory, and inference latency bottlenecks when processing large high-resolution images. To address this, we propose EfficientViM-CD, an efficient remote sensing change detection network based on the Hidden State Mixer. The approach builds on the EfficientViM backbone, migrating global interaction operations into a compact hidden state space and using the Hidden State Mixer based on state space duality (HSM-SSD) to fuse global context while reducing computational complexity. We employ a Siamese encoding architecture to extract multi-scale features and hidden states from the bi-temporal image pair, and use a Cross-Hidden Fusion module to integrate hidden semantic interactions between the two time points. At each scale, local difference features are computed and enhanced in the hidden state space, and a multi-scale decoder reconstructs a pixel-level change probability map. We conducted experiments on four public datasets (LEVIR-CD+, WHU-CD, S2Looking, SVCD) and compared against nine state-of-the-art methods. Results demonstrate that EfficientViM-CD achieves competitive accuracy while delivering significant advantages in inference speed and memory efficiency. This method offers a lightweight, efficient, and scalable solution for high-resolution remote sensing change detection, with potential for real-time monitoring and emergency response systems.
9:30am - 9:45am
Local NMS: Enhancing Object Detection in Large-Scale Remote Sensing Images via Iterative Pipelined Postprocessing
Fraunhofer IOSB, Germany
Object detection in large, dense remote sensing imagery is difficult because targets are often small and arbitrarily oriented, and state-of-the-art detectors cannot process very large images directly without losing accuracy. Tiling-based inference workflows mitigate the latter issue by running inference iteratively on overlapping tiles, but they introduce pre- and postprocessing overhead for image tiling and Non-Maximum Suppression (NMS). We introduce local NMS, an asynchronous tile-wise postprocessing scheme. Local NMS runs in a separate subprocess in parallel with tile-wise inference, collects the intermediate results enqueued by the inference process, and applies postprocessing to them immediately. An intelligent reordering of tiles in a preprocessing step ensures optimal use of computing resources. We assess our method using three state-of-the-art object detection models for horizontal and oriented bounding box detection on two benchmark datasets containing large, dense aerial and satellite images, DOTA-v2.0 and Izembek Lagoon Birds, stratifying by image size and average object density. Local NMS consistently reduces end-to-end runtime across models and datasets without significantly affecting mAP. A maximum runtime reduction of 60.77% was achieved on large, dense DOTA-v2.0 scenes without modifying model architectures or retraining.
9:45am - 10:00am
ERD: Extended RAW-Diffusion Framework for De-rendering sRGB Images
¹Department of Computer Science, University of Toronto, Canada; ²Faculty of Geographical Science, Beijing Normal University, China
Recovering RAW sensor measurements from rendered sRGB images is important for radiometric calibration, low-level vision, and computational photography. However, reversing a camera's proprietary Image Signal Pipeline (ISP) is highly challenging, especially when the ISP is unknown. Existing inverse-ISP and diffusion-based approaches have several limitations: they depend on a known ISP for the sensor, require one model per sensor, or generalize poorly across camera brands. This work presents ERD (Extended RAW-Diffusion), a unified diffusion-model framework that de-renders any given sRGB image into RAW format without requiring knowledge of the ISP or camera information from the image. ERD extends the RAW-Diffusion architecture by incorporating camera metadata only during training, allowing the model to learn a shared representation across heterogeneous sensors. To capture global sensor characteristics, ERD introduces a conditioning mechanism based on Feature-wise Linear Modulation (FiLM) for global features such as CFA patterns and color gains. To enhance structural consistency, ERD integrates a ControlNet branch that injects edge and gradient priors derived from the sRGB input, stabilizing RAW reconstruction under diverse tone-mapping operations. For practical adaptation to new sensors, ERD supports efficient few-shot tuning via LoRA. Evaluations on Adobe FiveK (Nikon and Canon) and RAW-NOD (Nikon and Sony) show that ERD outperforms state-of-the-art baselines in PSNR and SSIM and offers improved robustness to unseen camera models. ERD enables a practical, general-purpose inverse ISP process across heterogeneous imaging devices.
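The FiLM conditioning mentioned in the ERD abstract can be illustrated with a minimal NumPy sketch. FiLM is a general technique: a conditioning input is mapped to a per-channel scale γ and shift β, and the feature map x is modulated channel-wise as γ·x + β. The conditioning vector here, built from a one-hot Bayer CFA layout plus three color gains, is an assumption for illustration. The helper names `make_condition` and `film` and the single linear layer that produces γ and β are also hypothetical. None of this is ERD's actual implementation.

```python
import numpy as np

# Illustrative set of Bayer CFA layouts (assumption; ERD's actual metadata encoding is not specified).
CFA_PATTERNS = ("RGGB", "BGGR", "GRBG", "GBRG")


def make_condition(cfa, gains):
    """Hypothetical global conditioning vector: one-hot CFA layout followed by per-channel color gains."""
    one_hot = np.zeros(len(CFA_PATTERNS))
    one_hot[CFA_PATTERNS.index(cfa)] = 1.0
    return np.concatenate([one_hot, np.asarray(gains, dtype=float)])


def film(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise Linear Modulation with a single linear layer predicting gamma and beta.

    features:        (C, H, W) feature map
    condition:       (D,) global conditioning vector
    w_gamma, w_beta: (C, D) weights; b_gamma, b_beta: (C,) biases
    """
    gamma = w_gamma @ condition + b_gamma  # one scale per channel, shape (C,)
    beta = w_beta @ condition + b_beta  # one shift per channel, shape (C,)
    # Broadcast the per-channel parameters over the spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]


# Usage: modulate a random 8-channel feature map with a condition built from hypothetical metadata.
rng = np.random.default_rng(0)
cond = make_condition("RGGB", (2.0, 1.0, 1.6))  # D = 4 + 3 = 7
c, d = 8, cond.size
feats = rng.standard_normal((c, 16, 16))
out = film(
    feats,
    cond,
    w_gamma=0.1 * rng.standard_normal((c, d)),
    b_gamma=np.ones(c),
    w_beta=0.1 * rng.standard_normal((c, d)),
    b_beta=np.zeros(c),
)
print(out.shape)  # (8, 16, 16)
```

Because the modulation is global (one γ and β per channel), it suits image-wide properties such as CFA layout or white-balance gains rather than spatially varying cues. Setting `b_gamma` to ones keeps the layer close to the identity when the weights are small. The abstract does not say whether ERD initializes its conditioning this way.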

