Conference Agenda

Session

WG III/1A: Remote Sensing Data Processing and Understanding

Time:

Monday, 06-July-2026:

8:30am - 10:00am

Location: 713A

125 theatre

Session Topics:

Remote Sensing Data Processing and Understanding (WG III/1)

External Resource: http://www.commission3.isprs.org/wg1

Presentations

8:30am - 8:45am

Cube Kernel: A Novel Approach to Enable Local Gradient Flow Across Channels in CNNs

Zhimeng HE, Yuwei Cai, Meiliu Wu, Xinyan Xian, Brian Barrett

University of Glasgow, United Kingdom

Understanding inter-band and cross-channel relationships is essential for human color perception and object recognition. Yet, local gradients in standard convolutions are tied to fixed input–output channel pairs, and thus channels are fused by a dense, fully-coupled weight tensor: each output channel aggregates all input channels in a uniform way at every spatial location. This leads to heavy computation and does not exploit structured sparsity or selective local channel mixing. To overcome this limitation, we introduce Cube Kernel, a novel convolutional operator that introduces structured cross-channel groups into the local gradient. This design strengthens cross-channel feature fusion, improves optimization efficiency, and reduces computational overhead. Extensive building extraction experiments validate its effectiveness: Cube Kernel consistently outperforms standard convolutions and Involution when integrated into UNet, and replacing a single layer in DeepLabV3+, Swin-UNet, or UNet leads to consistent performance gains. Beyond serving as a lightweight plug-in module, Cube Kernel also scales effectively as a fundamental building block. A Cube-enhanced ConvNeXt variant, ConvNeXt-Cube, achieves state-of-the-art performance across all models (0.9095 IoU / 0.9535 F1 on WBD and 0.9133 IoU / 0.9547 F1 on WHU), demonstrating strong stackability and architectural potential. These results highlight a largely overlooked space in CNN design: enhancing cross-channel interaction at the gradient level. Cube Kernel offers a scalable and efficient alternative to deepen networks for channel mixing, laying a foundation for future advancements in convolutional architecture design.
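The abstract reports results as IoU and F1 on building-extraction masks. As a reminder of how these two metrics are defined for a single binary mask, here is a minimal sketch; the function name `iou_and_f1` and the flat 0/1 list input are illustrative assumptions, and the authors' evaluation protocol (for example, how scores are averaged over images) is not stated in the abstract.

```python
def iou_and_f1(pred, truth):
    """Compute IoU and F1 for two flat sequences of 0/1 labels.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    union = tp + fp + fn
    if union == 0:
        # Both masks are empty: treat as a perfect match (a common convention).
        return 1.0, 1.0
    iou = tp / union                    # intersection over union
    f1 = 2 * tp / (2 * tp + fp + fn)    # harmonic mean of precision and recall
    return iou, f1


iou, f1 = iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0])
print(iou, f1)
```

For a single mask the two metrics are tied by F1 = 2·IoU / (1 + IoU), so they rank predictions identically; differences arise only when scores are averaged across images.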

8:45am - 9:00am

Land Surface Dynamics Modeling and Prediction with dual Latent-Space Representations

Keli Shi¹,², Zheng Zhang¹, Liang Tang³, Wenhe Xu⁴, Xiaojun Shan¹, Ping Tang¹

¹Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; ²School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing, China; ³Yazhou Bay Innovation Institute, Hainan Tropical Ocean University, Sanya, China; ⁴The University of Hong Kong, Hong Kong, China

Modeling land surface dynamics from satellite observations is crucial for revealing change patterns and predicting future states, although effective modeling methods remain limited.

For complex systems such as reaction-diffusion, two approaches have proven particularly effective:

(i) Direct modeling in the high-dimensional observation space with deep networks (e.g., Wang et al., 2022). These methods are often autoregressive, so errors accumulate during rolling extrapolation.

(ii) Modeling in a reduced-dimensional latent space (e.g., Chen et al., 2022). One first reduces the dimension and then learns the evolution in the latent space. Some works estimate the intrinsic dimension (ID) and model in a latent space of that dimension.

This improves long-term stability, but reliance on latent representations may reduce accuracy.

The latent-space route is promising if two issues are addressed:

(1) effectively modeling multi-scale spatiotemporal data with long sampling intervals;

(2) combining ID-space modeling with other latent dimensions to balance accuracy and stability.

This paper proposes a Dual Latent-Space Representation-based Land Surface Dynamic Model (DLS-LSDM). The core contributions are:

(1) a stacked-convolution and multi-scale linear-attention autoencoder to obtain a base latent, together with ID estimation to derive an ID latent;

(2) a long-horizon scheme that combines ID and base latents to achieve both stability and high accuracy;

(3) comprehensive evaluation on ten-year MODIS NDVI across multiple climate zones, demonstrating superiority.

9:00am - 9:15am

Revealing Feature Contribution Mechanisms for Interpretable CNN-Transformer Remote Sensing Classification

He Chen¹, Xianwei Zheng¹, Wei He¹, Jiansi Yang¹, Linwei Yue², Ting Hu³, Jianya Gong¹

¹Wuhan University; ²China University of Geosciences; ³Nanjing University of Information Science and Technology

Deep learning models have become the backbone of intelligent remote sensing image classification, enabling high-precision recognition of land cover, geospatial objects, and scene categories. However, their inherent "black-box" nature, in which decision logic is embedded in complex parameter spaces, poses critical barriers to deployment in high-stakes domains such as military reconnaissance, disaster monitoring, and environmental governance. These fields demand transparent reasoning to validate model reliability, yet traditional interpretability methods suffer from two key limitations when applied to remote sensing data: (1) they are primarily designed for natural images and fail to account for remote sensing-specific characteristics; (2) they focus on local feature attribution or saliency mapping but lack quantitative analysis of how core image features (shape, texture, spectrum) contribute to global classification decisions, especially across different network architectures. To address these problems, this study proposes a comprehensive feature contribution analysis framework tailored to remote sensing images, with three core objectives: (1) decoupling and extracting shape, texture, and spectrum features from remote sensing images in a physically meaningful manner; (2) quantifying the contribution of each feature type to classification decisions; (3) revealing differences in feature processing mechanisms between CNN and Transformer architectures.

9:15am - 9:30am

EfficientViM-CD: An Efficient Remote Sensing Change Detection Network Based on Hidden State-Mixer

Haiming Zhang¹, Hongyang Fan²

¹State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, China; ²School of Information Science and Engineering, Wuchang Shouyi University

High-resolution optical remote sensing change detection (CD) is of great significance in urban evolution monitoring, disaster assessment, and land management. Traditional deep models often face computational, memory, and inference latency bottlenecks when processing large high-resolution imagery. To address this, we propose EfficientViM-CD, an efficient remote sensing change detection network based on a Hidden State Mixer. The approach builds upon the EfficientViM backbone, migrating global interaction operations into a compact hidden state space and leveraging a Hidden State Mixer based on state space duality (HSM-SSD) to fuse global context while reducing computational complexity. We employ a Siamese encoding architecture to extract multi-scale features and hidden states from paired temporal images, and use a Cross-Hidden Fusion module to integrate hidden semantic interactions between time points. At each scale, local difference features are computed and enhanced in hidden state space, and a multi-scale decoder reconstructs a pixel-level change probability map. We conducted experiments on four public datasets (LEVIR-CD+, WHU-CD, S2Looking, SVCD) and compared against nine state-of-the-art methods. Results demonstrate that EfficientViM-CD achieves competitive accuracy while delivering significant advantages in inference speed and memory efficiency. This method offers a lightweight, efficient, and scalable solution for high-resolution remote sensing change detection, with potential for real-time monitoring and emergency response systems.

9:30am - 9:45am

Local NMS: Enhancing Object Detection in Large-Scale Remote Sensing Images via iterative pipelined Postprocessing

Bettina Felten, Wolfgang Gross, Andreas Michel

Fraunhofer IOSB, Germany

Object detection in large, dense remote sensing imagery is difficult because targets are often small and arbitrarily oriented, and state-of-the-art detectors cannot process very large images directly without a reduction in accuracy. Tiling-based inference workflows mitigate the latter issue by running inference iteratively on overlapping tiles, but introduce pre- and postprocessing overhead for image tiling and Non-Maximum Suppression (NMS). We introduce local NMS, an asynchronous tile-wise postprocessing scheme. Local NMS runs in a separate subprocess in parallel to tile-wise inference and collects intermediate results enqueued by the inference process, immediately applying postprocessing. Intelligent reordering of tiles in a preprocessing step ensures optimal usage of computing resources. We assess our method using three state-of-the-art object detection models for horizontal and oriented bounding box detection on two benchmark datasets containing large dense aerial and satellite images, DOTA-v2.0 and Izembek Lagoon Birds, stratifying by image size and average object density. Local NMS consistently reduces end-to-end runtime across models and datasets without significant impact on mAP. A maximum runtime reduction of 60.77% was achieved on large dense DOTA-v2.0 scenes without modifying model architectures or retraining.
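The pipelining idea described above, a consumer that postprocesses tile results as soon as the inference loop enqueues them, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a thread and `queue.Queue` rather than a subprocess, handles only axis-aligned boxes, omits tile reordering, and all names (`box_iou`, `nms`, `postprocess_worker`, `run_demo`) are hypothetical. Re-running greedy NMS on the running result set is also not guaranteed to match a single global NMS pass.

```python
import queue
import threading


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2, ...)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets, iou_thr=0.5):
    """Greedy NMS over detections (x1, y1, x2, y2, score), highest score first."""
    kept = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(box_iou(d, k) <= iou_thr for k in kept):
            kept.append(d)
    return kept


def postprocess_worker(q, results, iou_thr=0.5):
    """Consume (tile_offset, detections) items until a None sentinel arrives."""
    kept = []
    while True:
        item = q.get()
        if item is None:
            break
        (ox, oy), dets = item
        # Shift tile-local boxes into global image coordinates.
        shifted = [(x1 + ox, y1 + oy, x2 + ox, y2 + oy, s)
                   for x1, y1, x2, y2, s in dets]
        # Suppress duplicates against everything kept so far.
        kept = nms(kept + shifted, iou_thr)
    results.extend(kept)


def run_demo():
    q = queue.Queue()
    results = []
    worker = threading.Thread(target=postprocess_worker, args=(q, results))
    worker.start()
    # Stand-in for the inference loop: two overlapping tiles see the same object.
    q.put(((0, 0), [(90, 10, 110, 30, 0.9), (10, 10, 30, 30, 0.8)]))
    q.put(((80, 0), [(10, 10, 30, 30, 0.7)]))  # same object in global coordinates
    q.put(None)  # sentinel: no more tiles
    worker.join()
    return results


print(run_demo())
```

In the demo, the lower-scoring duplicate from the second tile is suppressed, leaving two detections.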

9:45am - 10:00am

ERD: Extended RAW-Diffusion Framework for De-rendering sRGB Images

Jiaqi Shang¹, Yifan Qu¹, Jianbo Qi²

¹Department of Computer Science, University of Toronto, Canada; ²Faculty of Geographical Science, Beijing Normal University, China

Recovering RAW sensor measurements from rendered sRGB images is important for radiometric calibration, low-level vision, and computational photography. However, reversing a camera’s proprietary Image Signal Pipeline (ISP) is highly challenging, especially when the ISP is unknown. Existing inverse-ISP and diffusion-based approaches have several issues: they depend on known ISPs from the sensor, require one model per sensor, or generalize poorly across camera brands.

This work presents ERD (Extended RAW-Diffusion), a unified diffusion-model framework that de-renders any given sRGB image into RAW format without requiring a known ISP or camera information from the image. ERD extends the RAW-Diffusion architecture by incorporating camera metadata only during training, allowing the model to learn a shared representation across heterogeneous sensors. To capture global sensor characteristics, ERD introduces a conditioning mechanism, Feature-wise Linear Modulation (FiLM), for global features such as CFA patterns and color gains. To enhance structural consistency, ERD integrates a ControlNet branch that injects edge and gradient priors derived from the sRGB input, stabilizing RAW reconstruction under diverse tone-mapping operations. For practical adaptation to new sensors, ERD supports efficient few-shot tuning via LoRA.
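For readers unfamiliar with FiLM, the generic operation is a per-channel affine transform whose scale and shift are predicted from a conditioning vector. The sketch below shows only that generic mechanism in plain Python; how ERD encodes CFA patterns or color gains, and where the modulation sits in the network, is not specified in the abstract, so the names `linear` and `film` and the toy weights are assumptions for illustration.

```python
def linear(vec, weights, bias):
    """Plain affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: out[c] = gamma[c] * features[c] + beta[c].

    features: list of channels, each a flat list of activations.
    cond: conditioning vector (hypothetically, encoded sensor metadata).
    """
    gamma = linear(cond, w_gamma, b_gamma)  # per-channel scale
    beta = linear(cond, w_beta, b_beta)     # per-channel shift
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Toy example with two channels and a two-element conditioning vector.
features = [[1.0, 2.0], [3.0, 4.0]]
cond = [1.0, 0.0]
out = film(
    features, cond,
    w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 1.0],
    w_beta=[[0.5, 0.0], [0.0, 0.0]], b_beta=[0.0, -1.0],
)
print(out)  # [[2.5, 4.5], [2.0, 3.0]]
```

Because the conditioning enters only through `gamma` and `beta`, one set of convolutional weights can be shared across sensors while the modulation adapts the features to each one.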

Evaluations on Adobe FiveK (Nikon and Canon) and RAW-NOD (Nikon and Sony) show that ERD outperforms state-of-the-art baselines in PSNR and SSIM, offering improved robustness to unseen camera models. ERD enables a practical, general-purpose inverse ISP process across heterogeneous imaging devices.