Our proposed approach uses a novel Siamese-based spatial-temporal attention neural network to detect significant changes between bi-temporal images. It incorporates a change detection (CD) self-attention module that models the spatial-temporal relationships between pixels. Applying this attention at multiple scales lets the network capture spatial-temporal dependencies for objects of varying sizes, which improves performance over existing methods. Furthermore, we introduce LEVIR-CD, a large-scale remote-sensing image CD dataset consisting of 637 image pairs of 1024 × 1024 pixels and over 31K independently labeled change instances. The dataset is two orders of magnitude larger than other public datasets in this field, making it a valuable resource for researchers working in this area. This article was authored by Hao Chen and Zhenwei Shi.
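
To make the idea of spatial-temporal self-attention more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the Euclidean distance map are all assumptions chosen for illustration. The sketch treats every pixel of both dates as one token, lets each token attend to all tokens across space and time, and then compares the two updated feature maps pixel by pixel.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_temporal_attention(f1, f2, w_q, w_k, w_v):
    """Hypothetical single-scale attention over two feature maps.

    f1, f2: (C, H, W) features from a shared (Siamese) encoder.
    w_q, w_k: (D, C) query/key projections; w_v: (C, C) value projection.
    """
    c, h, w = f1.shape
    # Stack both dates into one token set of size N = 2 * H * W.
    x = np.concatenate([f1.reshape(c, -1), f2.reshape(c, -1)], axis=1)
    q, k, v = w_q @ x, w_k @ x, w_v @ x
    # Row i holds the attention weights of token i over all N tokens.
    attn = softmax((q.T @ k) / np.sqrt(q.shape[0]), axis=1)
    # Each output token is a weighted sum of values, plus a residual.
    z = x + v @ attn.T
    z1 = z[:, : h * w].reshape(c, h, w)
    z2 = z[:, h * w :].reshape(c, h, w)
    return z1, z2, attn


def distance_map(z1, z2):
    # Per-pixel Euclidean distance between the two updated feature maps.
    return np.linalg.norm(z1 - z2, axis=0)


# Toy usage with random features and random (untrained) projections.
rng = np.random.default_rng(0)
C, H, W, D = 8, 4, 4, 4
f1 = rng.standard_normal((C, H, W))
f2 = rng.standard_normal((C, H, W))
w_q = rng.standard_normal((D, C))
w_k = rng.standard_normal((D, C))
w_v = rng.standard_normal((C, C))
z1, z2, attn = spatial_temporal_attention(f1, f2, w_q, w_k, w_v)
dist = distance_map(z1, z2)
print(z1.shape, attn.shape, dist.shape)
```

A multi-scale variant, under the same assumptions, would split the feature maps into subregions of several sizes and run this function on each subregion. In a trained model the projections would be learned and the distance map would be thresholded to produce a change mask.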