This paper proposes CTCANET, a novel hybrid CNN-transformer architecture for change detection in high-resolution bi-temporal remote sensing images. It combines convolutional networks, a transformer, and attention mechanisms to extract high-level feature representations from the images, exploiting the complementary strengths of CNNs and transformers. Specifically, the transformer module captures global spatial-temporal context in token space, while the convolutional block attention module (CBAM) smooths semantic gaps between heterogeneous features and improves change detection accuracy. Experiments show that CTCANET outperforms other state-of-the-art methods on two publicly available datasets. This article was authored by Meng Mengyin, Ji Buchen, and Qingjian Zhang.
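
To make the attention step concrete, the following is a minimal sketch of a convolutional block attention module as it is commonly described: channel attention followed by spatial attention. It is written in pure Python on nested lists. It is not the authors' implementation; how the module is wired into CTCANET is not specified here, and the tensor sizes, reduction ratio, kernel size, random weights, and all function names are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp(vec, w1, w2):
    """Shared two-layer MLP with a ReLU in between (no biases, for brevity)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """One weight in (0, 1) per channel, from average- and max-pooled descriptors."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    return [sigmoid(a + m) for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]


def spatial_attention(feat, kernel):
    """One weight in (0, 1) per pixel: pool across channels, then a k x k convolution."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg_map = [[sum(feat[ch][i][j] for ch in range(c)) / c for j in range(w)]
               for i in range(h)]
    max_map = [[max(feat[ch][i][j] for ch in range(c)) for j in range(w)]
               for i in range(h)]
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            # kernel[0] filters the average map, kernel[1] the max map.
            for m, plane in enumerate((avg_map, max_map)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding at the borders
                            acc += kernel[m][di][dj] * plane[y][x]
            out[i][j] = sigmoid(acc)
    return out


def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a C x H x W feature map."""
    ca = channel_attention(feat, w1, w2)
    refined = [[[ca[ch] * v for v in row] for row in feat[ch]]
               for ch in range(len(feat))]
    sa = spatial_attention(refined, kernel)
    return [[[sa[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(chan)]
            for chan in refined]


# Hypothetical sizes and random weights, chosen only for illustration.
rng = random.Random(0)
C, H, W, R, K = 4, 5, 5, 2, 3
feat = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
kernel = [[[rng.uniform(-1, 1) for _ in range(K)] for _ in range(K)] for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))
```

Because both attention maps pass through a sigmoid, every output value is the corresponding input value scaled by a factor between 0 and 1. The module can only re-weight features, not amplify them. In a trained network the weights would be learned rather than sampled.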