This paper proposes CNN-CapsNet, a new approach to remote sensing image scene classification that combines the strengths of convolutional neural networks (CNNs) and capsule networks (CapsNets). First, a CNN pre-trained on the ImageNet dataset serves as the initial feature extractor, producing feature maps from the input image. Next, these feature maps are passed to a CapsNet, which captures the spatial relationships among the features while preserving their original characteristics, with the aim of improving classification accuracy. Finally, the output of the CapsNet is fed into a softmax layer to produce the classification result. Experiments on three publicly available remote sensing image datasets show that the proposed CNN-CapsNet architecture outperforms other state-of-the-art methods. This article was authored by Wei Zhong, Ping Tang, and Li Zhenzhao.
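The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the pre-trained CNN is replaced by a random array standing in for its feature map, the capsule layer uses the commonly cited routing-by-agreement scheme, and all sizes (an 8-dimensional primary capsule, 16-dimensional class capsules, 3 routing iterations, 5 classes) and the names `primary_capsules`, `route`, and `classify` are assumptions chosen for illustration. The final softmax over capsule lengths follows the description in this summary.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule non-linearity: keeps direction, maps vector length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def primary_capsules(feature_map, caps_dim):
    """Regroup an (H, W, C) feature map into (N, caps_dim) capsule vectors."""
    h, w, c = feature_map.shape
    if c % caps_dim != 0:
        raise ValueError("channel count must be divisible by caps_dim")
    return squash(feature_map.reshape(h * w * (c // caps_dim), caps_dim))

def route(u, weights, n_iters=3):
    """Routing-by-agreement.

    u:       (N, d_in) primary capsules
    weights: (N, K, d_out, d_in) transformation matrices (assumed, untrained here)
    returns: (K, d_out) class capsules
    """
    u_hat = np.einsum("nkoi,ni->nko", weights, u)   # predictions, (N, K, d_out)
    logits = np.zeros(u_hat.shape[:2])              # routing logits, (N, K)
    for _ in range(n_iters):
        coupling = softmax(logits, axis=1)          # each capsule distributes over classes
        v = squash(np.sum(coupling[..., None] * u_hat, axis=0))
        logits = logits + np.einsum("nko,ko->nk", u_hat, v)  # agreement update
    return v

def classify(feature_map, weights, caps_dim=8):
    """Feature map -> capsules -> class capsule lengths -> softmax probabilities."""
    u = primary_capsules(feature_map, caps_dim)
    v = route(u, weights)
    return softmax(np.linalg.norm(v, axis=-1))

# Stand-in for the output of a pre-trained CNN (random values, illustration only).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 4, 32))
n_classes, caps_dim, class_dim = 5, 8, 16
n_caps = 4 * 4 * (32 // caps_dim)
weights = 0.1 * rng.standard_normal((n_caps, n_classes, class_dim, caps_dim))

probs = classify(feature_map, weights, caps_dim)
print(probs.shape, probs.argmax())
```

In a real setting the random `feature_map` would come from an intermediate layer of the pre-trained CNN, and `weights` would be learned by training the capsule layer on the target dataset.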