This paper proposes a novel two-stage approach for enhancing and super-resolving remote sensing images. The first stage uses a vision transformer (ViT) to increase the image resolution. The second stage employs an iterative diffusion model (DM) to further improve image quality. The ViT generates global, contextual details, while the DM enhances image quality and produces consistent, harmonious fine details. This two-stage approach outperforms existing state-of-the-art methods for super-resolution of remote sensing images. This article was authored by Ana Semali, Bilal Benjdeira, Anise Kuba, and others.