This paper proposes a novel transformer-based network for decoding mental imagery. The network applies multi-head attention along three dimensions (spatial, spectral, and temporal) to improve decoding performance. The method was evaluated on three datasets and achieved high accuracy. The paper was written by Hyung Joon, Dae Hyuk Lee, Ji Hoon Jong, and others.
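To make the idea of attention over three dimensions concrete, here is a minimal NumPy sketch. It is not the authors' architecture: the summary does not say how the three attention stages are combined, so this sketch assumes they are applied one after another with residual connections. It also assumes an input laid out as (spatial, spectral, temporal), uses identity projections in place of learned query/key/value weights, and uses made-up sizes. The function names `multi_head_self_attention` and `attend_along_axis` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, num_heads):
    """Scaled dot-product self-attention over `tokens` of shape (n_tokens, d_model).

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a trained model would use learned weights.
    """
    n_tokens, d_model = tokens.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    head_dim = d_model // num_heads

    # Split the feature dimension into heads: (num_heads, n_tokens, head_dim).
    heads = tokens.reshape(n_tokens, num_heads, head_dim).transpose(1, 0, 2)

    # Pairwise token similarities per head: (num_heads, n_tokens, n_tokens).
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values, then merge the heads back together.
    attended = weights @ heads
    return attended.transpose(1, 0, 2).reshape(n_tokens, d_model)


def attend_along_axis(x, axis, num_heads):
    """Treat slices along `axis` as tokens and flatten the other axes into features."""
    moved = np.moveaxis(x, axis, 0)
    tokens = moved.reshape(moved.shape[0], -1)
    attended = multi_head_self_attention(tokens, num_heads)
    return np.moveaxis(attended.reshape(moved.shape), 0, axis)


# Hypothetical input: 8 spatial positions, 4 spectral bands, 16 time steps.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 16))

# Assumption: spatial, spectral, then temporal attention, each with a residual.
y = x
for axis in (0, 1, 2):
    y = y + attend_along_axis(y, axis, num_heads=2)

print(y.shape)  # (8, 4, 16)
```

Each stage keeps the input shape, so the three attention passes can be reordered or run in parallel and summed without changing the surrounding code.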