We developed transformer-based models to classify motor imagery electroencephalogram (EEG) data from the PhysioNet dataset. Our models achieved higher accuracy than existing state-of-the-art models, demonstrating the effectiveness of transformers in this application. Additionally, the attention weights provided insights into the working mechanism of the transformer-based networks during motor imagery tasks. This article was authored by Jean Sia, Jiaxian, Jiao Sun, and others.