The proposed model combines a dual-stage attention mechanism (DA), a crisscross gray wolf optimizer (CSGWO), and a bidirectional gated recurrent unit (BiGRU) to predict short-term load accurately. In the reported experiments, it outperforms existing models in accuracy, achieving lower root mean square error (RMSE), mean absolute error (MAE), and symmetric mean absolute percentage error (SMAPE). This article was authored by Ren-Chi Gong and Shen Lonli.
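To make the three accuracy metrics concrete, here is a minimal Python sketch of how RMSE, MAE, and SMAPE are commonly computed. It is not the authors' evaluation code, and the function names are hypothetical. SMAPE has several published variants. This sketch assumes the common form that divides each absolute error by the mean of |actual| and |predicted| and reports the result as a percentage.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must be non-empty and aligned point by point.
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal length")


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared residual."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average magnitude of the residuals."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric MAPE in percent (assumed variant: denominator (|a| + |p|) / 2)."""
    _check(actual, predicted)
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Assumption: a point where both values are zero contributes no error.
        if denom != 0:
            total += abs(p - a) / denom
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    load = [100.0, 200.0]      # hypothetical observed load values
    forecast = [110.0, 190.0]  # hypothetical model predictions
    print(rmse(load, forecast), mae(load, forecast), smape(load, forecast))
```

Because RMSE squares the residuals, it penalizes large forecast misses more heavily than MAE, while SMAPE is scale-free, which makes it useful when comparing load series of different magnitudes.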