Our model proposes a biologically plausible plasticity rule that learns a predictive map of the environment using a spiking neural network. This map is then connected to reinforcement learning algorithms, allowing us to explore behavioral time scales while the network itself operates on the time scale of milliseconds. Our framework also shows how biological parameters such as dwelling times at states, neuronal firing rates, and neuromodulation relate to the delay-discounting parameter of the temporal-difference (TD) algorithm, and how they influence the learned representation. Additionally, our model suggests a role for replays both in aiding learning in novel environments and in finding shortcut trajectories that were not experienced during behavior, in agreement with experimental data. This article was authored by Jacopo Bono, Sara Zannone, Victor Pedrosa, and others.
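To make the role of the delay-discounting parameter concrete, here is a minimal sketch (not the paper's spiking model) of tabular TD(0) value learning on a small chain of states. The mapping of a dwelling time `dwell` to an effective discount `gamma ** dwell` is an illustrative assumption: staying longer in a state devalues future reward more strongly per transition.

```python
# Hedged sketch, NOT the authors' implementation: tabular TD(0) on a chain
# 0 -> 1 -> ... -> n-1 with reward 1 on entering the terminal state.
# `dwell` is a hypothetical per-state dwelling time mapped to an effective
# discount gamma**dwell, illustrating how behavioral time scales can enter
# the TD discount parameter.

def td0_chain(n_states=5, gamma=0.9, alpha=0.1, episodes=2000, dwell=1.0):
    """Learn state values V(s) on a simple deterministic chain."""
    V = [0.0] * n_states
    g = gamma ** dwell  # effective per-transition discount for this dwell time
    for _ in range(episodes):
        for s in range(n_states - 1):
            s_next = s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # TD(0) update: move V(s) toward the bootstrapped target
            V[s] += alpha * (r + g * V[s_next] - V[s])
    return V

values = td0_chain()
# States closer to the rewarded end of the chain acquire higher values,
# with the gradient set by the effective discount.
```

Increasing `dwell` (or decreasing `gamma`) steepens the value gradient toward the reward, which is one way biological dwelling times could shape the learned representation.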