 What makes this conceptually so much more appealing than a single LSTM cell is that we can see a clear separation of tasks. The encoder learns what English is, what its grammar looks like, and, most importantly, what context means. The decoder learns how English words relate to French words. Each component, even on its own, carries some underlying understanding of language, and it is this understanding that lets us pick the architecture apart and build systems that understand language.
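
To make the separation concrete, here is a deliberately toy sketch of that split. The lookup-table "model" and all function names are hypothetical stand-ins: a real encoder and decoder would be learned networks (LSTMs or Transformers), but the point is that the two halves only meet at the context representation, so either half can be reused on its own.

```python
def encode(english_tokens):
    """Encoder: turn an English sentence into a context representation.
    Here the 'representation' is just the token list itself, standing in
    for a learned vector that captures grammar and context."""
    return {"context": english_tokens}

def decode(representation):
    """Decoder: map the encoder's representation to French tokens.
    A toy word-for-word dictionary stands in for the learned mapping."""
    en_to_fr = {"the": "le", "cat": "chat", "sleeps": "dort"}  # hypothetical
    return [en_to_fr.get(tok, tok) for tok in representation["context"]]

# The two halves communicate only through the representation, so the
# encoder could just as well feed a classifier or a summarizer instead.
representation = encode(["the", "cat", "sleeps"])
print(decode(representation))  # ['le', 'chat', 'dort']
```

Swapping out `decode` for a different head while keeping `encode` fixed is exactly the kind of "picking apart" the paragraph describes.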