RNNs have some real problems. They're slow to train: so slow that we resort to a truncated version of backpropagation through time, and even that is computationally expensive. They also can't deal with long sequences very well, because gradients vanish or explode as the unrolled network gets deeper. In come LSTM networks, introduced by Hochreiter and Schmidhuber in 1997 (building on Hochreiter's 1991 analysis of the vanishing gradient problem), which swap out the dumb neurons for a long short-term memory cell. This cell carries a separate cell state, a branch that lets past information skip most of the current step's processing and flow on to the next step, touched only by simple element-wise gates. That mostly-additive path is what lets the memory survive across much longer sequences.
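To make that "branch" concrete, here is a minimal sketch of a single LSTM step in NumPy. The function name `lstm_step` and the packed weight layout are my own choices for illustration, not any particular library's API; the point is the line computing `c`, where the old cell state passes through multiplied by a gate and added to, rather than being pushed through another full matrix transform.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. The cell state c is the 'skip branch':
    past information flows through with only gated, element-wise
    tweaks instead of a full matrix transform every step."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b   # all four gates in one matmul
    f = sigmoid(z[0:H])       # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2*H])     # input gate: how much new info to write
    o = sigmoid(z[2*H:3*H])   # output gate: how much of c to expose
    g = np.tanh(z[3*H:4*H])   # candidate values to write
    c = f * c_prev + i * g    # mostly additive, so gradients survive
    h = o * np.tanh(c)        # hidden state handed to the next step
    return h, c

# Toy usage: run a random length-10 sequence through one cell.
H, D = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(10, D)):
    h, c = lstm_step(x, h, c, W, b)
```

Compare this with a plain RNN step, `h = tanh(W @ [x, h_prev] + b)`: there, every piece of remembered information gets squashed through a tanh and a matrix multiply at every step, which is exactly why gradients vanish or explode over long sequences.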