The effect of vanishing and exploding gradients is much worse in RNNs than in traditional deep neural networks. A DNN has a different weight matrix between each pair of layers, so if the weights between the first two layers have magnitudes greater than one, a later layer can have weights with magnitudes less than one, and their effects can partially cancel out. In an RNN, however, the same weight matrix is reused at every time step, so its effect compounds: if the recurrent weights are consistently larger than one in magnitude, the gradient grows exponentially with sequence length (exploding), and if they are consistently smaller than one, it shrinks exponentially (vanishing), with no other matrix available to cancel the effect.
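A minimal sketch of this compounding effect, assuming numpy and ignoring the nonlinearity for simplicity: backpropagation through time multiplies the gradient by the transpose of the same recurrent matrix `W` at every step, so the gradient norm scales roughly like a power of the magnitude of `W`. The function name and the scale values are illustrative, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_norms(scale, steps=50, dim=8):
    """Track the gradient norm as it is backpropagated through `steps`
    recurrent units that all share the same weight matrix W."""
    W = scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)
    grad = np.eye(dim)            # start from an identity Jacobian
    norms = []
    for _ in range(steps):
        grad = W.T @ grad         # backprop reuses the SAME W each step
        norms.append(np.linalg.norm(grad))
    return norms

# Weights smaller than one shrink the gradient; larger than one blow it up.
for scale in (0.5, 1.0, 1.5):
    print(f"scale={scale}: gradient norm after 50 steps = "
          f"{gradient_norms(scale)[-1]:.3e}")
```

Running this shows the gradient norm decaying toward zero for `scale=0.5` and growing by many orders of magnitude for `scale=1.5`; in a feedforward network, by contrast, each step would use an independent matrix, so large and small layers could offset one another.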