The GRU (Gated Recurrent Unit) is a variant of the LSTM (Long Short-Term Memory). It retains the LSTM’s resistance to the vanishing gradient problem, but because its internal structure is simpler and has fewer parameters, it is typically faster to train. Instead of the LSTM cell’s input, forget, and output gates, the GRU cell has only two gates: an update gate z and a reset gate r. The update gate controls how much of the previous memory (the hidden state) is kept, and the reset gate controls how much of that previous memory is combined with the new input when the cell computes its candidate state.
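To make the two gates concrete, here is a minimal NumPy sketch of a single GRU step. It assumes a common textbook formulation: z = σ(W_z x + U_z h + b_z), r = σ(W_r x + U_r h + b_r), a candidate h̃ = tanh(W_h x + U_h (r ⊙ h) + b_h), and a new state h' = z ⊙ h + (1 − z) ⊙ h̃. Implementations differ on whether z weights the previous state or the candidate. This sketch uses the version where z close to 1 keeps the previous memory, matching the description above. The names `GRUCell`, `step`, `run_sequence`, and `num_params` are hypothetical and exist only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Hypothetical single-layer GRU cell (illustrative sketch, not a library API)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)

        def init(rows, cols):
            return rng.uniform(-scale, scale, size=(rows, cols))

        # W_* map the input x, U_* map the previous hidden state h.
        self.W_z, self.U_z = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.W_r, self.U_r = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.W_h, self.U_h = init(hidden_size, input_size), init(hidden_size, hidden_size)
        self.b_z = np.zeros(hidden_size)
        self.b_r = np.zeros(hidden_size)
        self.b_h = np.zeros(hidden_size)

    def step(self, x, h_prev):
        # Update gate: how much of the previous memory to keep.
        z = sigmoid(self.W_z @ x + self.U_z @ h_prev + self.b_z)
        # Reset gate: how much previous memory feeds into the candidate state.
        r = sigmoid(self.W_r @ x + self.U_r @ h_prev + self.b_r)
        # Candidate state built from the new input and the reset-scaled memory.
        h_cand = np.tanh(self.W_h @ x + self.U_h @ (r * h_prev) + self.b_h)
        # Interpolate: z near 1 keeps h_prev, z near 0 adopts the candidate.
        return z * h_prev + (1.0 - z) * h_cand

    def num_params(self):
        # Three weight/bias groups (z, r, candidate); an LSTM of the same size has four.
        arrays = [self.W_z, self.U_z, self.b_z,
                  self.W_r, self.U_r, self.b_r,
                  self.W_h, self.U_h, self.b_h]
        return sum(a.size for a in arrays)


def run_sequence(cell, xs, h0):
    """Feed a sequence of input vectors through the cell, returning the final state."""
    h = h0
    for x in xs:
        h = cell.step(x, h)
    return h


cell = GRUCell(input_size=4, hidden_size=3)
xs = np.random.default_rng(1).normal(size=(5, 4))
h = run_sequence(cell, xs, np.zeros(3))
print(h.shape)            # (3,)
print(cell.num_params())  # 72
```

With an input size of 4 and a hidden size of 3, this cell has 3 × (3·4 + 3·3 + 3) = 72 parameters, while an LSTM cell of the same sizes would have four such groups (96). That smaller parameter count is one concrete reason the GRU tends to train faster.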