Adam is an optimization algorithm that can be used to update network weights iteratively from training data, in place of the classical stochastic gradient descent procedure. Its name is derived from adaptive moment estimation, and it is widely used in deep learning.
Listed below are the main benefits of using Adam on non-convex optimization problems.
Adam combines the benefits of two other extensions of stochastic gradient descent. The Adaptive Gradient Algorithm (AdaGrad) maintains a per-parameter learning rate, which improves performance on problems with sparse gradients (e.g., natural language and computer vision problems). Root Mean Square Propagation (RMSProp) also maintains per-parameter learning rates, adapting each one based on the average of recent gradient magnitudes for that weight, which makes it work well on online and non-stationary problems. A minimal sketch of the resulting update rule is shown below.
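To make the combination concrete, here is a minimal NumPy sketch of the Adam update for a single parameter array. The function name adam_update and the toy objective are illustrative choices, not part of any library; the default hyperparameters (learning rate 0.001, beta1 0.9, beta2 0.999, epsilon 1e-8) follow the values commonly suggested for Adam.

```python
import numpy as np

def adam_update(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a single parameter array (illustrative sketch)."""
    # Exponential moving averages of the gradient (first moment)
    # and the squared gradient (second moment).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: the averages start at zero and are biased early on.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: dividing by sqrt(v_hat) adapts the learning rate
    # to the recent magnitude of each parameter's gradient.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: minimize f(w) = (w - 3)^2 for a single "weight".
w = np.array([0.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * (w - 3.0)   # gradient of (w - 3)^2
    w, m, v = adam_update(w, grad, m, v, t, lr=0.1)
print(w)                   # approaches 3.0
```

The per-parameter moments m and v are what give Adam its AdaGrad- and RMSProp-like behavior: each weight effectively receives its own step size, scaled by the recent history of its gradients.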