Efficient Logarithmic Step Size for Stochastic Gradient Descent with Warm Restarts
We propose a novel logarithmic step size for stochastic gradient descent (SGD) with warm restarts and show that it achieves the optimal convergence rate of O(1/√T) for smooth non-convex functions.
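To make the idea concrete, the sketch below shows a plain SGD loop in which the step size decays logarithmically within each warm-restart cycle and is reset at the start of the next cycle. The specific decay formula, the function names (logarithmic_step_size, sgd_with_warm_restarts), and the parameters (eta_max, cycles) are illustrative assumptions, not the schedule or notation used in the paper itself.

```python
import math
import random


def logarithmic_step_size(t, cycle_len, eta_max=0.1):
    """Hypothetical logarithmic decay within one restart cycle.

    t         : iteration index within the current cycle (0-based)
    cycle_len : number of iterations in the current cycle
    eta_max   : step size at the start of the cycle

    NOTE: eta_max * (1 - ln(1 + t) / ln(1 + cycle_len)) is only an
    illustrative stand-in that decays logarithmically from eta_max;
    the paper's exact schedule may differ.
    """
    return eta_max * (1.0 - math.log(1.0 + t) / math.log(1.0 + cycle_len))


def sgd_with_warm_restarts(grad_fn, x0, cycles=(50, 100, 200), eta_max=0.1):
    """SGD loop that restarts the logarithmic schedule at each new cycle."""
    x = x0
    for cycle_len in cycles:
        for t in range(cycle_len):
            eta = logarithmic_step_size(t, cycle_len, eta_max)
            x = x - eta * grad_fn(x)  # one stochastic gradient step
    return x


if __name__ == "__main__":
    # Toy example: minimize f(x) = x^2 with a noisy gradient oracle.
    noisy_grad = lambda x: 2.0 * x + random.gauss(0.0, 0.1)
    print(sgd_with_warm_restarts(noisy_grad, x0=5.0))
```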