What is Stochastic Gradient Descent (SGD) and why is it important?
Stochastic Gradient Descent works like Gradient Descent, but instead of computing the gradient over the entire training set at each step, it updates the parameters using the gradient of a small random subset of the data (a mini-batch). The mini-batch size is the key hyperparameter; in the extreme case, even a single sample can be used per update. SGD is important because each update is far cheaper than a full-batch step, it scales to datasets that don't fit in memory, and the noise in its gradient estimates can help the optimizer escape shallow local minima.
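As an illustration, here is a minimal mini-batch SGD sketch for linear regression with NumPy (the model, data, and hyperparameter values are invented for the example, not taken from the source):

```python
import numpy as np

# Toy regression problem: y = X @ true_w + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)

w = np.zeros(3)
lr, batch_size, epochs = 0.1, 16, 50  # batch_size is the key hyperparameter

for _ in range(epochs):
    idx = rng.permutation(len(X))           # reshuffle the data each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]   # indices of one mini-batch
        # Gradient of mean squared error, computed only on the mini-batch.
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad                      # one cheap parameter update per batch

print(w)  # converges toward true_w
```

Setting `batch_size = 1` turns this into pure per-sample SGD; setting it to `len(X)` recovers full-batch Gradient Descent.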