What is Stochastic Gradient Descent?
Stochastic Gradient Descent (SGD), in contrast, updates the parameters using only a single training instance in each iteration. The training instance is usually selected randomly.
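For a linear model with a squared-error cost, a single update can be sketched as follows; the function name, learning rate, and array layout here are illustrative assumptions rather than a fixed interface:

```python
import numpy as np

def sgd_step(w, X, y, learning_rate=0.01):
    """Perform one SGD update of the weights w for a linear model
    with a squared-error cost, using one randomly chosen instance."""
    i = np.random.randint(len(X))        # select a single training instance at random
    error = X[i] @ w - y[i]              # residual for that instance
    gradient = error * X[i]              # gradient of 0.5 * error ** 2 with respect to w
    return w - learning_rate * gradient  # step against the gradient
```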
Stochastic gradient descent is often preferred for optimizing cost functions when there are hundreds of thousands of training instances or more, as it will converge more quickly than batch gradient descent. Batch gradient descent is a deterministic algorithm, and will produce the same parameter values given the same training set. As a stochastic algorithm, SGD can produce different parameter estimates each time it is run. SGD may not minimize the cost function as well as batch gradient descent because it updates the weights using only a single training instance at a time. Its approximation is often good enough, particularly for convex cost functions such as the residual sum of squares.
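As a rough illustration of this run-to-run variability, the following sketch fits scikit-learn's SGDRegressor, which minimizes a squared-error cost with SGD, to the same synthetic data using two different shuffling seeds; the dataset, seeds, and number of features are illustrative assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# A large synthetic regression problem with 100,000 training instances.
X, y = make_regression(n_samples=100000, n_features=10, noise=0.1, random_state=0)

# Runs that shuffle the instances differently can produce slightly
# different coefficient estimates, even on identical training data.
for seed in (1, 2):
    model = SGDRegressor(random_state=seed)
    model.fit(X, y)
    print(model.coef_[:3])
```

The two printed coefficient vectors should be close but not identical, while batch gradient descent on the same data would return exactly the same parameters every time.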