Showing posts with the label Deep Learning ANN

Explain how a perceptron works.

Each input is multiplied by a weight, and the weighted inputs are summed together with a bias. The sum is then passed to a unit step function, which outputs 1 when the sum is above a threshold and 0 otherwise. A single perceptron can only learn simple (linearly separable) functions from examples.
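The weighted sum and step function can be sketched in a few lines of plain Python. The function name `perceptron`, the zero threshold, and the hand-picked AND weights are illustrative assumptions, not something taken from the post.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a unit step."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Hand-picked (assumed) weights that make the perceptron compute logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
outputs = [perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```

AND is linearly separable, which is why one perceptron is enough here; XOR would not be.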

What gives the ability to learn complex functions in deep nets?

The non-linear behavior of the activation functions. Without a non-linearity between them, stacked layers collapse into a single linear transformation, no matter how many are stacked.
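As a small illustration of why the non-linearity matters, the sketch below stacks two purely linear one-input "layers" and shows that a single linear layer reproduces them exactly. The helper `linear` and the specific weights are assumptions made for this example.

```python
def linear(w, b):
    """Return a one-input linear layer x -> w * x + b."""
    return lambda x: w * x + b

layer1 = linear(2.0, 1.0)
layer2 = linear(-3.0, 0.5)

def stacked(x):
    # Two layers with no activation function in between.
    return layer2(layer1(x))

# One layer with weight w2 * w1 and bias w2 * b1 + b2 has the same effect.
collapsed = linear(-3.0 * 2.0, -3.0 * 1.0 + 0.5)

for x in (-2.0, 0.0, 1.5):
    print(x, stacked(x), collapsed(x))
```

Inserting a non-linear activation between the two layers breaks this equivalence, which is what lets depth add expressive power.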

Why is the Rectified Linear Unit (ReLU) useful?

ReLU maps an input x to max(0, x): positive values pass through unchanged and negative inputs are mapped to 0. As a result, some neurons output 0 and do not fire. This increases sparsity, which is good. Since not every neuron fires all the time, the network can be trained faster. The function is simple, so it is among the computationally cheapest activations, and it works well for a large number of applications. Stacking several perceptrons with such an activation lets the network learn more complex functions.
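A minimal sketch of the function itself, with made-up input values:

```python
def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)

activations = [relu(v) for v in (-2.0, -0.5, 0.0, 1.5, 3.0)]
print(activations)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

Three of the five outputs are zero, which is the sparsity effect mentioned above.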

What determines the values of the weights and biases?

The training process determines the values of the weights and biases. They are usually initialized randomly and then adjusted during training.

How is the error computed?

The error is computed with a loss function that compares the model's prediction with the ground truth. The weights are then adjusted at every training step based on that step's loss.

What does the Universal Approximation Theorem state?

It states that a multi-layer perceptron with at least one hidden layer and a non-linear activation can approximate any continuous function on a bounded domain to arbitrary accuracy, given enough hidden units.

How many activation functions per layer in an ANN?

One. All units in a layer typically share the same activation function.

How many perceptrons and hidden layers are needed?

It depends on the problem. There is no fixed rule; both are hyperparameters that are usually chosen by experimentation.

What is a multi-class classification problem?

A problem in which the model tries to discriminate between more than two categories. It becomes possible with one-hot encoding and the softmax function.

What is one-hot encoding and why is it useful?

One-hot encoding is a way to represent the target variable in classification problems. Target values are converted from strings (for example "dog") to one-hot encoded vectors. A one-hot encoded vector holds 1 at the index of the target class and 0 everywhere else. The encoding makes no assumption about similarity between classes, and it makes multi-class classification possible with softmax.
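A minimal sketch of the encoding, assuming a hypothetical three-class problem; the class list and the helper name `one_hot` are made up for illustration.

```python
def one_hot(label, classes):
    """Return a vector with 1 at the index of `label` and 0 everywhere else."""
    return [1 if c == label else 0 for c in classes]

classes = ["cat", "dog", "bird"]  # assumed example class list
encoded = one_hot("dog", classes)
print(encoded)  # [0, 1, 0]
```

Every pair of distinct classes is equally far apart in this representation, which is why no similarity between classes is implied.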

What is softmax and why is it useful?

Softmax is an activation function whose outputs are all positive and sum to 1. This lets the output values be interpreted as a probability distribution over the classes, which is useful in multi-class classification. It works by exponentiating each output and dividing it by the sum of the exponentials of all outputs.
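A minimal sketch of softmax in plain Python. Subtracting the maximum before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result. The input scores are made up.

```python
import math

def softmax(logits):
    """Exponentiate each value and divide by the sum of all exponentials."""
    shift = max(logits)  # subtracting the max avoids overflow; output is unchanged
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The largest raw score receives the largest probability, and the probabilities add up to 1 (up to floating-point rounding).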

What is cross entropy and why is it useful?

Cross entropy is a loss function, so it is the error to be minimized. It is the sum of the negative log probabilities; logarithms are used for numerical stability. Cross entropy measures the distance between the softmax outputs and the one-hot encoded target. The neural network estimates a probability for every class given the data, and training maximizes the probability assigned to the correct target label.
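A minimal sketch of cross entropy for a single example, assuming a one-hot target and a vector of predicted probabilities. The small `eps` guarding against log(0) and the example probabilities are assumptions.

```python
import math

def cross_entropy(target_one_hot, predicted_probs, eps=1e-12):
    """Sum of -log(p) weighted by the one-hot target; eps guards against log(0)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_one_hot, predicted_probs))

target = [0, 1, 0]
confident = cross_entropy(target, [0.05, 0.90, 0.05])
unsure = cross_entropy(target, [0.40, 0.20, 0.40])
print(confident, unsure)
```

With a one-hot target only the term for the correct class survives, so the loss is small when the correct class gets a high probability and large when it does not.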

What are some regularization functions to reduce overfitting?

1. Dropout
2. Batch normalization
3. L1 & L2 regularization

What is dropout and why is it important?

Dropout is an effective way of regularizing neural networks to prevent overfitting. During training, the dropout layer cripples the network by stochastically removing hidden units. Dropout is also an efficient way of approximately combining several neural networks.
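A minimal sketch of a dropout step on a list of hidden activations. It uses the common "inverted dropout" variant, which rescales the surviving units during training; that rescaling, the function signature, and the example values are assumptions, not something the post specifies.

```python
import random

def dropout(activations, drop_prob, training=True, rng=random):
    """Zero each unit with probability drop_prob during training.

    Survivors are scaled by 1 / (1 - drop_prob) (inverted dropout, an
    assumption here) so the expected activation is unchanged.
    Outside training the input is returned untouched.
    """
    if not training or drop_prob == 0.0:
        return list(activations)
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.5, 1.2, 0.3, 0.9]
print(dropout(hidden, 0.5, training=True, rng=rng))
print(dropout(hidden, 0.5, training=False))
```

Each training call removes a different random subset of units, so the network effectively trains many thinned sub-networks that share weights.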

What is an extreme case of bagging and model averaging?

Dropout. For each training case, a random subset of hidden units is kept, so each case is effectively trained on a different architecture. Dropout should not be applied at inference time, where it is not needed.

What is batch normalization and why is it important?

Batch normalization increases the stability and performance of neural network training. It normalizes the output of a layer to zero mean and a standard deviation of 1, computed over the mini-batch. It is important because it makes the network train faster and can reduce overfitting. It is very useful for complex neural networks.
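A minimal sketch of the normalization step for one feature across a mini-batch. The learnable scale and shift parameters used in full batch normalization are left out, and the `eps` term and the example values are assumptions.

```python
import math

def batch_norm(values, eps=1e-5):
    """Normalize one feature's activations over a batch to zero mean, unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # eps keeps the division safe when the batch variance is close to zero.
    return [(v - mean) / math.sqrt(var + eps) for v in values]

normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
print(normalized)
```

After the call the values are centered on 0 with a standard deviation very close to 1, regardless of the scale of the inputs.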

What is L1 & L2 Regularization? Why is it useful?

L1 regularization penalizes the absolute value of each weight and tends to drive weights to exactly zero. L2 regularization penalizes the squared value of each weight and tends to make weights smaller during training. Both assume that models with smaller weights are better.
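A minimal sketch of the two penalty terms that would be added to the loss. The strength parameter `lam` and the example weights are assumptions for illustration.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0, 1.5]
print(l1_penalty(weights, 0.01), l2_penalty(weights, 0.01))
```

The squared term makes L2 punish the large weight (-2.0) much more heavily than the small ones, while L1 charges every weight in proportion to its size.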

What does the Learning rate determine in Gradient Descent?

It determines how big each step should be. With nonlinear activations the loss surface is non-convex and has local minima; SGD tends to work better in practice for optimizing such non-convex functions.
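The effect of the step size can be seen on a toy problem. The sketch below minimizes f(x) = x**2, an assumed example function, with a small and a too-large learning rate.

```python
def gradient_descent(start, learning_rate, steps):
    """Minimize f(x) = x**2, whose gradient is 2 * x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x  # step against the gradient
    return x

small = gradient_descent(start=5.0, learning_rate=0.1, steps=50)  # shrinks toward 0
large = gradient_descent(start=5.0, learning_rate=1.1, steps=50)  # overshoots, diverges
print(small, large)
```

With the small rate each step multiplies x by 0.8, so it converges; with the large rate each step multiplies x by -1.2, so it bounces across the minimum and grows.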

What is Stochastic Gradient Descent (SGD) and why is it important?

SGD is the same as gradient descent, except that each update uses only part of the training data rather than the full set. The mini-batch size is a hyperparameter. Theoretically, even a single sample can be used for each update.
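A minimal sketch of mini-batch SGD fitting a single weight to toy data. The data, the squared-error loss, the function name `sgd_fit`, and all hyperparameter values are assumptions chosen for illustration.

```python
import random

def sgd_fit(xs, ys, batch_size, learning_rate=0.01, epochs=200, seed=0):
    """Fit y = w * x with mini-batch SGD on the squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]  # assumed toy data with true weight 2
print(sgd_fit(xs, ys, batch_size=2))
print(sgd_fit(xs, ys, batch_size=1))  # one sample per update also works
```

Setting `batch_size` to the full dataset size turns this back into ordinary gradient descent.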

What's a tool to visualize neural networks?

TensorFlow Playground. You can change the learning rate, activation functions, regularization, and hidden layers to see how each affects the training process. It is useful for building intuition about neural networks.