Google Interview Hacks

Posts

Showing posts with the label Deep Learning ANN

Explain how a perceptron works.

August 13, 2018

Inputs are multiplied by their weights and summed, usually together with a bias term. The sum is then passed to a unit step function, which outputs 1 or 0. A single perceptron can only learn simple, linearly separable functions from examples.
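A minimal sketch of the idea, assuming a step function that fires when the weighted sum is positive. The function name and the hand-picked AND weights are hypothetical, chosen only for illustration.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0


# Hand-picked (hypothetical) parameters that make the unit act like logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], and_weights, and_bias))
```

AND is linearly separable, so one perceptron is enough. XOR is not, which is the usual example of what a single perceptron cannot learn.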

What gives the ability to learn complex functions in deep nets?

August 13, 2018

The non-linear behavior of the activation functions. Without a non-linearity, any stack of layers collapses into a single linear transformation.
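To see why the non-linearity matters, here is a small sketch with one-dimensional layers. Two linear layers stacked without an activation are equivalent to one linear layer; putting a non-linearity between them breaks that equivalence. All weights are hypothetical, and ReLU is used here only as one possible non-linearity.

```python
def linear(x, w, b):
    """A one-dimensional linear layer: w * x + b."""
    return w * x + b


def relu(x):
    return max(0.0, x)


# Hypothetical parameters for two layers.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5


def stacked_linear(x):
    # Two linear layers, no activation in between.
    return linear(linear(x, w1, b1), w2, b2)


def collapsed(x):
    # The single linear layer that computes exactly the same thing.
    return linear(x, w2 * w1, w2 * b1 + b2)


def stacked_with_relu(x):
    # Same two layers, but with a non-linearity in between.
    return linear(relu(linear(x, w1, b1)), w2, b2)


for x in (-2.0, 0.0, 1.0):
    print(x, stacked_linear(x), collapsed(x), stacked_with_relu(x))
```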

Why is the Rectified Linear Unit (ReLU) useful?

August 13, 2018

ReLU maps an input x to max(0, x): positive inputs pass through unchanged and negative inputs are mapped to 0. Neurons with negative input therefore do not fire, which increases sparsity, and that is generally good. Since not every neuron fires all the time, the network tends to train faster. The function is simple, so it is computationally cheap. It works well for a wide range of applications. Because ReLU is non-linear, stacking layers of such units lets the network learn more complex functions.
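A minimal sketch, assuming the usual definition of ReLU as max(0, x). The input values are made up; the sparsity figure is just the fraction of outputs that are exactly zero.

```python
def relu(x):
    """Pass positive inputs through unchanged, map the rest to 0."""
    return max(0.0, x)


# Hypothetical pre-activation values for five neurons.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
activations = [relu(z) for z in pre_activations]

# Fraction of neurons that did not fire.
sparsity = sum(1 for a in activations if a == 0.0) / len(activations)

print(activations)
print(sparsity)
```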

What determines the values of the weights and biases?

August 13, 2018

The training process determines the values of the weights and biases. These values are usually initialized randomly at the beginning.

How is the error computed?

August 13, 2018

The error is computed by a loss function that compares the model's prediction with the ground truth. At every training step, the weights are adjusted based on the loss of that step.
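A small sketch of that loop for a model with a single weight. The squared-error loss, the gradient descent update, the learning rate, and the data point are all assumptions made for illustration; the text above does not name a specific loss or update rule.

```python
def predict(w, x):
    return w * x


def squared_error(prediction, target):
    return (prediction - target) ** 2


def gradient(w, x, target):
    # d/dw (w * x - target)^2 = 2 * (w * x - target) * x
    return 2.0 * (predict(w, x) - target) * x


w = 0.0                  # initial weight (hypothetical)
x, target = 2.0, 6.0     # one training example; the ideal weight is 3
learning_rate = 0.1      # hypothetical step size

losses = []
for step in range(20):
    losses.append(squared_error(predict(w, x), target))
    # Move the weight against the gradient of the loss.
    w -= learning_rate * gradient(w, x, target)

print(w, losses[0], losses[-1])
```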

What does the Universal Approximation Theorem state?

August 13, 2018

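It states that a feed-forward network with a single hidden layer and a suitable non-linear activation function can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units.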

How many activation functions per layer in an ANN?

August 13, 2018

One. Typically all units in a layer share the same activation function.

How many perceptrons and hidden layers are needed?

August 13, 2018

It depends on the problem. Both are hyperparameters, usually chosen by experimentation.

What is a multi-class classification problem?

August 13, 2018

A problem in which the model tries to discriminate between more than two categories. It becomes tractable with one-hot encoding and the softmax function.

What is one-hot encoding and why is it useful?

August 13, 2018

A way to represent the target variables in classification problems. Target variables are converted from string values ("dog") to one-hot encoded vectors. A one-hot encoded vector holds 1 at the index of the target class and 0 everywhere else. It makes no assumption about similarity between target classes, and it makes multi-class classification possible with softmax.
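A minimal sketch of the conversion. The class list and its ordering are hypothetical; any fixed ordering works as long as it is used consistently.

```python
def one_hot(label, classes):
    """Return a vector with 1 at the index of `label` and 0 elsewhere."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector


classes = ["cat", "dog", "bird"]   # hypothetical label set
encoded_dog = one_hot("dog", classes)

print(encoded_dog)
```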

What is softmax and why is it useful?

August 13, 2018

Softmax is an activation function whose outputs are forced to sum to 1. This creates output values that can be interpreted as a probability distribution over the classes, which is useful in multi-class classification. Each output is the exponential of the corresponding input divided by the sum of the exponentials of all inputs.
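A minimal sketch, assuming the standard definition of softmax (exponentiate, then divide by the sum of the exponentials). Subtracting the maximum before exponentiating is a common stability trick added here; it does not change the result. The input scores are made up.

```python
import math


def softmax(logits):
    """Turn raw scores into values that are positive and sum to 1."""
    shift = max(logits)                       # avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])              # hypothetical network outputs

print(probs)
print(sum(probs))
```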

What is cross entropy and why is it useful?

August 13, 2018

Cross entropy is a loss function, so it is the error that has to be minimized. It is a sum of negative log probabilities; logarithms are used for numerical stability. Cross entropy measures the difference between the softmax output and the one-hot encoded target. The neural network estimates a probability for every class given the input data, and the probability assigned to the correct target label has to be maximized.
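A minimal sketch for a single example with a one-hot target. The small `eps` added inside the logarithm is an assumption to guard against log(0); the probability vectors are made up.

```python
import math


def cross_entropy(predicted_probs, one_hot_target, eps=1e-12):
    """Negative log probability assigned to the true class."""
    return -sum(
        t * math.log(p + eps)
        for p, t in zip(predicted_probs, one_hot_target)
    )


target = [0, 1, 0]                                   # true class is index 1
good = cross_entropy([0.1, 0.8, 0.1], target)        # confident and correct
bad = cross_entropy([0.7, 0.2, 0.1], target)         # confident and wrong

print(good, bad)
```

The loss is lower when more probability lands on the correct label, so minimizing it is the same as maximizing that probability.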

What are some regularization functions to reduce overfitting?

August 13, 2018

1. Dropout 2. Batch normalization 3. L1 & L2 regularization

What is dropout and why is it important?

August 13, 2018

Dropout is an effective way of regularizing neural networks to prevent overfitting. During training, the dropout layer cripples the neural network by stochastically removing hidden units. Dropout is also an efficient way of combining several neural networks.
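A minimal sketch of a dropout layer. Rescaling the surviving units by 1 / keep_prob (often called inverted dropout) is an assumption not stated above; it keeps the expected activation unchanged so that nothing needs to be adjusted at inference time. The function signature and the drop probability are hypothetical.

```python
import random


def dropout(activations, drop_prob, training, rng):
    """Randomly zero units during training; pass through at inference."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Each unit survives with probability keep_prob and is rescaled.
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]


rng = random.Random(0)            # seeded only to make runs repeatable
hidden = [1.0] * 10               # hypothetical hidden-layer activations

train_out = dropout(hidden, 0.5, True, rng)
infer_out = dropout(hidden, 0.5, False, rng)

print(train_out)
print(infer_out)
```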

What is an extreme case of bagging and model averaging?

August 13, 2018

Dropout. For each training case, we randomly drop a subset of the hidden units, so we end up with a different architecture for each case. Dropout should not be applied at inference time, since it is not needed there.

What is batch normalization and why is it important?

August 13, 2018

Batch normalization increases the stability and performance of neural network training. It normalizes the output of a layer to zero mean and a standard deviation of 1, computed over the mini-batch. It is important because it can reduce overfitting and makes the network train faster. It is very useful for complex neural networks.
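A minimal sketch of the normalization step for one feature across a mini-batch. The learnable scale `gamma` and shift `beta`, and the small `eps` under the square root, are assumptions borrowed from the usual formulation and are not mentioned above. Running statistics for inference are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return [
        gamma * (v - mean) / math.sqrt(variance + eps) + beta
        for v in values
    ]


batch = [2.0, 4.0, 6.0, 8.0]      # hypothetical layer outputs for one feature
normalized = batch_norm(batch)

print(normalized)
```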

What is L1 & L2 Regularization? Why is it useful?

August 13, 2018

L1 regularization penalizes the absolute value of each weight and tends to drive weights to exactly zero. L2 regularization penalizes the squared value of each weight and tends to make weights smaller during training. Both regularizations assume that models with smaller weights are better.
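A minimal sketch of how the two penalties are added to the loss. The weights, the data loss, and the regularization strength are hypothetical values picked for illustration.

```python
def l1_penalty(weights, strength):
    """Sum of absolute weight values, scaled by the strength."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """Sum of squared weight values, scaled by the strength."""
    return strength * sum(w * w for w in weights)


weights = [0.5, -2.0, 0.0]   # hypothetical model weights
data_loss = 1.0              # hypothetical loss on the training data
strength = 0.1               # hypothetical regularization strength

total_l1 = data_loss + l1_penalty(weights, strength)
total_l2 = data_loss + l2_penalty(weights, strength)

print(total_l1, total_l2)
```

Note how the large weight (-2.0) dominates the L2 penalty, while L1 charges every weight in proportion to its magnitude.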
