While working on a data set, how do you select important variables? Explain your methods.

Answer: The following methods can be used to select variables:


  • Remove highly correlated variables before selecting important variables
  • Fit a linear regression and select variables based on their p-values
  • Use forward selection, backward selection, or stepwise selection
  • Train a random forest or XGBoost model and plot a variable importance chart
  • Use Lasso regression, which shrinks the coefficients of unimportant variables to exactly zero
  • Measure the information gain of each available feature and select the top n features accordingly.
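The first step, removing correlated variables, can be sketched in Python with NumPy. This is a minimal sketch assuming numeric features stored column-wise in an array; the function name drop_correlated and the 0.9 threshold are illustrative choices, not a standard.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Keep a column only if its absolute Pearson correlation with every
    previously kept column is at most `threshold` (earlier columns win)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # p x p correlation matrix
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

# Usage on synthetic data: "b" is a near-copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)
c = rng.normal(size=200)
X = np.column_stack([a, b, c])
X_reduced, kept = drop_correlated(X, ["a", "b", "c"])
print(kept)  # ['a', 'c']
```

The greedy pass is order-dependent (which of two correlated columns survives depends on column order), and constant columns would produce NaN correlations, so they should be removed first.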
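Forward selection can be sketched as a greedy loop around ordinary least squares. This version scores each candidate subset with AIC, n·log(RSS/n) + 2k for a Gaussian linear model (up to an additive constant); using AIC as the stopping rule is an assumption, since p-values, adjusted R² or cross-validated error are equally common choices. The helper names rss, aic and forward_selection are hypothetical.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit with an intercept plus `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def aic(X, y, cols):
    n = len(y)
    k = len(cols) + 1  # +1 for the intercept
    return n * np.log(rss(X, y, cols) / n) + 2 * k

def forward_selection(X, y):
    selected, remaining = [], list(range(X.shape[1]))
    best = aic(X, y, selected)
    while remaining:
        # try adding each remaining feature and keep the one with the lowest AIC
        scores = {c: aic(X, y, selected + [c]) for c in remaining}
        cand = min(scores, key=scores.get)
        if scores[cand] >= best:
            break  # no remaining feature lowers AIC, so stop
        best = scores[cand]
        selected.append(cand)
        remaining.remove(cand)
    return selected

# Synthetic data: only features 0 and 2 influence y.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * rng.normal(size=300)
selected = forward_selection(X, y)
print(selected)
```

Backward selection runs the same idea in reverse, starting from all features and dropping the one whose removal lowers AIC most; stepwise selection allows both an addition and a removal at each step.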
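Lasso performs selection because its L1 penalty drives some coefficients exactly to zero. The sketch below minimizes (1/2n)·||y - Xw||² + α·||w||₁ by cyclic coordinate descent on standardized features. It is a bare-bones illustration (fixed iteration count, no convergence check); in practice a library implementation such as scikit-learn's Lasso or LassoCV would normally be used. The function name lasso_cd and α = 0.2 are illustrative assumptions.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xw||^2 + alpha * ||w||_1."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # each column: mean 0, variance 1
    yc = y - y.mean()                          # centering removes the intercept
    n, p = Xs.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: prediction error ignoring feature j
            r = yc - Xs @ w + Xs[:, j] * w[j]
            rho = Xs[:, j] @ r / n
            # soft-thresholding; the denominator x_j.x_j / n equals 1 after scaling
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0)
    return w

# Synthetic data: only features 0 and 2 influence y.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.5 * rng.normal(size=300)
w = lasso_cd(X, y, alpha=0.2)
selected = np.flatnonzero(w)  # indices of features with nonzero coefficients
print(selected)
```

Larger values of α zero out more coefficients; α is usually chosen by cross-validation rather than fixed by hand.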
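Information gain, as in the last method, applies directly to categorical features; continuous features would first need binning, which this sketch assumes has already been done. It is computed as H(Y) - Σ_v P(X = v)·H(Y | X = v), with entropy in bits. The feature names and data below are made-up toy values, and the helper names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """H(labels) minus the weighted entropy of labels within each feature value."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional

def top_n_features(columns, labels, n):
    """Rank features (name -> list of values) by information gain, keep the top n."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:n]

labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
columns = {
    "outlook":  ["sun", "sun", "rain", "rain", "sun", "rain", "sun", "rain"],
    "humidity": ["high", "high", "high", "low", "low", "low", "high", "low"],
    "windy":    [True, False, True, False, True, False, False, True],
}
print(top_n_features(columns, labels, 2))  # ['outlook', 'humidity']
```

Here "outlook" separates the classes perfectly (gain 1 bit), "humidity" helps partially (about 0.19 bits) and "windy" carries no information (gain 0), so the top two features are outlook and humidity.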
