What are a few ways you can handle missing values for a feature in your data?

Answer: You can drop the affected instances if there aren't many of them. In many cases you can replace the missing values with the mean of that feature. In a time-series problem you might use a neighboring value. You could also run a clustering algorithm and then fill in the average value of the instances in the same cluster.
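
As a rough illustration of these four options, here is a minimal sketch in plain Python. The sample values, the group labels, and all function names are made up for this example, and `None` stands in for a missing value. The cluster labels are assumed to come from a clustering step that is not shown.

```python
from statistics import mean

# Hypothetical feature values; None marks a missing entry.
values = [4.0, None, 6.0, None, 8.0]
# Hypothetical cluster labels, assumed to come from a separate clustering step.
clusters = ["a", "a", "a", "b", "b"]

def drop_missing(xs):
    """Drop the instances whose value is missing."""
    return [x for x in xs if x is not None]

def fill_with_mean(xs):
    """Replace missing entries with the mean of the observed entries."""
    m = mean(x for x in xs if x is not None)
    return [m if x is None else x for x in xs]

def fill_forward(xs):
    """Time-series style: carry the last observed value forward."""
    out, last = [], None
    for x in xs:
        if x is not None:
            last = x
        out.append(last)  # stays None until the first observed value
    return out

def fill_with_group_mean(xs, groups):
    """Replace missing entries with the mean of the observed values in the same group."""
    by_group = {}
    for x, g in zip(xs, groups):
        if x is not None:
            by_group.setdefault(g, []).append(x)
    # Raises KeyError if a group has no observed values at all.
    return [mean(by_group[g]) if x is None else x for x, g in zip(xs, groups)]
```

Which option fits depends on the data: dropping loses rows, the mean ignores ordering, and the forward fill only makes sense when neighboring instances are related, as in a time series.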

EX: There are two columns in a health dataset, 'current weight' and 'heaviest weight ever'. Every instance has a value for current weight, but many are missing a value for heaviest weight. One way to handle the missing values is to compute the average percentage difference between heaviest weight and current weight over the instances that have both, and then apply that percentage to each instance's current weight to estimate its missing heaviest weight.
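
The weight example can be sketched as follows. The records and the function name are hypothetical, and the code expresses the average percentage difference as an average ratio of heaviest weight to current weight, which is one reasonable reading of the approach rather than the only one.

```python
from statistics import mean

# Hypothetical records: (current_weight, heaviest_weight); None marks a missing value.
records = [(150.0, 165.0), (200.0, 220.0), (180.0, None)]

def impute_heaviest(rows):
    """Fill missing heaviest weights using the average heaviest/current ratio."""
    # Ratio computed only over rows where both weights are known.
    ratios = [h / c for c, h in rows if h is not None]
    avg_ratio = mean(ratios)
    # Keep known values; estimate missing ones from the current weight.
    return [(c, h if h is not None else c * avg_ratio) for c, h in rows]

imputed = impute_heaviest(records)
```

Here both complete rows have a heaviest weight 10% above the current weight, so the missing value for the 180.0 row is estimated at roughly 198.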
