What are a few ways you can handle missing values for a feature in your data?
Answer: You can drop the instances with missing values if there aren't many of them. In many cases you can replace the missing values with the mean of that feature. In a time series problem you might use a neighboring value. You could also run a clustering algorithm and then fill each missing value with the feature's average among instances in the same cluster.
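A minimal pandas sketch of these four options, assuming a toy DataFrame. The column names "x" and "group" are hypothetical, and "group" stands in for a cluster label that a clustering algorithm would have assigned beforehand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "x" has missing values; "group" plays the role
# of a cluster label produced by some earlier clustering step.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, 4.0, np.nan, 10.0],
    "group": ["a", "a", "a", "b", "b", "b"],
})

# 1. Drop the instances where the feature is missing.
dropped = df.dropna(subset=["x"])

# 2. Replace missing values with the mean of the feature.
mean_filled = df["x"].fillna(df["x"].mean())

# 3. Time-series style: carry the previous (neighboring) value forward.
neighbor_filled = df["x"].ffill()

# 4. Cluster style: fill with the mean of the feature within the same group.
group_means = df.groupby("group")["x"].transform("mean")
group_filled = df["x"].fillna(group_means)
```

Forward fill is only one choice of "neighboring value"; a backward fill or interpolation between neighbors may suit some series better.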
EX: There are two columns in a health dataset, 'current weight' and 'heaviest weight ever'. Every instance has a value for current weight, but many are missing a value for heaviest weight. One way to handle the missing values is to compute the average percentage difference between heaviest weight and current weight over the instances where both are present, and then apply that percentage to the current weight to estimate each missing heaviest weight.
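The weight example could be sketched as follows. The data values and the column names current_weight, heaviest_weight, and heaviest_weight_filled are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: current weight is always present, heaviest weight is not.
df = pd.DataFrame({
    "current_weight": [150.0, 200.0, 180.0, 160.0],
    "heaviest_weight": [165.0, 220.0, np.nan, np.nan],
})

# Average percentage difference, computed only on rows where both are known.
known = df.dropna(subset=["heaviest_weight"])
pct_diff = (known["heaviest_weight"] - known["current_weight"]) / known["current_weight"]
avg_pct_diff = pct_diff.mean()

# Apply that percentage to the current weight to estimate missing values;
# rows that already have a heaviest weight are left unchanged.
estimate = df["current_weight"] * (1 + avg_pct_diff)
df["heaviest_weight_filled"] = df["heaviest_weight"].fillna(estimate)
```

Writing the result to a new column keeps the original values available, so imputed and observed entries can still be told apart.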