How do you take millions of users, each with hundreds of transactions across tens of thousands of products, and group them into meaningful segments?

1. Exploratory data analysis (to get a first insight):

-Transactions by date
-Count of customers vs. number of items bought
-Total items vs. total baskets per customer
-Total items vs. total baskets per area
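
A minimal sketch of the first two views, using only the Python standard library. The row layout (customer_id, purchase_date, product_id, quantity, amount) and the sample values are assumptions made for illustration; at millions of users you would run the same counts in pandas, SQL or Spark.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical rows: (customer_id, purchase_date, product_id, quantity, amount)
transactions = [
    ("c1", date(2020, 1, 1), "p1", 2, 4.0),
    ("c1", date(2020, 1, 1), "p2", 1, 3.0),
    ("c1", date(2020, 1, 8), "p1", 1, 2.0),
    ("c2", date(2020, 1, 1), "p3", 5, 10.0),
    ("c3", date(2020, 1, 8), "p2", 1, 3.0),
]

def transactions_by_date(rows):
    """Number of transaction rows per calendar day."""
    return Counter(day for _, day, _, _, _ in rows)

def items_per_customer(rows):
    """Total quantity bought by each customer."""
    totals = defaultdict(int)
    for customer, _, _, qty, _ in rows:
        totals[customer] += qty
    return dict(totals)

def customers_by_item_count(rows):
    """Histogram: how many customers bought exactly n items in total."""
    return Counter(items_per_customer(rows).values())

by_date = transactions_by_date(transactions)
item_histogram = customers_by_item_count(transactions)
```

Plotting by_date as a time series and item_histogram as a bar chart gives the first two views in the list.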

2. Create new features (per customer):

Counts:

-Total baskets (unique days)
-Total items
-Total spend
-Unique product IDs

Distributions:

-Items per basket
-Spend per basket
-Unique product IDs per basket
-Duration between visits
-Product preferences: proportion of items per product category per basket
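
The features above can be sketched per customer roughly as follows. This is an illustration under assumptions, not a definitive implementation: the row layout and the feature names are hypothetical, a basket is taken to be one customer on one day (as in "unique days" above), each distribution is summarised by its mean only, and the category proportions are pooled over all of a customer's baskets rather than kept per basket.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical rows: (customer_id, purchase_date, product_id, category, quantity, amount)
rows = [
    ("c1", date(2020, 1, 1), "p1", "food", 2, 4.0),
    ("c1", date(2020, 1, 1), "p2", "drink", 1, 3.0),
    ("c1", date(2020, 1, 8), "p1", "food", 1, 2.0),
    ("c2", date(2020, 1, 1), "p3", "food", 5, 10.0),
]

def customer_features(rows):
    # Group rows into baskets: one basket = one customer on one day.
    baskets = defaultdict(lambda: defaultdict(list))
    for customer, day, product, category, qty, amount in rows:
        baskets[customer][day].append((product, category, qty, amount))

    features = {}
    for customer, by_day in baskets.items():
        days = sorted(by_day)
        items = [sum(q for _, _, q, _ in by_day[d]) for d in days]
        spend = [sum(a for _, _, _, a in by_day[d]) for d in days]
        products = {p for d in days for p, _, _, _ in by_day[d]}
        # Days elapsed between consecutive visits (empty for one-time buyers).
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        category_items = defaultdict(int)
        for d in days:
            for _, c, q, _ in by_day[d]:
                category_items[c] += q
        total_items = sum(items)
        features[customer] = {
            "total_baskets": len(days),
            "total_items": total_items,
            "total_spend": sum(spend),
            "unique_products": len(products),
            "mean_items_per_basket": mean(items),
            "mean_spend_per_basket": mean(spend),
            "mean_days_between_visits": mean(gaps) if gaps else None,
            "category_share": {c: n / total_items for c, n in category_items.items()},
        }
    return features

features = customer_features(rows)
```

In practice you would likely keep more than the mean of each distribution (median, spread, quantiles), which is exactly how the feature count grows quickly.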

3. Too many features? Consider dimension reduction, for example PCA.
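
As a rough sketch of what PCA does here, the snippet below standardises each feature column and takes the principal components from an SVD using NumPy. The feature matrix X is made up for illustration, and the function name pca is hypothetical; a library implementation would normally be used on real data.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD on standardised columns.

    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - mu) / sigma
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained[:n_components]

# Hypothetical per-customer features:
# total_baskets, total_items, total_spend, unique_products
X = [
    [2, 4, 9.0, 2],
    [1, 5, 10.0, 1],
    [10, 30, 80.0, 12],
    [8, 25, 60.0, 9],
    [3, 6, 15.0, 3],
]

scores, components, explained = pca(X, n_components=2)
```

Standardising first matters because raw counts and spend live on very different scales; without it the largest-valued feature would dominate the first component.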

4. Clustering:

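-Cluster the customers on their PCA scores (PCA only reduces the dimensions; the grouping itself still needs a clustering algorithm)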
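
The post does not name a clustering algorithm, so the sketch below assumes plain k-means run on the first two principal component scores. The function names, the choice of k and the sample scores are all hypothetical.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for every point."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers

# Hypothetical PC1/PC2 scores for six customers.
pc_scores = [
    [-3.0, 0.1], [-2.7, -0.2], [-3.2, 0.0],
    [2.9, 0.2], [3.0, -0.1], [3.2, 0.0],
]

labels, centers = kmeans(pc_scores, k=2)
```

On real data the number of clusters is itself a choice to validate, and several random starts are usually compared because k-means only finds a local optimum.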

5. Interpreting model fit:

-View the clustering by pairs of principal component axes: PC1 vs. PC2, PC2 vs. PC3.
-Interpret each principal component through the linear combination of features it is obtained from. Example: PC1 = "spendy" axis (proportion of baskets containing spendy items, raw counts of items and visits)
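
One simple way to read a component is to list the features with the largest absolute weights in its linear combination. The sketch below does that in plain Python; the feature names and the loading values are invented for illustration and are not results from any real dataset.

```python
# Hypothetical feature names, one per column of the loadings.
feature_names = ["total_baskets", "total_items", "total_spend", "unique_products"]

# Hypothetical loadings: one row per principal component, one weight per feature.
loadings = [
    [0.10, 0.55, 0.80, 0.20],    # PC1
    [0.72, -0.10, -0.05, 0.68],  # PC2
]

def top_loadings(loadings, names, n=2):
    """For each component, the n features with the largest absolute weight."""
    result = []
    for row in loadings:
        order = sorted(range(len(row)), key=lambda i: -abs(row[i]))[:n]
        result.append([(names[i], row[i]) for i in order])
    return result

summary = top_loadings(loadings, feature_names)
for pc, top in enumerate(summary, start=1):
    print(f"PC{pc}:", ", ".join(f"{name} ({w:+.2f})" for name, w in top))
```

With these made-up numbers PC1 is driven by spend and item counts, which is the kind of pattern that would justify calling it a "spendy" axis.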
