How do you take millions of users, each with hundreds of transactions across tens of thousands of products, and group them into meaningful segments?
1. Some exploratory data analysis (get a first insight)
-Transactions by date
-Count of customers Vs number of items bought
-Total items Vs total baskets per customer
-Total items Vs total baskets per area
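
The first-look summaries above can be sketched with pandas. This is only an illustration on a toy table: the column names (customer_id, area, date, product_id, quantity) are hypothetical, and a basket is assumed to be one customer on one day.

```python
import pandas as pd

# Toy transaction log; every column name here is a hypothetical placeholder.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "area":        ["N", "N", "N", "S", "S", "N"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-05",
                            "2020-01-01", "2020-01-03", "2020-01-05"]),
    "product_id":  [10, 11, 10, 12, 10, 11],
    "quantity":    [2, 1, 1, 3, 1, 5],
})

# Transactions by date: number of transaction rows per day
tx_by_date = tx.groupby("date").size()

# Per-customer totals (assumption: one basket = one customer-day)
per_customer = tx.groupby("customer_id").agg(
    area=("area", "first"),
    total_items=("quantity", "sum"),
    total_baskets=("date", "nunique"),
)

# Count of customers vs number of items bought
customers_by_items = per_customer["total_items"].value_counts().sort_index()

# Total items vs total baskets per area
per_area = per_customer.groupby("area")[["total_items", "total_baskets"]].sum()

print(tx_by_date, per_customer, customers_by_items, per_area, sep="\n\n")
```

On real data each of these tables would normally be plotted (time series, histogram, scatter) rather than printed.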
2. Create new features (per customer):
Counts:
-Total baskets (unique days)
-Total items
-Total spent
-Unique product ids
Distributions:
-Items per basket
-Spent per basket
-Unique product ids per basket
-Duration between visits
-Product preferences: proportion of items per product category per basket
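
A minimal sketch of how the per-customer features above might be built with pandas. Several things here are assumptions: the column names (category, quantity, spent) are hypothetical, each distribution is summarised by its mean only, and the category shares are computed over all of a customer's baskets rather than basket by basket.

```python
import pandas as pd

# Toy transaction log; column names are hypothetical placeholders.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-05",
                            "2020-01-02", "2020-01-04"]),
    "product_id":  [10, 11, 10, 12, 10],
    "category":    ["food", "drink", "food", "home", "food"],
    "quantity":    [2, 1, 1, 3, 1],
    "spent":       [4.0, 1.5, 2.0, 9.0, 2.0],
})

# Counts per customer
counts = tx.groupby("customer_id").agg(
    total_baskets=("date", "nunique"),
    total_items=("quantity", "sum"),
    total_spent=("spent", "sum"),
    unique_products=("product_id", "nunique"),
)

# One row per basket (assumption: a basket is one customer-day)
baskets = tx.groupby(["customer_id", "date"]).agg(
    items=("quantity", "sum"),
    spent=("spent", "sum"),
    products=("product_id", "nunique"),
).reset_index()

# Days since the same customer's previous visit (NaN for the first visit)
baskets["days_since_prev"] = baskets.groupby("customer_id")["date"].diff().dt.days

# Summarise each per-basket distribution by its mean (other statistics could be added)
dist = baskets.groupby("customer_id").agg(
    mean_items_per_basket=("items", "mean"),
    mean_spent_per_basket=("spent", "mean"),
    mean_products_per_basket=("products", "mean"),
    mean_days_between_visits=("days_since_prev", "mean"),
)

# Product preferences: share of a customer's items that falls in each category
items_by_cat = tx.pivot_table(index="customer_id", columns="category",
                              values="quantity", aggfunc="sum", fill_value=0)
prefs = items_by_cat.div(items_by_cat.sum(axis=1), axis=0).add_prefix("share_")

features = counts.join(dist).join(prefs)
print(features.T)
```

The result is one row per customer, which is the shape the later dimension-reduction and clustering steps expect.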
3. Too many features: reduce the dimension, e.g. with PCA?
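
As a rough sketch of the dimension-reduction step, the following standardises a feature matrix and keeps enough principal components to explain 90% of the variance, using scikit-learn. The data is synthetic, and both the 90% threshold and the choice to standardise first are assumptions, not something these notes specify.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the per-customer feature matrix:
# 200 customers x 6 features driven by 2 hidden factors plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 6))

# Standardise so that features on large scales (e.g. total spent) do not dominate
X_std = StandardScaler().fit_transform(X)

# A float n_components asks for the smallest number of components
# whose cumulative explained variance reaches that fraction.
pca = PCA(n_components=0.90, svd_solver="full")
scores = pca.fit_transform(X_std)

print("components kept:", pca.n_components_)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

The explained variance ratios give a first answer to the "too many features?" question: if a handful of components carry most of the variance, the remaining ones can be dropped before clustering.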
4. Clustering:
-On the PCA scores (customers projected onto the principal components)
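
These notes do not name a clustering algorithm, so the sketch below swaps in k-means from scikit-learn purely as an example, run on the first two principal components. The synthetic data, the number of clusters (3) and the number of components (2) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Three synthetic groups of 100 customers each in a 5-feature space
centers = np.array([[0, 0, 0, 0, 0],
                    [6, 6, 0, 0, 0],
                    [0, 0, 6, 6, 6]], dtype=float)
X = np.vstack([c + rng.normal(size=(100, 5)) for c in centers])

# Project the standardised features onto the first two principal components
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# k-means on the PCA scores (assumed algorithm and assumed k)
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(scores)

sizes = np.bincount(labels)
print("cluster sizes:", sizes)
```

In practice the number of clusters would need to be chosen, for example by comparing inertia or silhouette scores across several values of k.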
5. Interpreting model fit
-View the clustering on pairs of principal component axes: PC1 Vs PC2, PC2 Vs PC3.
-Interpret each principal component through the linear combination of features it is built from; example: PC1 = "spendy" axis (proportion of baskets containing spendy items, raw counts of items and visits)
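
One way to read a principal component is to look at its loadings, i.e. the weight each original feature gets in the linear combination. The sketch below does this with scikit-learn on synthetic data; the feature names and the two hidden factors ("spend" and "freq") are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300
spend = rng.normal(size=n)  # hidden "spendy" factor (hypothetical)
freq = rng.normal(size=n)   # hidden visit-frequency factor (hypothetical)

def noise():
    return 0.1 * rng.normal(size=n)

# Hypothetical per-customer features: three follow spend, two follow frequency
features = pd.DataFrame({
    "total_spent":         spend + noise(),
    "share_premium":       spend + noise(),
    "total_items":         spend + noise(),
    "total_baskets":       freq + noise(),
    "days_between_visits": -freq + noise(),
})

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(features))

# Rows = original features, columns = principal components
loadings = pd.DataFrame(pca.components_.T,
                        index=features.columns, columns=["PC1", "PC2"])

# Features with the largest absolute weight on PC1
top_pc1 = loadings["PC1"].abs().sort_values(ascending=False).head(3).index.tolist()

print(loadings.round(2))
print("PC1 is driven mainly by:", top_pc1)
```

Here PC1 loads mostly on the spend-related features, so it could be labelled a "spendy" axis. The sign of a component is arbitrary, which is why the ranking uses absolute values.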