Capstone project in Udacity’s nanodegree
In this project, a Starbucks data set is given which contains simulated data from the Starbucks rewards mobile app. The data shows how customers respond to offers. An offer can be an actual deal, like a BOGO (buy one get one free), or just an advertisement.
Let's understand the dataset:
We have three json files:
portfolio.json: this file describes each offer's characteristics, such as its duration and the amount a customer needs to spend to complete it
- id (string) — offer id
- offer_type (string) — the type of offer ie BOGO, discount, informational
- difficulty (int) — the minimum amount required to spend to complete an offer
- reward (int) — the reward given for completing an offer
- duration (int) — time for the offer to be open, in days
- channels (list of strings) — channels the offer is sent through (web, email, mobile, social)
profile.json: this file has user details such as age, gender, income and when they created an account in the Starbucks rewards mobile app
- age (int) — age of the customer
- became_member_on (int) — the date when the customer created an app account
- gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
- id (str) — customer-id
- income (float) — customer’s income
transcript.json: this file has customer purchase data, such as when they received an offer, when they made a purchase and when they completed an offer. An offer is considered successful only if the customer views the offer and then meets or exceeds the offer amount within the specified duration.
- event (str) — record description (ie transaction, offer received, offer viewed, etc.)
- person (str) — customer-id
- time (int) — time in hours since the start of the test. The data begins at time t=0
- value — (dict of strings) — either an offer id or transaction amount depending on the record
Data Cleaning, Exploration and Modelling:
1. Portfolio dataset:
For the channel and offer-type columns I performed one-hot encoding, and I converted the duration from days to hours to match the transcript's time unit.
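As a sketch, the portfolio cleaning described above might look like this in pandas; the miniature data, column values and offer ids here are illustrative, not the real dataset:

```python
import pandas as pd

# Hypothetical miniature portfolio with the same columns as portfolio.json
portfolio = pd.DataFrame({
    "id": ["offer_a", "offer_b"],
    "offer_type": ["bogo", "discount"],
    "channels": [["web", "email"], ["email", "mobile", "social"]],
    "duration": [7, 10],  # days
})

# One-hot encode the offer type
portfolio = pd.concat([portfolio, pd.get_dummies(portfolio["offer_type"])], axis=1)

# One-hot encode the channels list column: one indicator per channel
for channel in ["web", "email", "mobile", "social"]:
    portfolio[channel] = portfolio["channels"].apply(lambda ch: int(channel in ch))

# Convert duration from days to hours to match the transcript's time unit
portfolio["duration"] = portfolio["duration"] * 24
```

`pd.get_dummies` handles the scalar offer-type column, but the channels column holds lists, so each channel gets its own indicator via a membership test.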
2. Profile dataset:
I noticed a strange age value of 118 and believe it represents a missing value, so I replaced it with NaN. I then dropped all rows with missing values and added a column showing the number of days the user has been a member of the app.
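A minimal sketch of this profile cleaning, assuming illustrative user records and an assumed reference date for computing membership length:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature profile with the same columns as profile.json
profile = pd.DataFrame({
    "id": ["u1", "u2", "u3"],
    "age": [55.0, 118.0, 30.0],
    "gender": ["F", None, "M"],
    "income": [72000.0, None, 45000.0],
    "became_member_on": [20170715, 20180101, 20160512],
})

# Age 118 marks missing demographics, so treat it as a missing value
profile.loc[profile["age"] == 118, "age"] = np.nan
profile = profile.dropna()

# Membership length in days, measured against an assumed reference date
profile["became_member_on"] = pd.to_datetime(
    profile["became_member_on"], format="%Y%m%d")
reference = pd.Timestamp("2018-08-01")  # assumed end of data collection
profile["member_days"] = (reference - profile["became_member_on"]).dt.days
```

In the real dataset the rows with age 118 also have missing gender and income, so dropping missing values removes those users entirely.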
3. Transcript dataset:
In this analysis, I focus on offer completion only, so I filtered out all transaction events and extracted the offer id from the value column.
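The filtering and offer-id extraction could be sketched like this; the tiny transcript below is made up, and it assumes the value dictionaries key the offer id as either 'offer id' or 'offer_id' depending on the event type:

```python
import pandas as pd

# Hypothetical miniature transcript with the same shape as transcript.json
transcript = pd.DataFrame({
    "person": ["u1", "u1", "u1", "u2"],
    "event": ["offer received", "offer viewed", "transaction", "offer completed"],
    "time": [0, 6, 12, 132],
    "value": [{"offer id": "offer_a"}, {"offer id": "offer_a"},
              {"amount": 9.5}, {"offer_id": "offer_b", "reward": 2}],
})

# Keep only offer events; transactions carry no offer id
offers = transcript[transcript["event"] != "transaction"].copy()

# Extract the offer id, whichever key variant the value dict uses
offers["offer_id"] = offers["value"].apply(
    lambda v: v.get("offer id", v.get("offer_id")))
```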
There is a chance that a user receives an offer, never views it, but still completes it. In that case the customer was not influenced by the offer, since they never actually viewed it, so we should skip these patterns in our model.
We have to check for the pattern offer received, then viewed, then completed in the 'event' field.
I used a function create_user_item_matrix that loops through the offers and searches for this received-viewed-completed pattern. A 1 is added to the matrix when the pattern is present, otherwise a 0.
This function creates a sparse matrix containing NaN (for users who never received the offer) along with 0s and 1s.
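A hedged sketch of what such a function could look like; the exact implementation in the notebook may differ, and the demo data at the bottom is invented:

```python
import numpy as np
import pandas as pd

def create_user_item_matrix(offers):
    """1 if the user received, then viewed, then completed the offer;
    0 if the offer was received but the full pattern is absent;
    NaN if the user never received the offer."""
    users = offers["person"].unique()
    items = offers["offer_id"].unique()
    matrix = pd.DataFrame(np.nan, index=users, columns=items)
    for (user, item), group in offers.groupby(["person", "offer_id"]):
        events = group.sort_values("time")["event"].tolist()
        if "offer received" not in events:
            continue  # never received: leave as NaN
        try:
            r = events.index("offer received")
            v = events.index("offer viewed", r + 1)
            events.index("offer completed", v + 1)
            matrix.loc[user, item] = 1  # received -> viewed -> completed
        except ValueError:
            matrix.loc[user, item] = 0  # received, but pattern incomplete
    return matrix

offers = pd.DataFrame({
    "person": ["u1", "u1", "u1", "u2"],
    "offer_id": ["offer_a"] * 4,
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received"],
    "time": [0, 6, 30, 0],
})
matrix = create_user_item_matrix(offers)
```

Using ordered index lookups means a completion that happens before the view does not count, which is exactly the "completed without viewing" case we want to exclude.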
I used FunkSVD to factorize the matrix into a user matrix and an offer matrix of latent features. Standard SVD will not work here because the matrix contains missing values.
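The core idea of FunkSVD is gradient descent over only the observed cells. A minimal sketch, with assumed hyperparameters and a toy ratings matrix:

```python
import numpy as np

def funk_svd(ratings, latent_features=15, learning_rate=0.01, iters=500):
    """Gradient descent on the observed (non-NaN) cells only, factorizing
    the ratings matrix into user and offer latent-feature matrices."""
    n_users, n_items = ratings.shape
    rng = np.random.default_rng(42)
    user_mat = rng.random((n_users, latent_features))
    item_mat = rng.random((latent_features, n_items))
    observed = ~np.isnan(ratings)
    for _ in range(iters):
        for i in range(n_users):
            for j in range(n_items):
                if observed[i, j]:
                    err = ratings[i, j] - user_mat[i] @ item_mat[:, j]
                    u_row = user_mat[i].copy()
                    user_mat[i] += learning_rate * 2 * err * item_mat[:, j]
                    item_mat[:, j] += learning_rate * 2 * err * u_row
    return user_mat, item_mat

ratings = np.array([[1.0, 0.0, np.nan],
                    [0.0, 1.0, 1.0]])
user_mat, item_mat = funk_svd(ratings, latent_features=3)
```

Because the NaN cells are simply skipped, the factorization never tries to explain offers a user was never sent, which is what standard SVD cannot do.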
We can split the dataset into training and test sets at a particular point in time, training on the earlier interactions and testing on the later ones, to check whether the model generalizes.
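Such a time-based split is straightforward in pandas; the event log and cut-off below are illustrative:

```python
import pandas as pd

# Hypothetical offer-event log; 'time' is hours since the start of the test
events = pd.DataFrame({
    "person": ["u1", "u1", "u2", "u2"],
    "time": [0, 120, 300, 500],
})

# Everything before the cut-off trains the model; the rest evaluates it
split_time = 250
train = events[events["time"] < split_time]
test = events[events["time"] >= split_time]
```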
Mean Squared Error:
I used mean squared error to evaluate the iterations of FunkSVD. For each observed user-offer pair, the error is the actual value minus the dot product of the user and offer latent features; the squared errors are then accumulated over the matrix. I tested this with 3 different numbers of latent features: 5, 10 and 15.
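The error measure described above can be written as a small helper; the matrices below are made-up values chosen so the arithmetic is easy to follow:

```python
import numpy as np

def observed_mse(ratings, user_mat, item_mat):
    """Squared error between actual ratings and the dot product of the
    latent features, averaged over the observed (non-NaN) cells."""
    pred = user_mat @ item_mat
    mask = ~np.isnan(ratings)
    return np.mean((ratings[mask] - pred[mask]) ** 2)

ratings = np.array([[1.0, np.nan],
                    [0.0, 1.0]])
user_mat = np.array([[1.0], [1.0]])  # 2 users x 1 latent feature
item_mat = np.array([[1.0, 0.5]])    # 1 latent feature x 2 offers
```

Here the predictions are [[1.0, 0.5], [1.0, 0.5]], so the three observed errors are 0, -1 and 0.5, giving an MSE of 1.25 / 3.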
For 5 latent features, the mean squared error was 0.020964.
For 10 latent features, the mean squared error was 0.004630.
For 15 latent features, the mean squared error was 0.002173.
It's evident that 15 latent features gives the least MSE.
These results can be validated against the test dataset. During testing, the MSE is again lowest with 15 latent features.
I built a recommendation function that loops over the predicted reactions for a particular user and returns the offer with the highest score.
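A sketch of such a function, assuming the user and offer matrices from the factorization; the function name, matrices and offer ids here are illustrative:

```python
import numpy as np

def recommend_offer(user_idx, user_mat, item_mat, offer_ids):
    """Predict a score for every offer for one user and return the offer
    with the highest predicted score."""
    scores = user_mat[user_idx] @ item_mat
    return offer_ids[int(np.argmax(scores))]

offer_ids = ["offer_a", "offer_b", "offer_c"]
user_mat = np.array([[0.9, 0.1]])      # one user, 2 latent features
item_mat = np.array([[0.2, 0.8, 0.5],
                     [0.7, 0.3, 0.4]])
```

For this toy user the predicted scores are [0.25, 0.75, 0.49], so the second offer wins.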
Try recommendation for User 1:
From the training dataset, I picked this user and ran the prediction model.
The results suggest that offer 6 and the BOGO offer with a 7-day duration work well for this user.
Try recommendation for User 2:
I picked another user and ran the model.
From the results below, we can see that offer 6, the discount, is again the best for this user.
Try recommendation for new user:
Whenever a new user tries the app, it's best to recommend the top offer, i.e., the one most used across all users.
Here we assume that the offer most popular among existing users is likely to be liked by the new user as well.
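This rank-based fallback can be sketched by averaging the user-item matrix down its columns; the function name and miniature matrix are illustrative:

```python
import numpy as np
import pandas as pd

def top_offers(user_item_matrix, n=1):
    """Rank offers by their completion rate across all users who received
    them (NaN cells are ignored) and return the n most popular."""
    rates = user_item_matrix.mean(axis=0, skipna=True)
    return rates.sort_values(ascending=False).head(n).index.tolist()

user_item_matrix = pd.DataFrame(
    {"offer_a": [1.0, 0.0, np.nan], "offer_b": [1.0, 1.0, 1.0]},
    index=["u1", "u2", "u3"])
```

Averaging with NaN cells ignored means each offer is judged only on the users who actually received it.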
Dataset, notebook file with all analysis and all other details can be found in this git repository.
In this article, I explored the Starbucks dataset, performed data cleaning and modelling, and built a recommendation model for different users.
We can see that the "spend 10 dollars, get 2 dollars off within 10 days" offer, delivered via email, web, mobile and social media, performs best.
We also saw recommendations for a few test users.
We faced the cold start problem in this model: for new users we had no interaction history to recommend from. I used rank-based recommendation, where the overall ranking across all users is used to recommend offers to new users.
One improvement we could make is splitting the train and test datasets based on the offer, so that we can see exactly whether a user completed each offer or not. But this would take more time to process and would require a more complex algorithm as well.