Capstone project in Udacity’s nanodegree


In this project, we are given a Starbucks dataset containing simulated data from the Starbucks rewards mobile app. The data shows how customers respond to offers. An offer can be an actual offer, such as BOGO (buy one, get one free), or just an advertisement.

Let's understand the dataset:

We have three JSON files:

1. Portfolio:

This file describes the offers' characteristics, including each offer's duration and the amount a customer needs to spend to complete it.

Portfolio dataset sample records

2. Profile:

This file has user details such as age, gender, income, and the date they created their account in the Starbucks rewards mobile app.

Profile dataset sample records

3. Transcript:

This file has customer purchase data, such as when customers received an offer, when they made purchases and when they used the offer. An offer is considered successful only if the customer views it and then meets or exceeds the offer amount within the specified duration.

Transcript dataset sample records

Data Cleaning, Exploration and Modelling:

1. Portfolio dataset:

For the channel and offer type columns, I performed one-hot encoding, and I converted duration from days to hours.
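A minimal pandas sketch of these two steps is below. It assumes the portfolio columns are named channels (a list of channel names per offer), offer_type and duration (in days), as in the Udacity data; the two sample rows are made up for illustration and this is not the notebook's actual code.

```python
import pandas as pd

# Assumption: a tiny made-up stand-in for portfolio.json.
portfolio = pd.DataFrame({
    "id": ["a", "b"],
    "offer_type": ["bogo", "discount"],
    "channels": [["email", "mobile"], ["web", "email", "mobile", "social"]],
    "duration": [7, 10],  # days
})

# One-hot encode the list-valued channels column: explode to one row per
# channel, dummy-encode, then collapse back to one row per offer.
channel_dummies = (
    pd.get_dummies(portfolio["channels"].explode())
    .groupby(level=0)
    .max()
    .astype(int)
)

# One-hot encode offer_type (one column per type, e.g. bogo, discount).
offer_dummies = pd.get_dummies(portfolio["offer_type"]).astype(int)

portfolio_clean = pd.concat(
    [portfolio.drop(columns=["channels", "offer_type"]), channel_dummies, offer_dummies],
    axis=1,
)

# Convert duration from days to hours.
portfolio_clean["duration"] = portfolio_clean["duration"] * 24
print(portfolio_clean)
```

Exploding the list column first means a channel gets its own 0/1 column even though each offer lists several channels.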

Portfolio dataset after one hot encoding

2. Profile dataset:

I noticed a suspicious age value of 118, which I believe represents a missing value, so I replaced it with NaN. I then dropped all rows with missing values, and finally I added a column showing the number of days each user has been a member of the app.
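Here is a minimal sketch of this cleaning, assuming became_member_on is stored as an integer in YYYYMMDD form (as in the Udacity profile.json). The post does not say which reference date the member days are measured from, so this sketch uses a hypothetical choice: the latest sign-up date in the data. The sample rows are made up.

```python
import numpy as np
import pandas as pd

# Assumption: a tiny made-up stand-in for profile.json.
profile = pd.DataFrame({
    "id": ["u1", "u2", "u3"],
    "age": [55, 118, 40],
    "gender": ["F", None, "M"],
    "income": [112000.0, np.nan, 70000.0],
    "became_member_on": [20170715, 20170804, 20180426],
})

# Treat the placeholder age 118 as missing, then drop incomplete rows.
profile["age"] = profile["age"].replace(118, np.nan)
profile = profile.dropna().copy()

# Days of membership up to a reference date. Hypothetical choice of
# reference date: the most recent sign-up date in the data.
joined = pd.to_datetime(profile["became_member_on"].astype(str), format="%Y%m%d")
profile["member_days"] = (joined.max() - joined).dt.days
print(profile)
```

A fixed calendar date would work just as well as the reference; only the relative differences between users matter for modelling.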

Profile dataset after removing NaN values and adding the member days column

3. Transcript dataset:

In this analysis, I focus on offer completion only, so I filtered out all transaction events and extracted the offer id from the value column.
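A minimal sketch of the filtering and extraction follows. It assumes, as in the Udacity transcript.json, that value holds a dict whose key is spelled "offer id" for received/viewed events and "offer_id" for completed events, so both spellings are checked. The sample events are made up.

```python
import pandas as pd

# Assumption: a tiny made-up stand-in for transcript.json.
transcript = pd.DataFrame({
    "person": ["u1", "u1", "u1", "u1"],
    "event": ["offer received", "offer viewed", "transaction", "offer completed"],
    "value": [{"offer id": "a"}, {"offer id": "a"}, {"amount": 12.5},
              {"offer_id": "a", "reward": 5}],
    "time": [0, 6, 30, 30],  # hours since the start of the test
})

# Keep only offer-related events (drop transactions).
offers = transcript[transcript["event"] != "transaction"].copy()

# Pull the offer id out of the value dict, whichever key spelling is used.
offers["offer_id"] = offers["value"].apply(lambda v: v.get("offer id", v.get("offer_id")))
offers = offers.drop(columns="value")
print(offers)
```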

Transcript dataset after removing non-offer records and adding the offer id column

4. Modelling:

A user might receive an offer, never view it, and still complete it. In that case the customer was not influenced by the offer, since they never saw it, so such cases should not count as a successful response in our model.

We therefore have to look for the pattern offer received, then offer viewed, then offer completed in the 'event' field.

I used a function, create_user_item_matrix, which loops through the offers and searches for this received, viewed, completed pattern. If a user's events for an offer follow the pattern, a 1 is added to the matrix; otherwise a 0 is added.

This function creates a sparse matrix containing NaN (for users who have not received the offer) along with 0s and 1s.

User-item matrix with NaN, 1.0 and 0 values
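The author's create_user_item_matrix is not shown in the post, so here is a hypothetical sketch of the same idea (build_user_item_matrix and follows_pattern are illustrative names). It assumes a cleaned event table with person, offer_id, event and time columns, and it simplifies by treating all events for a user/offer pair together, even if the offer was received more than once.

```python
import numpy as np
import pandas as pd

PATTERN = ("offer received", "offer viewed", "offer completed")

def follows_pattern(events, pattern=PATTERN):
    """True if the pattern steps appear in `events` in order (gaps allowed)."""
    it = iter(events)
    # `step in it` consumes the iterator, so later steps must come later.
    return all(step in it for step in pattern)

def build_user_item_matrix(offers):
    """Hypothetical sketch: 1.0 if received, viewed, then completed; 0.0 if
    received but not in that order; NaN if the offer was never received."""
    rows = []
    ordered = offers.sort_values("time", kind="mergesort")  # stable sort keeps ties in order
    for (user, offer_id), grp in ordered.groupby(["person", "offer_id"]):
        events = list(grp["event"])
        if "offer received" not in events:
            continue  # no row, so the cell stays NaN after pivoting
        rows.append((user, offer_id, 1.0 if follows_pattern(events) else 0.0))
    scores = pd.DataFrame(rows, columns=["person", "offer_id", "success"])
    return scores.pivot(index="person", columns="offer_id", values="success")

# Made-up example: u2 completes offer a without viewing it, so it scores 0.
offers = pd.DataFrame({
    "person":   ["u1", "u1", "u1", "u2", "u2", "u2"],
    "offer_id": ["a", "a", "a", "a", "a", "b"],
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received", "offer completed", "offer received"],
    "time": [0, 6, 30, 0, 12, 168],
})
matrix = build_user_item_matrix(offers)
print(matrix)
```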


I used FunkSVD to factorize the matrix into a user matrix and an offer matrix that share a set of latent features. Standard SVD does not work here because the matrix has missing values.
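A minimal NumPy sketch of FunkSVD follows: stochastic gradient descent that updates the latent vectors using only the observed (non-NaN) cells, which is what lets it cope with missing values. The function name, learning rate, iteration count and random initialisation are illustrative assumptions, not the settings used in the notebook.

```python
import numpy as np

def funk_svd(ratings, latent_features=10, learning_rate=0.005, iters=100, seed=42):
    """Factorize `ratings` (np.nan = missing) into user_mat (n_users x k) and
    offer_mat (k x n_offers), fitting only the observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_offers = ratings.shape
    user_mat = rng.random((n_users, latent_features))
    offer_mat = rng.random((latent_features, n_offers))
    observed = np.argwhere(~np.isnan(ratings))  # (row, col) pairs that have a value

    for _ in range(iters):
        for i, j in observed:
            # Error between the actual value and the current prediction.
            err = ratings[i, j] - user_mat[i, :] @ offer_mat[:, j]
            # Gradient steps, both computed from the pre-update vectors.
            user_step = learning_rate * 2 * err * offer_mat[:, j]
            offer_step = learning_rate * 2 * err * user_mat[i, :]
            user_mat[i, :] += user_step
            offer_mat[:, j] += offer_step
    return user_mat, offer_mat

# Made-up 3 x 3 user-item matrix with missing cells.
ratings = np.array([[1.0, np.nan, 0.0],
                    [np.nan, 1.0, 0.0],
                    [1.0, 1.0, np.nan]])
user_mat, offer_mat = funk_svd(ratings, latent_features=2, learning_rate=0.01, iters=2000)
predictions = user_mat @ offer_mat  # also fills in the missing cells
print(np.round(predictions, 2))
```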

We can split the dataset into training and test sets at a particular point in time, using earlier events for training and later events for testing. This lets us check how well the model works on data it has not seen.
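A minimal sketch of such a time-based split, assuming the event table has a time column in hours (as in the Udacity transcript). The split point of 168 hours and the sample events are hypothetical; each half would then be turned into its own user-item matrix.

```python
import pandas as pd

def time_split(events, split_time):
    """Earlier events (time < split_time) go to train, the rest to test."""
    train = events[events["time"] < split_time]
    test = events[events["time"] >= split_time]
    return train, test

# Made-up events; 168 hours (one week) is a hypothetical split point.
events = pd.DataFrame({
    "person": ["u1", "u1", "u2"],
    "event": ["offer received", "offer viewed", "offer received"],
    "time": [0, 6, 200],
})
train, test = time_split(events, split_time=168)
print(len(train), len(test))
```

One caveat of this design: users or offers that only appear after the split have no latent vectors from training, so they cannot be scored on the test set.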

Mean Squared Error:

I used mean squared error to evaluate the iterations of FunkSVD. For each user-offer pair where the offer was sent, the error is the actual value minus the dot product of the user's and the offer's latent features; the squared errors over all these entries of the matrix are then summed and averaged. I tested this for three different numbers of latent features: 5, 10 and 15.

For 5 latent features, the mean squared error was 0.020964.

For 10 latent features, the mean squared error was 0.004630.

For 15 latent features, the mean squared error was 0.002173.

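It's evident that 15 latent features give the lowest MSE. However, adding latent features generally lowers the training error, so this alone does not show that the model generalizes better.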
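The MSE calculation described above can be sketched as follows: predict every cell with the user/offer dot product, then average the squared errors over the observed (non-NaN) cells only. The function name masked_mse and the small matrices are illustrative, not from the notebook.

```python
import numpy as np

def masked_mse(actual, user_mat, offer_mat):
    """Mean squared error over the observed (non-NaN) entries of `actual`."""
    pred = user_mat @ offer_mat
    mask = ~np.isnan(actual)
    diff = actual[mask] - pred[mask]
    return float(np.mean(diff ** 2))

# Made-up example: only the (1, 1) cell is mispredicted, by 0.5.
actual = np.array([[1.0, np.nan],
                   [0.0, 1.0]])
user_mat = np.eye(2)
offer_mat = np.array([[1.0, 0.0],
                      [0.0, 0.5]])
print(masked_mse(actual, user_mat, offer_mat))  # 0.25 / 3, about 0.0833
```

The same function can score the test matrix, as long as its rows and columns are restricted to users and offers that also appear in training.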


These results can be checked against the test dataset.

On the test data, the MSE is again lowest with 15 latent features.


I built a recommendation function that, for a given user, loops over all offers, computes the predicted reaction to each one with the prediction function, and returns the offer with the highest score.
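A hypothetical sketch of that loop is below (predict_reaction and recommend_offer are illustrative names, not necessarily the notebook's). It assumes a user matrix and an offer matrix from FunkSVD, indexed by position, and a list of offer ids in column order; the numbers are made up.

```python
import numpy as np

def predict_reaction(user_mat, offer_mat, user_idx, offer_idx):
    """Predicted score for one user/offer pair: dot product of latent vectors."""
    return float(user_mat[user_idx, :] @ offer_mat[:, offer_idx])

def recommend_offer(user_mat, offer_mat, user_idx, offer_ids):
    """Score every offer for this user and return (best offer id, score)."""
    best_offer, best_score = None, -np.inf
    for j, offer_id in enumerate(offer_ids):
        score = predict_reaction(user_mat, offer_mat, user_idx, j)
        if score > best_score:
            best_offer, best_score = offer_id, score
    return best_offer, best_score

# Made-up latent matrices: 2 users, 2 latent features, 3 offers.
user_mat = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
offer_mat = np.array([[0.2, 0.9, 0.5],
                      [0.7, 0.1, 0.3]])
offer_ids = ["offer_1", "offer_2", "offer_3"]
print(recommend_offer(user_mat, offer_mat, 0, offer_ids))  # best offer for user 0 is offer_2
```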

Portfolio Data

Recommendation for User 1:

I picked this user from the training dataset and ran the prediction model.

The results suggest that offer 6 and the BOGO offer valid for 7 days work well for this user.

User 1 Prediction results

Recommendation for User 2:

I picked another user and ran the model.

The results show that, again, the offer 6 discount is the best for this user.

User 2 Prediction results

Recommendation for a new user:

When a new user joins the app, it's best to recommend the top offer, i.e. the one that performs best across all users.

Here we assume that the offer that is most popular among all users is likely to appeal to the new user as well.

Prediction results for a new user
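A minimal sketch of this rank-based fallback, assuming popularity is measured as the number of successful completions (1s) per offer in the user-item matrix, with the success rate among users who received the offer as a tie-breaker. Both the scoring rule and the function name rank_offers are assumptions for illustration; the sample matrix is made up.

```python
import numpy as np
import pandas as pd

def rank_offers(user_item):
    """Rank offers by successful completions across all users (NaNs skipped),
    breaking ties by success rate among users who received the offer."""
    successes = user_item.sum()  # count of 1s per offer
    rate = user_item.mean()      # mean over received (non-NaN) cells
    ranking = pd.DataFrame({"successes": successes, "rate": rate})
    return ranking.sort_values(["successes", "rate"], ascending=False).index.tolist()

# Made-up user-item matrix (NaN = offer never received).
user_item = pd.DataFrame(
    {"a": [1.0, 0.0, np.nan], "b": [1.0, 1.0, 1.0], "c": [0.0, np.nan, 0.0]},
    index=["u1", "u2", "u3"],
)
ranking = rank_offers(user_item)
print(ranking[0])  # offer recommended to a brand-new user
```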

The dataset, the notebook with the full analysis, and all other details can be found in this Git repository.


In this article, I explored the Starbucks dataset, modelled the data, and built a recommendation model for different users.

We can see that the "spend 10 dollars, get 2 dollars off within 10 days" offer, sent via email, web, mobile and social media, performs best.

We also looked at recommendations for a few users.

This model faces the cold-start problem: for new users, we have no history on which to base a recommendation. To handle this, I used rank-based recommendations, where the overall offer ranking is used to recommend offers to new users.

One improvement would be to split the training and test datasets based on the offer, so that we can see exactly whether a user has completed each offer. However, this would take more time to process and split the dataset, and would require a more complex algorithm.

Data science enthusiast, Qlik Architect.
