Clearing Up Confusion: The Cost-Benefit of Using Classification Models in Machine Learning
Introduction
Machine learning is one of the hottest terms used to sell analytics packages and services. Sure, it's easy to believe it's effective when it's one of the most talked-about things in tech, but how do we prove to stakeholders that an ML model is worth the time, effort, and money when descriptive statistics can already give them insights into their market? Here we can look at the confusion matrix results of a classification model to determine the cost benefit.
Background/Problem
A classification model uses input data to assign a category to each record. Some use cases for these models are spam filters (is the email spam or not?), cyber threat detection (threat or not?), direct marketing (will a customer complete a purchase?), and many others. We won't go into the details of building a classification model here, but instead look at the cost-benefit analysis that can be done from its results.
Some terms to know:
Training data – data that's been collected to build your model.
Holdout data – data that you've collected but purposely left out of the training and testing of your model. This data can then be fed into the model to assess its accuracy.
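As a minimal sketch of the training/holdout split described above (the records here are made-up toy data, and the 80/20 split ratio is an assumption, not something prescribed by the article):

```python
import random

# Hypothetical customer records; in practice these come from your own collected data.
records = [{"id": i, "responded": i % 3 == 0} for i in range(100)]

# Shuffle, then reserve 20% as holdout data that the model never sees during training.
random.seed(42)
random.shuffle(records)
split = int(len(records) * 0.8)
training_data = records[:split]
holdout_data = records[split:]

print(len(training_data), len(holdout_data))  # 80 20
```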
Solution/Methodology
After creating a classification model, you can test its accuracy against your holdout data by comparing how many cases it classified correctly. The predicted vs. actual results can be displayed in a 2x2 matrix known as the confusion matrix, which is often used to calculate a model's accuracy.
|                  |          | Actual Result       |                     |
|                  |          | Positive            | Negative            |
| Model Prediction | Positive | True Positive (TP)  | False Positive (FP) |
|                  | Negative | False Negative (FN) | True Negative (TN)  |
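As a rough sketch, the four cells and the accuracy can be tallied in a few lines of Python (the label lists here are made-up toy values, not data from the article):

```python
def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (True = positive class)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    return tp, fp, fn, tn

# Toy holdout labels vs. model predictions, purely for illustration.
actual    = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

tp, fp, fn, tn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / len(actual)  # correct predictions over all predictions
print(tp, fp, fn, tn, round(accuracy, 2))  # 2 1 1 2 0.67
```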
Knowing the accuracy of a model is great when it comes to model selection, but how can you turn it into something that is meaningful to key stakeholders? Turn it into $$$$ of course. The matrix below shows what each predicted vs. actual result means in terms of dollars.
|                  |          | Actual Result |          |
|                  |          | Positive      | Negative |
| Model Prediction | Positive | Gain          | Loss     |
|                  | Negative | Missed Gain   | No Cost  |
Case Study/Examples
Now let's bring this together as an example. I moved from NY to Denver three years ago, and while I've embraced the breakfast burrito here, I still miss a good NY-style bagel. Considering myself a bagel expert, I decided to start making my own. Since we live in the world of the side hustle, I decided I should start selling these bagels. I need to build up a customer base, so I start randomly distributing buy-one-get-one-free bagel coupons.
Being the avid data nerd that I am, I start collecting all sorts of information about my sales and customers (with their permission, of course) when they come in and use the coupon. From there I can create my classification model to identify which target customers will respond to my coupon. Now the questions are: will this model give me better results in my next coupon campaign than continuing to blindly send out coupons, and should I continue my coupon campaign at all?
Using my holdout data I can see the predicted vs actual results of my model.
|                  |                         | Actual Result     |                         |
|                  |                         | Respond to Coupon | Don't Respond to Coupon |
| Model Prediction | Respond to Coupon       | 43                | 5                       |
|                  | Don't Respond to Coupon | 2                 | 50                      |
Let's add costs to this now:
It costs me $1 to send out a coupon.
If a person comes in and uses the coupon, I make $2 on the sale.
According to my model, 48 people will respond to my coupon, so I send coupons to just those 48 people, putting my cost of sending coupons at $48.
My model predicts that all 48 people will use the coupon, resulting in $96 of revenue on those sales.
My predicted total profit then becomes $48. Woohoo!
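The predicted-profit arithmetic above, spelled out in a few lines (using the article's figures of $1 per coupon, $2 per redeemed coupon, and 48 predicted responders):

```python
predicted_responders = 48
cost_per_coupon = 1
revenue_per_response = 2

coupon_cost = predicted_responders * cost_per_coupon             # $48 spent on mailing
predicted_revenue = predicted_responders * revenue_per_response  # $96 in predicted sales
predicted_profit = predicted_revenue - coupon_cost
print(predicted_profit)  # 48
```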
Granted, my model is not 100 percent accurate, but with the confusion matrix we can easily visualize the gains, losses, and missed gains. My model correctly predicted that 43 people would respond to the coupon; after the $1 coupon cost on each $2 sale, that's a $43 profit. It incorrectly predicted that five target customers would respond, a $5 loss for sending those coupons. Finally, the model missed two people who would have used the coupon had I sent them one; that could have been another $2 of profit had the model classified them correctly. Netting the gain against the loss, the campaign actually earns $38. I can now easily see the costs and benefits of my model and do further analysis if desired. Overall, it looks like a targeted coupon campaign is a great idea.
|                  |                         | Actual Result                      |                         |
|                  |                         | Respond to Coupon                  | Don't Respond to Coupon |
| Model Prediction | Respond to Coupon       | $43                                | ($5)                    |
|                  | Don't Respond to Coupon | $4 missed sales ($2 missed profit) | $0                      |
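Putting the case-study numbers through the same dollar mapping confirms the campaign's realized profit (the variable names here are my own; the counts and prices are the article's):

```python
# Confusion-matrix counts from the holdout results: TP, FP, FN, TN.
tp, fp, fn, tn = 43, 5, 2, 50
cost_per_coupon, revenue_per_response = 1, 2
profit_per_response = revenue_per_response - cost_per_coupon

gain = tp * profit_per_response         # $43 from correctly targeted responders
loss = fp * cost_per_coupon             # $5 wasted on coupons nobody used
missed_gain = fn * profit_per_response  # $2 of profit the model left on the table
actual_profit = gain - loss             # $38 actually realized by the campaign
print(gain, loss, missed_gain, actual_profit)  # 43 5 2 38
```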
Now let's follow this example under something known as the Naïve Rule. This method doesn't look at each target customer individually but instead makes an assumption about the whole sample. I could consult an industry expert who, from experience, has seen that about 50 percent of coupons get responses. I could also take an average of my own data and see similar results. Using this 50 percent response estimate, I send out coupons to my entire list of 100 recipients.
I am estimating a 50 percent response rate, meaning I will make $100 off of my bagel sales (50 responses at $2 each).
However, it costs me $100 to send out the 100 coupons, meaning I make zero profit.
With this analysis I would decide it's not worth it to send out the coupons.
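The Naïve Rule arithmetic above can be sketched the same way, blanketing the whole list at the assumed 50 percent response rate:

```python
recipients = 100
response_rate = 0.5
cost_per_coupon, revenue_per_response = 1, 2

revenue = recipients * response_rate * revenue_per_response  # $100 in sales
cost = recipients * cost_per_coupon                          # $100 to mail every coupon
naive_profit = revenue - cost
print(naive_profit)  # 0.0
```

Comparing this break-even result against the targeted campaign's realized profit is exactly the stakeholder-facing argument the article is building.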
Benefits/Impact
By shifting the analysis to a classification model, a business can see benefits beyond what traditional methods offer. Here, I realized that a coupon program can be profitable when implemented with a more targeted approach. If I were looking to expand my business, I could present this analysis to investors to show that I have created a superior method of marketing that will result in profit.
Conclusion
It's easy to assume that, implemented correctly, machine learning yields more powerful insights than traditional methods. However, as a data scientist, it is important to be able to explain why that is the case. Using a confusion matrix is one way to drive a cost analysis when choosing a methodology.
Author Bio
Jen Brousseau is a business analyst at Jahnel Group, Inc., a custom software development firm based in Schenectady, NY. Jahnel Group is a consulting firm that specializes in helping companies leverage technology to improve their business operations. We provide end-to-end strategic consulting and deployment services, helping companies optimize their operations and reduce costs through the use of technology. Jahnel Group is an Advanced Tier AWS Services Partner, with expertise in AWS Lambda, Amazon API Gateway, Amazon DynamoDB, and other AWS services.