The Naive Bayes algorithm is one of many classification methods. For instance, you may want to learn from a past marketing campaign which prospects to focus on in your next marketing activity. The algorithm can identify patterns in the contacts who have already purchased a certain product (i.e. their age, gender, income, etc.). You can then use this information for your next campaign and focus on the people who are most likely to be interested, so that you spend your marketing budget where it is most effective.
SAP Predictive Analysis can use the Naive Bayes algorithm thanks to the ability to create Custom R Components. Within such a component, an expert user can encapsulate an R script in an end-user-friendly format. With thousands of different methods available in R, this concept is extremely powerful. This article explains how to implement and use Naive Bayes.
Usage
Let's try the Naive Bayes algorithm on some data from the real world. The UC Irvine Machine Learning Repository kindly hosts a dataset with information taken from the 1994 US Census. The file called Adult contains anonymous information on over 32,000 people, listing their age, education, marital status and much more, including whether the person earned more than 50,000 USD in 1994. We will use this information to create a model that we can apply to future data to determine whether a person is likely to earn more or less than 50,000 USD.
To follow the steps below, download the dataset above. Before getting started, you may have to add a first row with column names.
Load your data into SAP Predictive Analysis. You will see some of the available columns. The 'Income' field on the right-hand side tells us whether the person was above or below the 50k threshold in that year.
Now add the Naive Bayes Classifier component to your model. Further below you will find the details for adding this logic to your own SAP Predictive Analysis installation.
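If you prefer to add the header row with R rather than by hand, a minimal sketch could look like the one below. The column names follow the spelling used in this article (Age, Occupation, HoursPerWeek, Income) and are an assumption, as are the output file name and the two sample rows, which only illustrate the layout of the raw file.

```r
# Column names in the style used in this article (assumed spelling; the
# raw file itself ships without a header row).
adultColumns <- c("Age", "Workclass", "Fnlwgt", "Education", "EducationNum",
                  "MaritalStatus", "Occupation", "Relationship", "Race", "Sex",
                  "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry",
                  "Income")

# Two illustrative rows in the comma-separated layout of the raw file.
rawLines <- c(
  "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
  "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K"
)

# For the real download, pass the path of the downloaded file as the first
# argument instead of text = rawLines.
adult <- read.csv(text = rawLines, header = FALSE, col.names = adultColumns,
                  strip.white = TRUE, na.strings = "?")

# Write a copy that starts with the column names (hypothetical file name).
outFile <- file.path(tempdir(), "adult_with_header.csv")
write.csv(adult, outFile, row.names = FALSE)
```

The `strip.white` and `na.strings` settings are a design choice for this sketch: the raw file puts a space after each comma and marks unknown values with a question mark.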
Configure the component. You need to specify:
- the Classifier Column: Income
- the Predictor Columns: here you can pick Age, Occupation and HoursPerWeek to start.
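To make concrete what a Naive Bayes classifier estimates from a classifier column and its predictor columns, here is a hand calculation in base R. It is only a sketch: the data frame is invented for illustration, it covers categorical predictors without any smoothing, and it is not the code of the component itself (the e1071 library additionally handles numeric predictors such as Age by assuming a normal distribution per class).

```r
# Toy data, invented purely for illustration (not taken from the Adult file).
toy <- data.frame(
  Occupation = c("Sales", "Sales", "Tech", "Tech", "Tech", "Sales"),
  Sex        = c("F", "M", "M", "F", "M", "F"),
  Income     = c("<=50K", "<=50K", ">50K", ">50K", "<=50K", "<=50K"),
  stringsAsFactors = TRUE
)

# Prior P(class): relative frequency of each Income value.
prior <- prop.table(table(toy$Income))

# Conditional P(value | class) for each predictor: one row per class.
condOccupation <- prop.table(table(toy$Income, toy$Occupation), margin = 1)
condSex        <- prop.table(table(toy$Income, toy$Sex), margin = 1)

# Score a new record (Occupation = Tech, Sex = M): the prior times the
# product of the conditionals, treating the predictors as independent.
score <- prior * condOccupation[, "Tech"] * condSex[, "M"]

# Normalise so that the class scores sum to 1.
posterior <- score / sum(score)
print(posterior)
```

The class with the largest posterior value is the prediction. The "naive" part is the multiplication of the per-column conditionals, which ignores any dependence between the predictors.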
Run the model, then go to the charts area. The table shows how many records were correctly and incorrectly classified. 24,263 people were correctly classified as earning less than 50,000 USD. 556 people were correctly classified as high earners.
You can also save the trained model to test it further on data that is already classified. Or you can apply the model to new data for which the classification is actually unknown.
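A table like this can be condensed into a few summary figures. The sketch below uses two short vectors invented for illustration; in practice they would be the Income column and the prediction column returned by the component.

```r
# Toy vectors, invented for illustration only.
actual    <- factor(c("<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"))
predicted <- factor(c("<=50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K"),
                    levels = levels(actual))

# Rows are predictions, columns are the true classes, as in the chart.
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Overall accuracy: correctly classified records (the diagonal) over all records.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Share of each true class that was recognised correctly.
recallPerClass <- diag(confusion) / colSums(confusion)
```

Looking at the per-class share as well as the overall accuracy is useful when one class is much larger than the other, because a high overall accuracy can hide a class that is rarely recognised.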
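Outside of SAP Predictive Analysis, the same idea of testing a trained model on already classified data can be sketched in plain R. The example below is an assumption-laden illustration: it uses R's built-in iris data instead of the Adult file, an arbitrary 70/30 split and seed, and it only trains the model if the e1071 library is installed.

```r
set.seed(42)  # arbitrary seed so that the split is reproducible

# Hold back 30% of the rows as classified test data.
trainRows <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[trainRows, ]
test  <- iris[-trainRows, ]

if (requireNamespace("e1071", quietly = TRUE)) {
  # Train on one part of the data ...
  model <- e1071::naiveBayes(Species ~ ., data = train)

  # ... and compare the predictions with the known classes of the other part.
  predictedClass <- predict(model, newdata = test)
  print(table(Predicted = predictedClass, Actual = test$Species))

  # type = "raw" returns the class probabilities instead of the class labels.
  predictedProb <- predict(model, newdata = test, type = "raw")
  print(head(predictedProb))
}
```

Testing on rows that were not used for training gives a more honest picture of the accuracy than scoring the training data itself.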
R Libraries
Please make sure you have the R libraries e1071 and gplots installed. The following document explains how to make new libraries available in SAP Predictive Analysis:
http://scn.sap.com/docs/DOC-28396
You may want to read the documentation of the Naive Bayes algorithm at:
http://ugrad.stat.ubc.ca/R/library/e1071/html/predict.naiveBayes.html
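Before running the component, it can help to check that both libraries are visible to R. The following is a small sketch for an interactive R session; it assumes you run it in the same R installation that SAP Predictive Analysis is configured to use, and it only reports what is missing rather than installing anything.

```r
# Libraries the component relies on.
needed <- c("e1071", "gplots")

# TRUE for every library that R can find, FALSE otherwise.
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
notInstalled <- needed[!available]

if (length(notInstalled) > 0) {
  message("Missing libraries: ", paste(notInstalled, collapse = ", "))
  message("They can usually be added with install.packages().")
} else {
  message("All required libraries are installed.")
}
```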
R Code
mymain <- function(mydata, myPredictorColumnsList, myClassifierColumnStr)
{
  ## Load library for the Naive Bayes algorithm.
  library(e1071)
  ## Load library to display the output as a table in the chart panel.
  library(gplots)
  ## Assign the first predictor column to a string that will concatenate all predictor columns.
  myPredictorColumnsConcat <- myPredictorColumnsList[[1]]
  ## If more independent columns were selected, add these to the concat string, separated by a '+'.
  if (length(myPredictorColumnsList) > 1)
  {
    for (i in 2:length(myPredictorColumnsList))
    {
      myPredictorColumnsConcat <- paste(myPredictorColumnsConcat, myPredictorColumnsList[[i]], sep=' + ')
    }
  }
  ## Create the R command that will create the Naive Bayes model.
  myRCommandStr <- paste('naiveBayes(as.factor(', myClassifierColumnStr, ') ~ ', myPredictorColumnsConcat, ', data=mydata)')
  ## Create the model by parsing and evaluating the above R syntax.
  myModel <- eval(parse(text=myRCommandStr))
  ## Apply the model to the current data to test its accuracy.
  myPrediction <- predict(myModel, newdata=mydata)
  ## Display predicted (rows) versus actual (columns) classes as a table.
  textplot(capture.output(table(myPrediction, mydata[,myClassifierColumnStr])), halign="left", valign="top", cex=1)
  ## Return the input data as output together with the predicted values.
  output <- cbind(mydata, myPrediction)
  return(list(mytrainedmodel=myModel, out=output))
}
mypredict <- function(mynewdata, mytrainedmodel)
{
  ## Carry out the prediction on previously unseen data.
  ## The column names have to match the column names of the mytrainedmodel.
  myprediction <- predict(mytrainedmodel, newdata=mynewdata)
  ## Return the input data as output together with the predicted values.
  output <- cbind(mynewdata, myprediction)
  return(list(out=output))
}
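As a possible variation on the code above, the model formula can be built as a formula object instead of a command string that is parsed and evaluated. This is only a sketch of an alternative, not the component's code; the selections at the top are hypothetical values that match the column names used in this article.

```r
# Hypothetical selections, as they would arrive from the component's settings.
myClassifierColumnStr <- "Income"
myPredictorColumnsList <- list("Age", "Occupation", "HoursPerWeek")

# Join all predictor names with ' + ' in one call instead of a loop.
myPredictorColumnsConcat <- paste(unlist(myPredictorColumnsList), collapse = " + ")

# Build a formula object directly rather than a whole command string.
myFormula <- as.formula(
  paste0("as.factor(", myClassifierColumnStr, ") ~ ", myPredictorColumnsConcat)
)
print(myFormula)

# With e1071 loaded, the model call would then read:
# myModel <- naiveBayes(myFormula, data = mydata)
```

Either way, column names that are not valid R names (for example names containing spaces or hyphens) would need backticks or renaming before they can appear in a formula.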
Configuration
Some links to get you started with Custom R Components in SAP Predictive Analysis:
Creating and Using Custom R Components
http://scn.sap.com/docs/DOC-42862
Tips to use Custom R scripts
http://scn.sap.com/docs/DOC-42863
Hands-On Tutorial for creating Custom R Components
http://scn.sap.com/docs/DOC-42739