Channel: SCN : All Content - SAP BusinessObjects Predictive Analytics

Custom R Components - Classification with the Naive Bayes Algorithm


The Naive Bayes algorithm is one of many classification methods. For instance, you may want to learn from a past Marketing campaign which prospects you should focus on in your next Marketing activity. The algorithm can identify patterns in the type of contacts who have already purchased a certain product (i.e. their age, gender, income, etc.). You can then use this information for your next campaign and focus on the people who are most likely to be interested, so that you spend your Marketing budget where it is most effective.
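To make the idea concrete, here is a minimal base R sketch of the calculation behind Naive Bayes for categorical columns: the share of each class is multiplied by the share of matching feature values within that class, and the results are normalised. The campaign data, the column names and the helper naive_bayes_posterior are made up for illustration, and no smoothing is applied, so this is not how the e1071 library is implemented.

```r
# Toy campaign history (made-up values, for illustration only).
campaign <- data.frame(
  Gender    = c("F", "F", "M", "M", "F", "M", "F", "M"),
  AgeGroup  = c("Young", "Old", "Young", "Old", "Old", "Young", "Old", "Old"),
  Purchased = c("Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: score one prospect against every class.
# score(class) = P(class) * product of P(feature value | class)
naive_bayes_posterior <- function(data, target, prospect) {
  classes <- unique(data[[target]])
  scores <- sapply(classes, function(cl) {
    rows <- data[data[[target]] == cl, , drop = FALSE]
    prior <- nrow(rows) / nrow(data)
    # Share of rows in this class that match each feature value of the prospect.
    likelihoods <- sapply(names(prospect), function(col) {
      mean(rows[[col]] == prospect[[col]])
    })
    prior * prod(likelihoods)
  })
  # Normalise so that the scores sum to 1.
  scores / sum(scores)
}

posterior <- naive_bayes_posterior(campaign, "Purchased",
                                   list(Gender = "F", AgeGroup = "Old"))
print(round(posterior, 3))
```

For this toy data the prospect (female, older age group) gets a probability of 9/11 for "Yes", because both feature values are more common among past buyers than among non-buyers.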

 

SAP Predictive Analysis can use the Naive Bayes algorithm thanks to its ability to create Custom R Components. Within such a component an expert user can encapsulate an R script in an end-user-friendly format. With thousands of different methods available in R, that concept is extremely powerful. This article explains how to implement and use Naive Bayes.

 

Usage

Let's try the Naive Bayes algorithm on some real-world data. The UC Irvine Machine Learning Repository kindly hosts a dataset with information taken from the 1994 US Census. The file called Adult contains anonymous information on over 32,000 people, listing their age, education, marital status and much more, including whether the person earned over 50,000 US dollars in 1994. We will use this information to create a model that we can apply to future data to determine whether a person is likely to earn more or less than 50,000 USD.

 

You can follow the steps below if you download the above dataset. Before getting started, you may just have to add a first row with column names, as the raw file does not contain a header row.
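If you prefer to add the header row with R rather than by hand, the following sketch shows one way to do it. The column names are an assumption, chosen so that Age, Occupation, HoursPerWeek and Income match the fields used in this article. The two rows are only illustrative lines in the layout of the raw file; in practice you would pass the path of the downloaded file to read.csv instead of the text argument.

```r
# Assumed column names for the 15 fields of the Adult file.
adult_cols <- c("Age", "WorkClass", "Fnlwgt", "Education", "EducationNum",
                "MaritalStatus", "Occupation", "Relationship", "Race", "Sex",
                "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry",
                "Income")

# Two illustrative rows in the layout of the raw file (comma-separated, no header).
raw_lines <- c(
  "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
  "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K"
)

# Read the rows without a header and attach the column names.
# strip.white removes the blank that follows each comma in the raw file.
adult <- read.csv(text = raw_lines, header = FALSE, col.names = adult_cols,
                  strip.white = TRUE, stringsAsFactors = FALSE)

# Write the data back out, this time with a header row.
out_file <- tempfile(fileext = ".csv")
write.csv(adult, out_file, row.names = FALSE)
print(names(adult))
```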

 

Load your data into SAP Predictive Analysis. The screenshot shows some of the available columns. The 'Income' field on the right-hand side tells us whether the person earned more or less than the 50k threshold in that year.

1.PNG

 

Now add the Naive Bayes Classifier component to the model. Further below you will find the details needed to add this logic to your own SAP Predictive Analysis installation.

2.PNG

Configure the component. You need to specify

- the Classifier Column: Income

- the Predictor Columns: to start, pick Age, Occupation and HoursPerWeek.

3.PNG

 

Run the model, then go to the charts area. The table shows how many records were classified correctly and incorrectly: 24,263 people were correctly classified as earning less than 50,000 USD, and 556 people were correctly classified as high earners.
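The table in the chart is a cross-tabulation of predicted against actual classes. The short sketch below shows how such a table can be read and condensed into a single accuracy figure. The two vectors are made-up toy values, not the results of the model above.

```r
# Toy actual and predicted classes (made-up values, for illustration only).
actual    <- factor(c("<=50K", "<=50K", ">50K", "<=50K",
                      ">50K", "<=50K", ">50K", "<=50K"))
predicted <- factor(c("<=50K", "<=50K", "<=50K", "<=50K",
                      ">50K", ">50K", ">50K", "<=50K"),
                    levels = levels(actual))

# Rows are predictions, columns are the true classes.
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Correct classifications sit on the diagonal of the table.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

In this toy example 6 of the 8 records lie on the diagonal, so the accuracy is 0.75. The off-diagonal cells show which kind of error the model makes, which matters when one class is much rarer than the other.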

4.PNG

 

You can also save the trained model to further test it on data that is already classified. Or you can apply the model on new data for which the classification is actually unknown.

5.PNG

 

R Libraries

Please make sure you have the R-libraries e1071 and gplots installed. The following document explains how to make new libraries available in SAP Predictive Analysis:
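As a quick check, the following sketch reports which of the two libraries are missing from an R installation. It assumes you run it in the same R installation that SAP Predictive Analysis is configured to use. The install.packages call is left commented out because it needs access to a CRAN mirror.

```r
# The two libraries required by the component in this article.
required_pkgs <- c("e1071", "gplots")

# TRUE for every library that can be loaded from this R installation.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing R libraries: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required R libraries are available.")
}
```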

http://scn.sap.com/docs/DOC-28396

 

You may want to read the documentation of the Naive Bayes algorithm at:

http://ugrad.stat.ubc.ca/R/library/e1071/html/predict.naiveBayes.html
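Before wrapping the algorithm in a component, it can help to try the two e1071 functions directly in R. The sketch below trains a model on a small made-up data frame and shows both forms of prediction: the predicted class (the default) and the class probabilities (type = "raw"). The toy values are assumptions for illustration only and require the e1071 library to be installed.

```r
library(e1071)

# Toy data (made-up values, for illustration only).
toy <- data.frame(
  Age        = c(25, 32, 47, 51, 38, 29, 58, 44),
  Occupation = factor(c("Sales", "Sales", "Exec", "Exec",
                        "Sales", "Exec", "Exec", "Sales")),
  Income     = factor(c("<=50K", "<=50K", ">50K", ">50K",
                        "<=50K", "<=50K", ">50K", ">50K"))
)

# Train the model: Income is the class, Age and Occupation are the predictors.
model <- naiveBayes(Income ~ Age + Occupation, data = toy)

# Predicted class for each row.
pred_class <- predict(model, newdata = toy)

# Probability of each class for each row.
pred_prob <- predict(model, newdata = toy, type = "raw")

print(table(Predicted = pred_class, Actual = toy$Income))
print(round(pred_prob, 3))
```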

 

R Code

mymain <- function(mydata, myPredictorColumnsList, myClassifierColumnStr)
{
  ## Load the library for the Naive Bayes algorithm.
  library(e1071)

  ## Load the library to display the output as a table in the chart panel.
  library(gplots)

  ## Assign the first predictor column to a string that will concatenate all predictor columns.
  myPredictorColumnsConcat <- myPredictorColumnsList[[1]]

  ## If more predictor columns were selected, add these to the concatenated string, separated by a '+'.
  if (length(myPredictorColumnsList) > 1)
  {
    for (i in 2:length(myPredictorColumnsList))
    {
      myPredictorColumnsConcat <- paste(myPredictorColumnsConcat, myPredictorColumnsList[[i]], sep=' + ')
    }
  }

  ## Create the R command that will create the Naive Bayes model.
  myRCommandStr <- paste('naiveBayes(as.factor(', myClassifierColumnStr, ') ~ ', myPredictorColumnsConcat, ', data=mydata)')

  ## Create the model by parsing and evaluating the above R syntax.
  myModel <- eval(parse(text=myRCommandStr))

  ## Apply the model to the current data to test its accuracy.
  myPrediction <- predict(myModel, newdata=mydata)

  ## Display the predicted classes against the actual classes as a table.
  textplot(capture.output(table(myPrediction, mydata[,myClassifierColumnStr])), halign="left", valign="top", cex=1)

  ## Return the input data as output together with the predicted values.
  output <- cbind(mydata, myPrediction)
  return(list(mytrainedmodel=myModel, out=output))
}

mypredict <- function(mynewdata, mytrainedmodel)
{
  ## Carry out the prediction on previously unseen data.
  ## The column names have to match the column names used to train mytrainedmodel.
  myprediction <- predict(mytrainedmodel, newdata=mynewdata)

  ## Return the input data as output together with the predicted values.
  output <- cbind(mynewdata, myprediction)
  return(list(out=output))
}
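The script above assembles the whole naiveBayes call as a string and evaluates it. A possible alternative, sketched here in base R, is to build only the formula from the selected column names and pass that formula to naiveBayes directly. The helper name build_nb_formula is hypothetical and not part of the component.

```r
# Hypothetical helper: build 'as.factor(<class>) ~ <pred1> + <pred2> + ...'
# from the names of the classifier column and the predictor columns.
build_nb_formula <- function(predictorColumns, classifierColumn) {
  predictors <- paste(unlist(predictorColumns), collapse = " + ")
  as.formula(paste0("as.factor(", classifierColumn, ") ~ ", predictors))
}

myFormula <- build_nb_formula(list("Age", "Occupation", "HoursPerWeek"), "Income")
print(myFormula)

# Inside mymain the model could then be created with:
#   myModel <- naiveBayes(myFormula, data = mydata)
```

This produces the same model formula as the loop in mymain and avoids evaluating a generated command string. It assumes that the column names are syntactically valid R names.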

 

Configuration

c1.PNG

c2.PNG

c3.PNG

c4.PNG

 

Some links to get you started with Custom R Components in SAP Predictive Analysis

 

Creating and Using Custom R Components
http://scn.sap.com/docs/DOC-42862

 

Tips to use Custom R scripts
http://scn.sap.com/docs/DOC-42863

 

Hands-On Tutorial for creating Custom R Components
http://scn.sap.com/docs/DOC-42739

