I noticed that SAP PA has no standard component for binning, even though many analyses require it.
Many statistical analyses need categorical variables rather than continuous ones. In credit scoring models in particular, continuous variables are often transformed into categorical variables for better analysis. With big data, analysis is also often faster on categorical variables than on continuous ones. The process of converting a continuous variable into a categorical one is called binning. In simpler words, binning groups a number of more or less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals. The component below allows you to bin a continuous variable into n equally sized (by number of observations) bins.
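To make the idea of equally sized bins concrete, here is a minimal sketch in base R that contrasts equal-width bins with equal-frequency bins. The ages and the variable names are made up for illustration; only cut(), quantile() and table() from base R are used.

```r
# Hypothetical ages, skewed toward younger people
ages <- c(18, 19, 21, 22, 23, 25, 26, 28, 31, 35, 42, 67)
n_bins <- 4

# Equal-width bins: passing a single number as 'breaks' makes cut()
# divide the range of the data into n_bins intervals of the same width
equal_width <- cut(ages, breaks = n_bins, include.lowest = TRUE)

# Equal-frequency bins: use the quantiles of the data as break points,
# so each interval holds (roughly) the same number of observations
freq_breaks <- quantile(ages, probs = seq(0, 1, length.out = n_bins + 1))
equal_freq <- cut(ages, breaks = freq_breaks, include.lowest = TRUE)

# Number of observations per bin under each scheme
width_counts <- as.vector(table(equal_width))
freq_counts <- as.vector(table(equal_freq))
print(width_counts)
print(freq_counts)
```

On skewed data like this, the equal-width bins end up with very uneven counts, while the quantile-based bins each hold the same number of people.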
I have taken sample data from a credit card client. They have assigned scores to customers based on credit limit and now want to count the customers in each score range. Based on the ranges I went ahead and created a Mosaic plot. We can further use this data in our predictive algorithms to predict the score range of a new customer based on other variables. You can modify the code as per your needs.
Setting up the component:
Column to be Categorized: The continuous variable that you want to convert to a categorical variable. It must be numeric.
Number of Categories: The number of categories to create for the continuous variable above.
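Since both settings are passed straight into the R function, it can help to check them before binning. The following is only a sketch: check_bin_inputs is a hypothetical helper that is not part of the component, and the toy data frame is made up.

```r
# Hypothetical helper (not part of the component): checks the two settings
# and returns the number of categories as an integer
check_bin_inputs <- function(mydata, BinColumnStr, numBrks) {
  if (!BinColumnStr %in% names(mydata)) {
    stop("Column '", BinColumnStr, "' not found in the data")
  }
  if (!is.numeric(mydata[[BinColumnStr]])) {
    stop("Column '", BinColumnStr, "' must be numeric")
  }
  # The setting may arrive as text, so convert it before checking
  n <- suppressWarnings(as.integer(numBrks))
  if (is.na(n) || n < 2) {
    stop("Number of Categories must be a whole number of at least 2")
  }
  n
}

# Made-up data to try the helper on
toy <- data.frame(Score = c(310, 420, 515),
                  Region = c("North", "South", "East"))
n_categories <- check_bin_inputs(toy, "Score", "4")
```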
Output:
As seen below, the variable (Score) is now split into 4 categories of equal distribution.
CODE:
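mymain <- function(mydata, BinColumnStr, numBrks)
{
  ## Package required for creating the Mosaic plot
  library(vcd)
  ## Capturing the column that needs to be categorized
  mycolumn <- mydata[, BinColumnStr]
  ## Creating the categories. Quantile break points give bins with
  ## (roughly) equal numbers of observations; cut() with breaks = n
  ## alone would give equal-width intervals instead.
  ## unique() guards against duplicate break points when values are tied.
  myprobs <- seq(0, 1, length.out = as.numeric(numBrks) + 1)
  mybreaks <- unique(quantile(mycolumn, probs = myprobs, na.rm = TRUE))
  mydata$Category <- cut(mycolumn, breaks = mybreaks, include.lowest = TRUE)
  ## Tabulating the categories for the Mosaic plot
  ## (assumes mydata also has Region and Count columns)
  output1 <- xtabs(Count ~ Region + Category, data = mydata)
  ## Aggregating the count based on Region & Category
  myaggregation <- aggregate(Count ~ Region + Category, data = mydata, FUN = sum)
  output <- data.frame(myaggregation$Region, myaggregation$Category, myaggregation$Count)
  ## Creating the Mosaic plot
  mosaic(output1, shade = TRUE)
  return(list(out = output))
}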
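To try the tabulation steps outside SAP PA, here is a small sketch on made-up data. The toy values are assumptions, the Region, Score and Count column names mirror the ones the code above expects, and quantile-based breaks are used for the bins. The vcd plotting step is left out so that the example runs with base R alone.

```r
# Made-up sample: one row per customer group with a score and a count
toy <- data.frame(
  Region = c("North", "North", "South", "South", "East", "East", "West", "West"),
  Score  = c(310, 455, 520, 610, 380, 700, 590, 640),
  Count  = c(12, 7, 9, 4, 15, 3, 6, 8)
)

# Two equal-frequency bins: break points at the minimum, median and maximum
toy_breaks <- quantile(toy$Score, probs = seq(0, 1, length.out = 3))
toy$Category <- cut(toy$Score, breaks = toy_breaks, include.lowest = TRUE)

# Region-by-Category contingency table of counts (the input to a mosaic plot)
toy_table <- xtabs(Count ~ Region + Category, data = toy)

# The same counts in long form, one row per Region/Category combination
toy_agg <- aggregate(Count ~ Region + Category, data = toy, FUN = sum)

print(toy_table)
print(toy_agg)
```

The contingency table is what a mosaic plot is drawn from, while the long-form aggregate corresponds to the table the component returns.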
Please leave a comment if there is anything I should add to this code.