Custom R Component - Random Forest Classification

This component adds Random Forest Classification to SAP Predictive Analysis. A random forest is an ensemble technique: it builds many decision trees, each with a random element (every tree is grown on a bootstrap sample of the data and considers only a random subset of the predictors at each split), and combines their votes to achieve stronger predictive ability. This comes at the cost of reduced interpretability of the model. The classification (target) variable in this component can contain two or more levels.
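The voting step can be illustrated with a short Python sketch. It is not the randomForest package's code. The individual trees are stood in for by hypothetical callables, the column names `Age` and `HoursPerWeek` are made up for the example, and ties are broken deterministically (first label encountered), which real implementations may handle differently.

```python
from collections import Counter
from typing import Callable, Hashable, Mapping, Sequence

# A "tree" here is any callable mapping a record to a class label
# (hypothetical stand-in for a fitted decision tree).
Tree = Callable[[Mapping[str, object]], Hashable]


def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the most trees.

    Ties go to the label encountered first, because Counter.most_common
    keeps first-encountered order for equal counts.
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


def forest_predict(trees: Sequence[Tree], record: Mapping[str, object]) -> Hashable:
    """Let every tree classify the record, then combine the votes."""
    return majority_vote([tree(record) for tree in trees])


if __name__ == "__main__":
    # Three toy "trees" that each look at a different rule.
    trees = [
        lambda r: "Married" if r["Age"] > 30 else "Never-married",
        lambda r: "Married" if r["HoursPerWeek"] > 35 else "Never-married",
        lambda r: "Never-married" if r["Age"] < 20 else "Married",
    ]
    # Votes: Never-married, Married, Married
    print(forest_predict(trees, {"Age": 25, "HoursPerWeek": 40}))  # Married
```

Each tree alone may be weak or even contradict the others; the forest's answer is simply the most common vote.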

A confusion matrix is automatically shown when training or testing the model. When the model is applied to data for which the actual classification is not known, a frequency plot of the predicted classifications is displayed.
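As a rough illustration of what the confusion matrix contains, the following Python sketch counts (actual, predicted) pairs and derives the share of correctly classified records, the figure this article later reports as a percentage. The helpers are hypothetical and not the component's R code, which additionally draws the chart; class labels are assumed to be sortable (e.g. strings).

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def confusion_matrix(
    actual: Sequence[Hashable], predicted: Sequence[Hashable]
) -> Tuple[List[Hashable], List[List[int]]]:
    """Rows are actual classes, columns are predicted classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    pair_counts = Counter(zip(actual, predicted))
    matrix = [[pair_counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of records on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    actual = ["A", "A", "B", "B", "B"]
    predicted = ["A", "B", "B", "B", "A"]
    labels, matrix = confusion_matrix(actual, predicted)
    print(labels, matrix)    # ['A', 'B'] [[1, 1], [1, 2]]
    print(accuracy(matrix))  # 0.6
```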

Disclaimer

Please note that this component is provided as-is without any guarantee or support.

Prerequisites

- The R packages randomForest and ggplot2 must be installed.

- The column names must not include a minus sign.

- To avoid the R error "New factor levels not present in the training data", add an empty "Filter" component directly after the data source in the analytical workflow. This changes how the factor levels in the datasets are managed.
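A small pre-check for the minus-sign rule could look like the Python sketch below. The function names and sample column names are hypothetical, and replacing "-" with "_" is just one possible renaming scheme; any name without a minus sign satisfies the prerequisite.

```python
from typing import List, Sequence


def find_minus_columns(columns: Sequence[str]) -> List[str]:
    """Column names that would break the component."""
    return [name for name in columns if "-" in name]


def sanitize_column_names(columns: Sequence[str], replacement: str = "_") -> List[str]:
    """Replace minus signs and refuse to create duplicate names."""
    renamed = [name.replace("-", replacement) for name in columns]
    if len(set(renamed)) != len(renamed):
        raise ValueError("renaming would produce duplicate column names")
    return renamed


if __name__ == "__main__":
    cols = ["age", "marital-status", "native-country"]
    print(find_minus_columns(cols))     # ['marital-status', 'native-country']
    print(sanitize_column_names(cols))  # ['age', 'marital_status', 'native_country']
```

The duplicate check matters because two columns such as `a-b` and `a_b` would otherwise collapse into the same name.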

Limitations

- The algorithm does not support categorical variables with more than 32 levels. For instance, a country field with more than 32 different countries cannot be used as an input variable.

- The test and prediction datasets must not contain any levels of the input variables that are absent from the training dataset. For instance, the test or prediction dataset must not contain a country that does not appear in the training dataset.
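Both limitations can be checked before running the workflow. The Python sketch below assumes the data is available as a list of dictionaries (one per row) and that the caller passes only the categorical columns; the helper names and the `Country` column are hypothetical, and the 32-level limit is taken from the text above.

```python
from typing import Dict, List, Mapping, Sequence

MAX_LEVELS = 32  # limit stated in the Limitations section


def columns_with_too_many_levels(
    rows: Sequence[Mapping[str, object]],
    categorical_columns: Sequence[str],
    max_levels: int = MAX_LEVELS,
) -> List[str]:
    """Categorical columns with more distinct values than the limit allows."""
    return [
        col for col in categorical_columns
        if len({row[col] for row in rows}) > max_levels
    ]


def unseen_levels(
    train_rows: Sequence[Mapping[str, object]],
    new_rows: Sequence[Mapping[str, object]],
    categorical_columns: Sequence[str],
) -> Dict[str, List[str]]:
    """Values in the test/prediction data that never occur in the training data."""
    problems: Dict[str, List[str]] = {}
    for col in categorical_columns:
        known = {row[col] for row in train_rows}
        new = {row[col] for row in new_rows} - known
        if new:
            problems[col] = sorted(str(value) for value in new)
    return problems


if __name__ == "__main__":
    train = [{"Country": "US"}, {"Country": "DE"}]
    test = [{"Country": "US"}, {"Country": "FR"}]
    print(unseen_levels(train, test, ["Country"]))  # {'Country': ['FR']}
    many = [{"Country": f"C{i}"} for i in range(33)]
    print(columns_with_too_many_levels(many, ["Country"]))  # ['Country']
```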

Usage

These parameters can be set by the user.

Parameter	Description
Predictor Columns	Names of the predictor columns.
Classifier Column	Name of the target column.
Number of Trees to grow	Number of trees grown for the random forest. Larger values typically lead to stronger models but increase the calculation time.
Minimum size of terminal nodes	Minimum size of terminal (leaf) nodes. Smaller values allow deeper trees and therefore more complex random forests.

Output Columns added by this Component

Column	Description
PredictedValue	Value predicted by the random forest.

How to Implement

The component is attached to this article. Download and unzip the file; it contains a text file. Rename this file's .txt extension to .zip and unpack it as well. The content of the resulting .zip file is the Custom R Component. These steps are needed because SCN does not allow attaching the component's original file type.
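The rename-and-unpack step can also be scripted. This is a minimal Python sketch using only the standard library; the function name and file paths are hypothetical, since the actual attachment name is not given here. Only run `extractall` on archives you trust, as the Python documentation warns that archive members may be written outside the target directory.

```python
import zipfile
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def unpack_component(txt_path: PathLike, target_dir: PathLike) -> List[str]:
    """Rename the downloaded .txt to .zip and extract it into target_dir."""
    txt_path = Path(txt_path)
    zip_path = txt_path.with_suffix(".zip")
    txt_path.rename(zip_path)  # the .txt file is really a zip archive
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()  # member names, for a quick sanity check


if __name__ == "__main__":
    # Hypothetical file names; adjust to the attachment you downloaded.
    print(unpack_component("RandomForestComponent.txt", "RandomForestComponent"))
```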

Then deploy the component as described here: copy the unpacked content into the folder described in that article and restart SAP Predictive Analysis.

Example

If you want to try this random forest classification on sample data, you can use the Adult dataset, as used in the article on the Naive Bayes algorithm. Remember that the column names must not include a minus sign.

Configure the component accordingly. In this case we want to predict a person's marital status. Do not use the "NativeCountry" column as a predictor, as it contains too many levels (country names).

Run the model and you can see the predicted values either as raw data or in the embedded confusion matrix. 88.81% of the records were classified correctly.

Now we want to determine how well the model predicts the marital status on data it has not seen before. Save the trained model, then add it as an additional component to the testing branch of the analytical flow.

Execute the component, open the "Custom Chart" in the "Results" panel, and you will see that another confusion matrix has been created. The component automatically detected that the true classification is already known: if the classifier column specified when training the model exists in the dataset, the component assumes that it is being tested on already classified data and therefore displays a confusion matrix to help evaluate the model's performance.
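The switching behaviour described above can be expressed as a short Python sketch. The component itself does this in R; the function name, the row-as-dictionary layout, the `MaritalStatus` column, and the returned mode strings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Mapping, Sequence, Tuple


def summarize_predictions(
    columns: Sequence[str],
    rows: Sequence[Mapping[str, Hashable]],
    predictions: Sequence[Hashable],
    classifier_column: str,
) -> Tuple[str, Counter]:
    """Pick the output the way the article describes.

    If the classifier column from training is present, the data is treated
    as already classified: count (actual, predicted) pairs for a confusion
    matrix. Otherwise only the predicted classes can be counted.
    """
    if len(rows) != len(predictions):
        raise ValueError("one prediction per row is required")
    if classifier_column in columns:
        actual = [row[classifier_column] for row in rows]
        return "confusion_matrix", Counter(zip(actual, predictions))
    return "frequency_plot", Counter(predictions)


if __name__ == "__main__":
    rows = [{"Age": 30, "MaritalStatus": "Married"},
            {"Age": 22, "MaritalStatus": "Never-married"}]
    preds = ["Married", "Married"]
    print(summarize_predictions(["Age", "MaritalStatus"], rows, preds, "MaritalStatus"))
    print(summarize_predictions(["Age"], [{"Age": 30}], ["Married"], "MaritalStatus"))
```

Keying the decision on the presence of the training-time classifier column, rather than on a separate flag, is what lets the same saved model serve both the testing branch and the prediction branch.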

The trained model was able to accurately predict 83.97% of the previously unseen cases!

When the model is applied to new data for which the actual classification is not known, the component displays a frequency plot of the predictions.
