Channel: SCN : All Content - SAP BusinessObjects Predictive Analytics

Is SAP Predictive Analysis 1.0 SP11 the real game changer?


Last year, SAP announced a new solution in its predictive analytics portfolio: SAP Predictive Analysis 1.0 (PA). It replaced the classic SAP BO Predictive Workbench (a BO frontend to IBM's SPSS statistical/predictive engine), but it was also part of a bigger strategy to expand SAP's offering in the advanced analytics space, especially around Big Data analytics. With PA, HANA and HANA's native predictive library (the Predictive Analysis Library, or PAL), which executes predictive algorithms in-database (i.e. procedures run in the DB layer and return just the result set, instead of exporting the whole dataset for the algorithms to run in the application layer), SAP became a serious contender in the Big Data Predictive Analytics space. So much so that Forrester recognized the strength of SAP's strategy and placed it alongside SAS and IBM in the "Leaders" wave of its most recent Big Data Predictive Analytics report.

 

forrester-2013-big-data-predictive-analytics.png

Figure 1 - Forrester Wave: Big Data Predictive Analytics Solutions Q1'13

 

Of course, SAP still has a very small market share in the predictive analytics space, especially compared to the likes of SAS and IBM, but the strength of its offering, centered on HANA's real-time, in-memory and big data capabilities, was enough to position SAP well in the analysts' eyes.

 

However, in practical terms, SAP's actual portfolio was very limited. Yes, HANA brings a lot of modern and groundbreaking technology to the game that wasn't available before, but in terms of actual functionality (i.e. the analytical models that can be implemented), it was still far behind its main competitors. Solutions like SAS (Stat and Enterprise Miner) and SPSS (Statistics and Modeler) have been on the market for over 45 years, making them even older than SAP's R/2. That means lots and lots of experience and, most importantly, content, which translates into a vast number of algorithms and industry-specific pre-built analytic models. Meanwhile, SAP's initial offering consisted of a couple of dozen algorithms in PAL, which still had to be consumed through procedures written in HANA (i.e. not end-user friendly, unlike SAS or SPSS), with just a handful of these algorithms supported in PA (which was intended to be the end-user/analyst-friendly tool). As a result, 90% (that is of course a guess) of the analytical models companies usually use (or want to use) couldn't be implemented with HANA and PA alone.

 

Of course, SAP knew that. Since day one, SAP has announced R integration as a huge part of its predictive analytics strategy. R is an open source statistical/predictive analytics environment widely used throughout the world, said to include more than 3,500 distinct algorithms in its standard set of libraries. It first gained traction in the academic world (much like SPSS and SAS did 20 or 30 years ago) and has recently seen considerable and steady growth in adoption by the corporate world. The Rexer Analytics 2011 Data Miner Survey presents R as the most used statistical solution in the world: 47% of the companies that participated in the survey claimed to use R - and that number will probably be even higher when Rexer announces its 2013 survey results next September.

 

http://revolution-computing.typepad.com/.a/6a010534b1db25970b01676908ecaf970b-pi

Figure 2 - Rexer Analytics 2011 Data Miner Survey results

(notice SAP didn't appear as a player in 2011 - that will probably change in 2013... ;-) )

 

But even with R, SAP's offering was still weaker. HANA R integration, while powerful, still demanded a very specific set of development skills in order to deliver actual analytical models to the business users (as demonstrated in this excellent blog by Blag). Big corporations with their own ranks of statistical analysts and R developers wouldn't have a problem with that, but that isn't the case for the vast majority of companies seeking statistical insights. Most of them rely on a few people with deep functional and business expertise but usually little or no technical knowledge.

 

In this context, the PA R integration would play a major role in enabling these users to consume HANA and/or R algorithms through a user-friendly, graphical tool (along the lines of SAS and SPSS Modeler). R integration has been there since the first release of PA. In the latest releases, PA even installs R for you without requiring any previous knowledge from the user. However, the number of R functions (algorithms) supported in PA was still very limited, again reducing the applicability of PA in real-life scenarios...

 

That was the case until the latest release.

With yesterday's GA launch of PA 1.0 SP11, SAP for the first time makes it possible to consume custom R functionality (i.e. algorithms that aren't built into standard PA) without having to resort to developing HANA SQLScript/R procedures. You can reuse your existing R scripts (just adapting them to a function model, as demonstrated below) and graphically create analytical models with the most complex algorithms you can imagine. Even better, you can run these models either with a local instance of R (the so-called "standalone R" scenario), usually suited to smaller datasets and prototyping, or with a server instance of R connected to a HANA appliance (the so-called "HANA R" scenario, described in Blag's blog mentioned above), which enables high-performance execution of these custom models in big data use cases. For the first time, the full potential of the 3,500+ R algorithms is really there for PA users to exploit.

 

And it is indeed very simple. Let's take, for example, the R script from Blag's blog.

All you have to do is adapt it a little to encapsulate the input and output parameters in an R function.

I've done that with Blag's code (and I've also modified it slightly to produce a more meaningful output).

Here is what the modified code looks like:

 

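predict_tickets <- function(tickets_year) {
    # Input columns arrive from the previous step of the PA model
    period <- as.integer(tickets_year$PERIOD)
    tickets <- as.integer(tickets_year$TICKETS)

    # Build next year's periods by replacing the 4-digit year prefix
    var_year <- as.integer(substr(period[1], 1, 4)) + 1
    new_period <- gsub("^\\d{4}", var_year, period)

    # Fit a linear trend and predict next year's tickets.
    # The newdata column must be named like the predictor ("period");
    # otherwise predict() silently falls back to the original periods.
    prt.lm <- lm(tickets ~ period)
    next_year <- data.frame(period = as.integer(new_period))
    pred <- round(predict(prt.lm, next_year, interval = "none"))

    # Append the predicted rows to the original data
    result <- data.frame(PERIOD = as.character(new_period), TICKETS = 0, PRED_TICKETS = pred)
    tickets_year$PRED_TICKETS <- 0
    output <- rbind(tickets_year, result)

    # Bar chart of actual tickets with the prediction as a red line
    mp <- barplot(output$TICKETS)
    axis(1, at = mp, labels = c(output$PERIOD))
    lines(c(0, 0, output$PRED_TICKETS), col = "red")

    return(list(out = output))
}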
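Before pasting a function like this into PA, it can help to try the same contract (data frame in, named list out) in a plain R session. The following is only a minimal sketch under my own assumptions: the sample values are made up, `forecast_next_year` is a hypothetical trimmed-down variant without the plotting calls, and it shifts the period by adding 100 to a YYYYMM integer rather than using the gsub() approach from the script above.

```r
# Hypothetical stand-in for the table used in the post: one row per month,
# PERIOD as YYYYMM and TICKETS as a count. The values are made up.
tickets_year <- data.frame(
  PERIOD  = sprintf("2012%02d", 1:12),
  TICKETS = c(120, 135, 128, 150, 160, 158, 170, 182, 175, 190, 205, 210),
  stringsAsFactors = FALSE
)

# Same contract as the component function: data frame in, named list out.
forecast_next_year <- function(input) {
  period  <- as.integer(input$PERIOD)
  tickets <- as.integer(input$TICKETS)

  # Linear trend of tickets over the period
  fit <- lm(tickets ~ period)

  # Shift every period by one year (YYYYMM + 100) and predict for it;
  # the newdata column is named after the predictor used in the formula.
  next_period <- period + 100L
  pred <- round(predict(fit, newdata = data.frame(period = next_period)))

  forecast <- data.frame(
    PERIOD = as.character(next_period),
    TICKETS = 0,
    PRED_TICKETS = as.numeric(pred),
    stringsAsFactors = FALSE
  )
  input$PRED_TICKETS <- 0
  list(out = rbind(input, forecast))
}

res <- forecast_next_year(tickets_year)
str(res$out)
```

If the structure printed by str() shows the original rows followed by one forecast row per period, the function is at least well formed before it goes anywhere near the wizard.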

 

 

Here are the steps to create a custom R component in PA 1.0 SP11:

 

  1. Acquire the data set from any source (in my case, I tested with two documents: a HANA Online document on top of a HANA table exactly like the one in Blag's blog, and a CSV document with the table content exported to a .csv file).
  2. In the "Designer" tab of the "Predict" view, click on "Add New Component" -> "R Component".
     pa-r-component.png
  3. Follow the R component creation wizard. First, in the "General" screen, give the component a name and a description. Click on Next.
     pa-create-new-component-1.png
  4. In the "Script" screen, load or paste your R script (remember to adapt it to the function syntax - hovering the mouse over the symbol shows sample code), fill in the required parameters (most of them can be selected from the available dropdown boxes) and select the desired options (I'll comment on these options below). Click on Next.
     pa-create-new-component-2.png
  5. In the "Settings" screen, you define the output columns that will be available to the next step of your analytical model. You can reuse the input columns and just add the new ones, or redefine all columns from scratch. If your R function has input parameters beyond the input data frame (which comes from the previous step in the model), you can define them here (they'll be editable parameters when you instantiate the component in the model). Click on Finish.
     pa-create-new-component-3.png
  6. That's it. Your component will appear in the list of available algorithms and you can now instantiate it in your model (just double-click it or drag it into the model).
     pa-custom-model.png
  7. Once you run your model, you'll be able to see the output in the Grid tab of the Results view, as you would for a regular PA model.
     pa-results-grid.png
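To make step 5 a bit more concrete, here is a rough sketch of what a function with one extra input parameter might look like. Everything in it is hypothetical: `scale_tickets`, `growth_pct` and the `ADJ_TICKETS` column are illustrative names of mine, and I'm assuming (not asserting) that the value set in the wizard simply arrives as a regular function argument next to the input data frame.

```r
# Hypothetical component function: `input` is the data frame from the previous
# step; `growth_pct` is an illustrative extra parameter, not one from the post.
scale_tickets <- function(input, growth_pct = 10) {
  output <- input
  # New output column that would be declared in the "Settings" screen
  output$ADJ_TICKETS <- round(as.numeric(input$TICKETS) * (1 + growth_pct / 100))
  list(out = output)
}

# Local check with made-up data
sample_input <- data.frame(
  PERIOD  = c("201201", "201202"),
  TICKETS = c(100, 200),
  stringsAsFactors = FALSE
)
res <- scale_tickets(sample_input, growth_pct = 5)
print(res$out)
```

The point is only the shape: input columns are reused, one new column is added, and the extra parameter has a default so the function still runs when called with the data frame alone.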

 

A few comments on the options in the "Script" screen of the custom R component wizard shown above. The "Show Visualization" option makes any graph plotted in the R script appear in the "Charts" tab of the "Results" view. This is very nice when you want to plot graph types not yet supported by PA/Lumira. In my tests, however, the "Show Visualization" option only appeared for custom R components created on top of offline documents, i.e. using the local standalone R engine. It didn't appear when creating custom R components on top of a HANA Online document, so apparently PA can't transfer plots from the HANA R server yet. The sample R script I posted above contains some plotting commands (barplot, lines, axis), which I used to plot simple bar and line charts in order to test R graphs in PA. That worked nicely for me, as shown below.
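For readers less familiar with base R graphics, the sketch below shows how those three plotting calls fit together, using made-up values. It is a variant, not the script from this post: it passes the bar midpoints returned by barplot() as the x coordinates of lines(), and it opens a null PDF device only so that it can run outside PA without opening a window (inside a component that wrapper would presumably be dropped).

```r
# Made-up values standing in for the TICKETS and PRED_TICKETS columns.
actual    <- c(120, 135, 150, 0, 0, 0)
predicted <- c(0, 0, 0, 160, 172, 185)
labels    <- c("201210", "201211", "201212", "201310", "201311", "201312")

pdf(NULL)  # null device: nothing is written to disk, no window is opened

# barplot() returns the x coordinate of each bar's midpoint.
mp <- barplot(actual, ylim = c(0, max(actual, predicted)))

# Use the midpoints to put one period label under each bar.
axis(1, at = mp, labels = labels)

# Passing `mp` as x keeps each point of the line centred on its bar.
lines(mp, predicted, col = "red")

invisible(dev.off())
```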

 

pa-results-charts.png

Figure 9 - R graph shown in PA

 

But if the graph type you need is natively supported by PA/Lumira's visualization capabilities (as bar and line charts are), you're of course much better served using those, since they look much nicer.

 

pa-results-visualize.png

Figure 10 - PA/Lumira native visualization

 

To wrap up: the custom R component is not the only new functionality in PA 1.0 SP11, but it's definitely one of the most exciting, if not the most exciting, because of how much it broadens what can be done with the SAP HANA/PA/R solution architecture. This blog gives a broader view of the other new functionality in PA 1.0 SP11. One other feature I'm also very excited about is the ability to export HANA PAL based models as HANA SQLScript procedures. This feature enables business analysts not just to prototype but to actually deliver a complete analytical model to the IT team in an easily consumable form (i.e. database procedures, which can be consumed by any other application or job scheduling tool). Unfortunately, this feature apparently isn't available for models that include custom R components. I hope to see this capability very soon, because then SAP will have closed the gap to a complete and comprehensive predictive model delivery life cycle that doesn't require full-blown developers: develop & test models in PA -> export the desired models as HANA procedures -> consume these in corporate-wide analytical scenarios (through, for example, a BO 4.x frontend on top of HANA, or even BW with Virtual InfoProviders based on HANA views built on top of these procedures).

 

All in all, for me as a consultant, this is the first version of SAP Predictive Analysis that can really be considered for a production deployment in a statistical/predictive project. Furthermore, once you consider that PA was launched less than 7 months ago, you notice how fast SAP has been moving with these new game-changing solutions, how far PA has evolved in those 7 months and how powerful it could be in a year or two. It is definitely a player in the Predictive Analytics market and, together with the performance and robustness of HANA and the flexibility and completeness of R, it is a serious contender to SAS and SPSS.

 

EDIT:

Vishwa, product manager for SAP Predictive Analysis, has released a helpful FAQ on custom R components in PA and also a detailed step-by-step on how to create your own custom R component. Very good, Vishwa!

