Here is a little fun with the HANA Predictive Algorithm Library (PAL). We will go from Ozzy Osbourne to a retailer's shopping basket analysis.
On this live website you can search for your favourite music band and you will receive a list of other bands you might like as well. Just click on the YouTube links of those recommendations and you can listen to the music right away.
This application is 100% SAP HANA: Data Storage, Prediction, Fuzzy Search, ODATA Interface, HTML5 front-end and application server.
Logon Credentials
The website is currently down for maintenance. Thanks for your patience.
URL | http://music.kmudemo.ch/dj.html |
User | HANA_PAL |
Password | Welcome14 |
This website is a little fun project and not anything official from SAP. Please understand that the site is running on a virtualised SAP HANA test server with limited resources, so performance will vary at times.
Screenshots
Step 1 - Search box:
Step 2 - Results:
Background
The online music community Last.fm kindly shared a dataset that lists all the music bands that 2000 of their users listened to and how often each user listened to each band. The Apriori algorithm from the SAP HANA Predictive Algorithm Library (PAL) was able to identify listening patterns in that historic dataset. This website uses the Apriori output and your personal music taste to recommend further bands you might like as well. Since the dataset was created in early 2011, you will not find bands that appeared after that time. Also, the listening history appears to come from a mainly younger, English-speaking audience, so you might not get recommendations for bands that are not popular with that group.
Predictive Algorithm
The algorithm that identifies the associations here is called Apriori Lite. It produces combinations of bands for which relationships were found. The input data looks as follows. You can see which bands each user has listened to and how often.
To focus on the bands that people clearly like, we keep only the bands a user has listened to at least 100 times. Running this data through the HANA Apriori Lite algorithm from within SAP Predictive Analysis yields well over 300,000 different association rules. See the user guide for instructions on using the HANA Apriori algorithm in the graphical environment of SAP Predictive Analysis. You can also call the algorithm directly within a scripted calculation view, but this is more difficult to implement.
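As a rough illustration of the filtering step, here is a minimal Python sketch. It is not the HANA view used for this site; the tuple layout, the sample rows and the name `build_baskets` are assumptions made for the example. It keeps only the bands a user played at least 100 times and groups them per user, which is the "basket" shape an association algorithm works on.

```python
from collections import defaultdict

# Hypothetical listening history: (user_id, band, play_count).
# The real dataset's column names and values are not shown in this article.
plays = [
    ("u1", "Ozzy Osbourne", 250),
    ("u1", "Alice Cooper", 120),
    ("u1", "Madonna", 5),
    ("u2", "Ozzy Osbourne", 99),
    ("u2", "Alice Cooper", 300),
    ("u3", "Ozzy Osbourne", 100),
]

MIN_PLAYS = 100  # threshold described in the text


def build_baskets(rows, min_plays=MIN_PLAYS):
    """Group, per user, the bands played at least `min_plays` times."""
    baskets = defaultdict(set)
    for user, band, count in rows:
        if count >= min_plays:
            baskets[user].add(band)
    return dict(baskets)


baskets = build_baskets(plays)
for user, bands in sorted(baskets.items()):
    print(user, sorted(bands))
```

In this toy data, Madonna drops out for user u1 and Ozzy Osbourne drops out for user u2, because both fall below the threshold.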
Each rule is described by three output values called Support, Confidence and Lift. Those values are also presented to the user on this website. Typically you would not want to display those technicalities in a front-end for the casual user, but it helps in this case to understand the functionality.
Here are some examples.
Let’s explain those terms with the help of Ozzy Osbourne.
Apriori Output | Meaning |
---|---|
Rule (Pre-rule / Post-rule) | States the combination of two bands for which an association was found. The first band is the one that you searched for in this case (pre-rule). The second band is the one you might be interested in (post-rule). Many times a band will appear in many rules as pre-rule or post-rule. One rule in the above screenshot is the combination of Ozzy Osbourne and Alice Cooper. |
Support | States the percentage of people from our history that this rule applies to. So here 0.83% of the people listened to both Ozzy Osbourne and Alice Cooper. |
Confidence | States in percent, how many of the Ozzy Osbourne listeners also listen to Alice Cooper. In our dataset 31% of the Ozzy Osbourne listeners also listen to Alice Cooper. |
Lift | States how many times an Ozzy Osbourne listener is more likely to listen to Alice Cooper compared to the average of all users. In our dataset the Ozzy Osbourne listener is 16 times more likely to listen to Alice Cooper than the average user. |
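
To make the three measures concrete, the following Python sketch computes them for a single rule from a small, made-up set of users. The data and the function name `rule_metrics` are assumptions for illustration, not the Last.fm dataset or the PAL implementation. It uses the usual definitions: support is the share of users who like both bands, confidence is that count divided by the number of users who like the first band, and lift is confidence divided by the overall share of users who like the second band.

```python
import math


def rule_metrics(baskets, pre, post):
    """Return (support, confidence, lift) for the rule pre -> post."""
    n = len(baskets)
    n_pre = sum(1 for b in baskets if pre in b)
    n_post = sum(1 for b in baskets if post in b)
    n_both = sum(1 for b in baskets if pre in b and post in b)
    if n_pre == 0 or n_post == 0:
        raise ValueError("both bands must occur at least once")
    support = n_both / n
    confidence = n_both / n_pre
    # Same as confidence / (n_post / n), written to avoid rounding noise.
    lift = (n_both * n) / (n_pre * n_post)
    return support, confidence, lift


# Ten invented users: 4 like Ozzy Osbourne, 2 like Alice Cooper, 2 like both.
baskets = [
    {"Ozzy Osbourne", "Alice Cooper"},
    {"Ozzy Osbourne", "Alice Cooper"},
    {"Ozzy Osbourne"},
    {"Ozzy Osbourne"},
    {"Madonna"}, {"Madonna"}, {"Madonna"},
    {"Madonna"}, {"Madonna"}, {"Madonna"},
]

support, confidence, lift = rule_metrics(baskets, "Ozzy Osbourne", "Alice Cooper")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

The same relationship can be used to sanity-check the figures in the table: a confidence of 31% combined with a lift of 16 implies that roughly 2% of all users listen to Alice Cooper (0.31 / 16).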
Apriori Lite creates rules that are pretty powerful, easy to understand and easy to use in such a website. You like one band and you see other bands you might like too. However, you can also use the "full" Apriori algorithm which works with combinations of bands as pre-rule and post-rule. This means it can identify more specific rules, for instance it can tell you that if you like both Ozzy Osbourne and Alice Cooper what other bands or band combinations you might enjoy.
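The difference to the "full" variant can be sketched in a few lines of Python. This example only evaluates one given rule whose pre-rule is a combination of bands; it does not perform the candidate generation and pruning that the Apriori algorithm itself does, and it is not the PAL implementation. The data and the name `set_rule_metrics` are invented for illustration.

```python
import math


def set_rule_metrics(baskets, pre, post):
    """Return (support, confidence, lift) for a rule between two band sets."""
    pre, post = frozenset(pre), frozenset(post)
    n = len(baskets)
    n_pre = sum(1 for b in baskets if pre <= b)          # users liking all pre bands
    n_post = sum(1 for b in baskets if post <= b)        # users liking all post bands
    n_both = sum(1 for b in baskets if (pre | post) <= b)
    if n_pre == 0 or n_post == 0:
        raise ValueError("both sides of the rule must occur at least once")
    return n_both / n, n_both / n_pre, (n_both * n) / (n_pre * n_post)


# Eight invented users.
baskets = [
    {"Ozzy Osbourne", "Alice Cooper", "Black Sabbath"},
    {"Ozzy Osbourne", "Alice Cooper", "Black Sabbath"},
    {"Ozzy Osbourne", "Alice Cooper"},
    {"Ozzy Osbourne"},
    {"Black Sabbath"},
    {"Madonna"},
    {"Madonna"},
    {"Madonna", "Ozzy Osbourne"},
]

support, confidence, lift = set_rule_metrics(
    baskets, {"Ozzy Osbourne", "Alice Cooper"}, {"Black Sabbath"}
)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

With a single band on each side this reduces to the pairwise rules that Apriori Lite produces.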
Shopping Basket Analysis
This Apriori algorithm is also used for shopping basket analysis to identify products that are often purchased together. Next time you buy a book online for instance, you have an idea how the site is able to recommend additional books for you. There are other algorithms that can also produce such product recommendations, but the site might well be using Apriori. Instead of people’s listening habits they will use their customers' purchasing history to create such rules. So the historic data pool is then not "bands listened to by user" but "books purchased by user".
Interpreting the Apriori Output
For some bands well over 100 associations were found, so which ones should be displayed? And as so often, the answer is a very clear: “It Depends”. It really depends on what you want to achieve. Imagine this search is part of a website selling music. Then you might want to promote very popular bands, for instance with the slogan “Your music collection is not complete without these bands”. If this is what you are after, then you need to recommend bands that have a high confidence.
However, you also want to bring in variety. Think of the slogan “Be the first to discover these bands”. Now you really want to have bands with a high lift.
So out of the very same dataset, you can derive many different recommendations. This website is currently tuned towards somewhere in between “Be the first to know” and “Must haves for the music collection”. First the 40 bands with the highest confidence are identified, then from those the 10 bands with the highest lift are shown.
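The two-stage selection can be sketched as follows in Python. The site does this in a HANA calculation view; the `Rule` type, the function `recommend` and all numbers below are invented for the example. The limits default to the 40 and 10 mentioned above, and the toy call uses smaller limits so the effect is visible with five rules.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    post_rule: str      # the recommended band
    support: float
    confidence: float
    lift: float


def recommend(rules, by_confidence=40, by_lift=10):
    """Shortlist by confidence first, then rank the shortlist by lift."""
    shortlist = sorted(rules, key=lambda r: r.confidence, reverse=True)[:by_confidence]
    return sorted(shortlist, key=lambda r: r.lift, reverse=True)[:by_lift]


# Made-up rules for one searched band.
rules = [
    Rule("Band A", 0.020, 0.50, 2.0),
    Rule("Band B", 0.015, 0.40, 5.0),
    Rule("Band C", 0.010, 0.30, 9.0),
    Rule("Band D", 0.002, 0.10, 30.0),
    Rule("Band E", 0.001, 0.05, 50.0),
]

picks = recommend(rules, by_confidence=3, by_lift=2)
for rule in picks:
    print(rule.post_rule, rule.confidence, rule.lift)
```

Bands D and E have the highest lift in this toy data, but they never reach the shortlist because their confidence is too low. Raising the first limit moves the result towards "discover first"; lowering it moves the result towards "must haves".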
The following SAP Lumira Chart shows the associations very nicely, still focussed on Ozzy Osbourne. You see for instance that high confidence and high support tend to come with a low lift. If many people are already listening to the band then it is obviously difficult, if not impossible, to outperform by a large margin. The bands at the bottom right are the ones your music collection is not complete without. And appropriately the very high lifts have low support and confidence. You could call these the high-risk but also high-return combinations. The top left is the corner with the bands to discover before everyone else.
Attached to this article is a text file with all the rules for Ozzy Osbourne. You can view these rules in SAP Predictive Analysis or SAP Lumira. To create the above chart select CSV as data source and choose "Delimited by: Tab". Create measures for Support, Confidence and Lift. I found it best to set their aggregation type to "None". Now you can create this scatter plot.
Implementation Steps
If you want to know more about how this was implemented, here are the high-level steps:
- Have a SAP HANA instance that has the PAL implemented. You can use SAP HANA One on the Amazon Web Services for instance.
- Load the listening history into SAP HANA. There are many ways to load data; here it was simply done with an import directly from within the SAP HANA studio.
- Create a view on top of this data. Apply a filter so that the view returns only the bands a user has listened to at least 100 times.
- From within SAP Predictive Analysis connect to this view and apply the HANA Apriori Lite algorithm.
- Persist the results from the algorithm to a table in SAP HANA.
- Place a calculation view on top of this data that is parameterised on the input rule. Apply some filtering to focus on the relevant recommendations (i.e. take only the ones with the highest lift).
- Present this view as ODATA web service.
- Create a parameterised calculation view that handles the fuzzy logic. Present this view as a second ODATA web service.
- Now create the HTML5 front-end. Insert a search box that takes the input from the user. This value is passed to the fuzzy calculation view to correct any typos. Pass the result of this fuzzy calculation view to the first view to retrieve the recommendation rules for the band that was searched for. Display the results in a table.
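The last step, correcting the search term and then looking up the rules, can be mimicked outside HANA with a few lines of Python. This sketch uses the standard library's `difflib` as a stand-in for the SAP HANA fuzzy search, so the matching behaviour will differ from the real site. The `RULES` dictionary and the function names are hypothetical, and the recommendations in it are placeholders rather than actual Apriori output.

```python
import difflib

# Placeholder rule store: searched band -> recommended bands.
RULES = {
    "Ozzy Osbourne": ["Alice Cooper", "Black Sabbath"],
    "Alice Cooper": ["Ozzy Osbourne"],
}


def correct_band_name(query, known_bands, cutoff=0.6):
    """Return the closest known band name, or None if nothing is similar enough."""
    lookup = {name.lower(): name for name in known_bands}
    matches = difflib.get_close_matches(
        query.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


def recommend_for(query):
    """Correct the search term first, then fetch the rules for that band."""
    band = correct_band_name(query, RULES)
    if band is None:
        return None, []
    return band, RULES[band]


print(recommend_for("Ozzy Osborne"))   # misspelled on purpose
print(recommend_for("zzzzqqq"))        # no similar band
```

Keeping the typo correction and the rule lookup as two separate functions mirrors the two web services described in the steps above.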
Found Anything Surprising?
If you find anything interesting in the data, feel free to post it in the comments.