Packt+ | Advance your knowledge in tech

You're reading from R Data Analysis Projects Build end to end analytics systems to get deeper insights from your data

Product type Paperback

Published in Nov 2017

Publisher Packt

ISBN-13 9781788621878

Length 366 pages

Edition 1st Edition

Languages

Tools

ggplot

Concepts

Data Analysis

Author (1):

Gopi Subramanian

View More author details

Let's get back to our retailer. Let's use what we have built so far to provide recommendations to our retailer for his cross-selling strategy.

This can be implemented using the following code:

###########################################################################
 #
 # R Data Analysis Projects
 #
 # Chapter 1
 #
 # Building Recommender System
 # A step step approach to build Association Rule Mining
 #
 #
 # Script:
 # Generating rules for cross sell campaign.
 #
 #
 # Gopi Subramanian
 ###########################################################################

library(arules)
library(igraph)

get.txn <- function(data.path, columns){
 # Get transaction object for a given data file
 #
 # Args:
 # data.path: data file name location
 # columns: transaction id and item id columns.
 #
 # Returns:
 # transaction object
 transactions.obj <- read.transactions(file = data.path, format = "single",
 sep = ",",
 cols = columns,
 rm.duplicates = FALSE,
 quote = "", skip = 0,
 encoding = "unknown")
 return(transactions.obj)
 }

get.rules <- function(support, confidence, transactions){
 # Get Apriori rules for given support and confidence values
 #
 # Args:
 # support: support parameter
 # confidence: confidence parameter
 #
 # Returns:
 # rules object
 parameters = list(
 support = support,
 confidence = confidence,
 minlen = 2, # Minimal number of items per item set
 maxlen = 10, # Maximal number of items per item set
 target = "rules"
 
 )
 
 rules <- apriori(transactions, parameter = parameters)
 return(rules)
 }

find.rules <- function(transactions, support, confidence, topN = 10){
 # Generate and prune the rules for given support confidence value
 #
 # Args:
 # transactions: Transaction object, list of transactions
 # support: Minimum support threshold
 # confidence: Minimum confidence threshold
 # Returns:
 # A data frame with the best set of rules and their support and confidence values
 
 
 # Get rules for given combination of support and confidence
 all.rules <- get.rules(support, confidence, transactions)
 
 rules.df <-data.frame(rules = labels(all.rules)
 , all.rules@quality)
 
 other.im <- interestMeasure(all.rules, transactions = transactions)
 
 rules.df <- cbind(rules.df, other.im[,c('conviction','leverage')])
 
 
 # Keep the best rule based on the interest measure
 best.rules.df <- head(rules.df[order(-rules.df$leverage),],topN)
 
 return(best.rules.df)
 }

plot.graph <- function(cross.sell.rules){
 # Plot the associated items as graph
 #
 # Args:
 # cross.sell.rules: Set of final rules recommended
 # Returns:
 # None
 edges <- unlist(lapply(cross.sell.rules['rules'], strsplit, split='=>'))
 
 g <- graph(edges = edges)
 plot(g)
 
 }

support <- 0.01
confidence <- 0.2

columns <- c("order_id", "product_id") ## columns of interest in data file
 data.path = '../../data/data.csv' ## Path to data file

transactions.obj <- get.txn(data.path, columns) ## create txn object

cross.sell.rules <- find.rules( transactions.obj, support, confidence )
 cross.sell.rules$rules <- as.character(cross.sell.rules$rules)

plot.graph(cross.sell.rules)

After exploring the dataset for support and confidence values, we set the support and confidence values as 0.001 and 0.2 respectively.

We have written a function called find.rules. It internally calls get.rules. This function returns the list of top N rules given the transaction and support/confidence thresholds. We are interested in the top 10 rules. As discussed, we are going to use lift values for our recommendation. The following are our top 10 rules:

  rules support confidence lift conviction leverage
 59 {Organic Hass Avocado} => {Bag of Organic Bananas} 0.03219805 0.3086420 1.900256 1.211498 0.01525399
 63 {Organic Strawberries} => {Bag of Organic Bananas} 0.03577562 0.2753304 1.695162 1.155808 0.01467107
 64 {Bag of Organic Bananas} => {Organic Strawberries} 0.03577562 0.2202643 1.695162 1.115843 0.01467107
 52 {Limes} => {Large Lemon} 0.01846022 0.2461832 3.221588 1.225209 0.01273006
 53 {Large Lemon} => {Limes} 0.01846022 0.2415730 3.221588 1.219648 0.01273006
 51 {Organic Raspberries} => {Bag of Organic Bananas} 0.02318260 0.3410526 2.099802 1.271086 0.01214223
 50 {Organic Raspberries} => {Organic Strawberries} 0.02003434 0.2947368 2.268305 1.233671 0.01120205
 40 {Organic Yellow Onion} => {Organic Garlic} 0.01431025 0.2525253 4.084830 1.255132 0.01080698
 41 {Organic Garlic} => {Organic Yellow Onion} 0.01431025 0.2314815 4.084830 1.227467 0.01080698
 58 {Organic Hass Avocado} => {Organic Strawberries} 0.02432742 0.2331962 1.794686 1.134662 0.01077217

The first entry has a lift value of 1.9, indicating that the products are not independent. This rule has a support of 3 percent and the system has 30 percent confidence for this rule. We recommend that the retailer uses these two products in his cross-selling campaign as, given the lift value, there is a high probability of the customer picking up a {Bag of Organic Bananas} if he picks up an {Organic Hass Avocado}.

Curiously, we have also included two other interest measures—conviction and leverage.