Let's get back to our retailer and use what we have built so far to recommend products for his cross-selling strategy.
This can be implemented using the following code:
###########################################################################
#
# R Data Analysis Projects
#
# Chapter 1
#
# Building Recommender System
# A step-by-step approach to build Association Rule Mining
#
#
# Script:
# Generating rules for cross sell campaign.
#
#
# Gopi Subramanian
###########################################################################
library(arules)
library(igraph)
get.txn <- function(data.path, columns){
  # Get transaction object for a given data file
  #
  # Args:
  #  data.path: data file name location
  #  columns: transaction id and item id columns.
  #
  # Returns:
  #  transaction object
  transactions.obj <- read.transactions(file = data.path, format = "single",
                                        sep = ",",
                                        cols = columns,
                                        rm.duplicates = FALSE,
                                        quote = "", skip = 0,
                                        encoding = "unknown")
  return(transactions.obj)
}
get.rules <- function(support, confidence, transactions){
  # Get Apriori rules for given support and confidence values
  #
  # Args:
  #  support: support parameter
  #  confidence: confidence parameter
  #  transactions: transaction object to mine
  #
  # Returns:
  #  rules object
  parameters <- list(
    support = support,
    confidence = confidence,
    minlen = 2,   # Minimal number of items per item set
    maxlen = 10,  # Maximal number of items per item set
    target = "rules"
  )
  rules <- apriori(transactions, parameter = parameters)
  return(rules)
}
find.rules <- function(transactions, support, confidence, topN = 10){
  # Generate and prune the rules for given support confidence value
  #
  # Args:
  #  transactions: Transaction object, list of transactions
  #  support: Minimum support threshold
  #  confidence: Minimum confidence threshold
  #  topN: Number of rules to return (default 10)
  # Returns:
  #  A data frame with the best set of rules and their support and confidence values

  # Get rules for given combination of support and confidence
  all.rules <- get.rules(support, confidence, transactions)
  rules.df <- data.frame(rules = labels(all.rules),
                         quality(all.rules))
  # Add conviction and leverage as additional interest measures
  other.im <- interestMeasure(all.rules,
                              measure = c('conviction', 'leverage'),
                              transactions = transactions)
  rules.df <- cbind(rules.df, other.im)
  # Keep the topN rules, ranked by leverage (highest first)
  best.rules.df <- head(rules.df[order(-rules.df$leverage), ], topN)
  return(best.rules.df)
}
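plot.graph <- function(cross.sell.rules){
  # Plot the associated items as graph
  #
  # Args:
  #  cross.sell.rules: Set of final rules recommended
  # Returns:
  #  None

  # Split each rule label "{A} => {B}" into its two item sets. Splitting on
  # " => " (with the surrounding spaces) keeps "{A}" identical on both sides,
  # so an item appearing as antecedent and consequent maps to a single vertex.
  edges <- unlist(strsplit(as.character(cross.sell.rules$rules),
                           split = ' => ', fixed = TRUE))
  g <- graph(edges = edges)
  plot(g)
}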
support <- 0.01
confidence <- 0.2
columns <- c("order_id", "product_id") ## columns of interest in data file
data.path <- '../../data/data.csv' ## Path to data file
transactions.obj <- get.txn(data.path, columns) ## create txn object
cross.sell.rules <- find.rules(transactions.obj, support, confidence)
cross.sell.rules$rules <- as.character(cross.sell.rules$rules)
plot.graph(cross.sell.rules)
After exploring the dataset for suitable support and confidence values, we set the support and confidence thresholds to 0.01 and 0.2 respectively, as in the script above.
We have written a function called find.rules, which internally calls get.rules. Given the transactions and the support/confidence thresholds, it ranks the rules by leverage and returns the top N as a data frame. We are interested in the top 10 rules. As discussed, we are going to use the lift values of these rules for our recommendation. The following are our top 10 rules:
   rules                                                 support confidence     lift conviction   leverage
59 {Organic Hass Avocado} => {Bag of Organic Bananas} 0.03219805  0.3086420 1.900256   1.211498 0.01525399
63 {Organic Strawberries} => {Bag of Organic Bananas} 0.03577562  0.2753304 1.695162   1.155808 0.01467107
64 {Bag of Organic Bananas} => {Organic Strawberries} 0.03577562  0.2202643 1.695162   1.115843 0.01467107
52 {Limes} => {Large Lemon}                           0.01846022  0.2461832 3.221588   1.225209 0.01273006
53 {Large Lemon} => {Limes}                           0.01846022  0.2415730 3.221588   1.219648 0.01273006
51 {Organic Raspberries} => {Bag of Organic Bananas}  0.02318260  0.3410526 2.099802   1.271086 0.01214223
50 {Organic Raspberries} => {Organic Strawberries}    0.02003434  0.2947368 2.268305   1.233671 0.01120205
40 {Organic Yellow Onion} => {Organic Garlic}         0.01431025  0.2525253 4.084830   1.255132 0.01080698
41 {Organic Garlic} => {Organic Yellow Onion}         0.01431025  0.2314815 4.084830   1.227467 0.01080698
58 {Organic Hass Avocado} => {Organic Strawberries}   0.02432742  0.2331962 1.794686   1.134662 0.01077217
The first entry has a lift value of 1.9, indicating that the two products are not independent: a basket containing {Organic Hass Avocado} is about 1.9 times as likely to contain {Bag of Organic Bananas} as an arbitrary basket. The rule has a support of about 3 percent and a confidence of about 31 percent. We recommend that the retailer use these two products in his cross-selling campaign because, given the lift value, a customer who picks up an {Organic Hass Avocado} is considerably more likely than average to also pick up a {Bag of Organic Bananas}.
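As a quick sanity check, the lift of the first rule can be recomputed from the other columns of the table. The following is only a sketch in base R. The variable names are ours, and because the support of {Bag of Organic Bananas} on its own is not printed in the table, we assume it can be recovered from rule 64 through the identity confidence(X => Y) = support(X and Y) / support(X).

```r
# Values copied from the table of top 10 rules (rules 59 and 64).
conf.avocado.to.banana    <- 0.3086420   # confidence of rule 59
supp.banana.strawberry    <- 0.03577562  # support of rule 64
conf.banana.to.strawberry <- 0.2202643   # confidence of rule 64

# support(X) = support(X and Y) / confidence(X => Y)
supp.banana <- supp.banana.strawberry / conf.banana.to.strawberry

# lift(X => Y) = confidence(X => Y) / support(Y)
lift.avocado.to.banana <- conf.avocado.to.banana / supp.banana

print(supp.banana)
print(lift.avocado.to.banana)
```

The recomputed lift should agree with the value reported for rule 59, up to the rounding of the printed inputs.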
Note that we have also included two other interest measures: conviction and leverage.
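To make these two measures concrete, the sketch below recomputes them for rule 63 using their standard definitions: leverage(X => Y) = support(X and Y) - support(X) * support(Y), and conviction(X => Y) = (1 - support(Y)) / (1 - confidence(X => Y)). The variable names are ours, and the individual item supports are an assumption in the sense that they are derived from rules 63 and 64 rather than read from the data.

```r
# Values copied from the table of top 10 rules (rules 63 and 64).
supp.both                 <- 0.03577562  # support of both rules
conf.strawberry.to.banana <- 0.2753304   # confidence of rule 63
conf.banana.to.strawberry <- 0.2202643   # confidence of rule 64

# Recover the individual item supports:
# support(X) = support(X and Y) / confidence(X => Y)
supp.strawberry <- supp.both / conf.strawberry.to.banana
supp.banana     <- supp.both / conf.banana.to.strawberry

# Leverage: observed joint support minus the support expected
# if the two items were independent.
leverage <- supp.both - supp.strawberry * supp.banana

# Conviction for {Organic Strawberries} => {Bag of Organic Bananas}.
conviction <- (1 - supp.banana) / (1 - conf.strawberry.to.banana)

print(leverage)
print(conviction)
```

A leverage of 0 and a conviction of 1 would correspond to independent items, so both values for this rule point in the same direction as its lift.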