Association Rules In Data Mining - Market Basket Analysis


Association Rules In Data Mining

Association rules are used in data mining to find interesting associations or correlations among the items in a large data set.

Discovering interesting correlations among large volumes of business transaction records can support many business decisions, such as catalog design, cross-marketing, and loss-leader analysis.
 

One of the best examples of association rule mining is market basket analysis.

This process analyzes customers' buying habits by finding associations between the different items that customers place in their shopping baskets.



The discovery of such associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by the customers.

For instance, if customers are buying soap, how likely are they to also buy shampoo (and which kind of shampoo) on the same trip to the supermarket?

Such information can lead to increased sales by helping retailers do selective marketing and plan their shelf space.

For example, placing milk and bread within close proximity may further encourage the sale of these items together within single visits to the store.







Market Basket Analysis

Suppose that, as manager of an ABCElectronics branch, you would like to learn more about the buying habits of your customers. In particular, you wonder: “Which groups or sets of items are customers likely to purchase on a given trip to the store?”

To answer this question, market basket analysis may be performed on the retail data of customer transactions at your store.

In one strategy, items that are frequently purchased together can be placed near each other to further encourage the sale of such items together.

If customers who purchase laptops also tend to buy system software at the same time, then placing the hardware display close to the software display may help to increase the sales of both of these items. 

In an alternative strategy, placing hardware and software at opposite ends of the store may entice customers who purchase such items to pick up other items along the way.

For example, after deciding on an expensive computer, a customer may notice security systems while heading toward the management software.

If we think of the universe as the set of items available at a shop, then each item has a Boolean Variable representing whether the item is present or absent.

Each basket can be represented by a Boolean Vector of values assigned to variables. 
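As a minimal sketch of this encoding, the Python snippet below turns baskets into Boolean vectors over a fixed item order. The item names and the helper to_boolean_vector are illustrative assumptions, not part of any library.

```python
# Hypothetical universe of items sold at the store (illustrative names).
ITEMS = ["laptop", "system_software", "printer", "security_system"]


def to_boolean_vector(basket, items=ITEMS):
    """Return one Boolean per item: True if the item is in the basket."""
    return [item in basket for item in items]


# Two example baskets, each a set of purchased items.
baskets = [
    {"laptop", "system_software"},
    {"printer"},
]

vectors = [to_boolean_vector(b) for b in baskets]
for basket, vector in zip(baskets, vectors):
    print(sorted(basket), vector)
```

Fixing the item order up front means that position i always refers to the same item in every vector, so vectors can be compared column by column.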

The Boolean Vectors can be analyzed for buying patterns that reflect items that are frequently associated or purchased together.

These patterns can be represented in the form of association rules.

For example, the information that customers who purchase laptops also tend to buy system management software at the same time is represented in the association rule below:

laptop => system_management_software [support = 2%, confidence = 60%]

 

Basic Concepts 

Support(X => Y) = P(X ∪ Y)

The number of transactions containing both X and Y, divided by the total number of transactions.

Confidence(X => Y) = P(Y | X) = Support(X ∪ Y) / Support(X)

The number of transactions containing both X and Y, divided by the number of transactions containing X.
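The two definitions can be sketched in a few lines of Python. The transactions below are hypothetical and chosen only to keep the arithmetic easy to follow. The function names support and confidence are our own. Note that the support of "X union Y" counts transactions that contain every item of X and every item of Y.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Support(X union Y) / Support(X) for the rule X => Y.

    Assumes X occurs in at least one transaction; otherwise the
    division below raises ZeroDivisionError.
    """
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical transactions (not taken from any real store).
transactions = [
    {"laptop", "system_software"},
    {"laptop", "system_software", "printer"},
    {"laptop"},
    {"printer"},
    {"laptop", "printer"},
]

rule_support = support({"laptop", "system_software"}, transactions)
rule_confidence = confidence({"laptop"}, {"system_software"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Here two of the five transactions contain both items (support 40%), and two of the four transactions containing a laptop also contain system software (confidence 50%).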

How are association rules mined from large databases?
Association rule mining is a two-step process:
  • Find all frequent itemsets: By definition, each of these itemsets will occur at least as frequently as a predetermined minimum support count.
  • Generate strong association rules from the frequent itemsets: By definition, these rules must satisfy minimum support and minimum confidence.
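The two steps above can be sketched as follows. This is a brute-force illustration under stated assumptions, not an efficient miner: the transactions and both thresholds are hypothetical, and every candidate itemset is enumerated, which is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical transactions and thresholds, for illustration only.
transactions = [
    {"laptop", "software", "printer"},
    {"laptop", "software"},
    {"laptop", "printer"},
    {"software", "printer"},
    {"laptop", "software", "printer"},
]
MIN_SUPPORT_COUNT = 3   # an itemset must appear in at least 3 transactions
MIN_CONFIDENCE = 0.7    # a rule must hold in at least 70% of matching cases


def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


# Step 1: find all frequent itemsets by enumerating every candidate.
# This is exponential in the number of items; algorithms such as Apriori
# prune the candidates, which this sketch does not attempt.
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        candidate = frozenset(combo)
        count = support_count(candidate)
        if count >= MIN_SUPPORT_COUNT:
            frequent[candidate] = count

# Step 2: split each frequent itemset into antecedent => consequent and
# keep the rules whose confidence reaches the minimum. Every subset of a
# frequent itemset is itself frequent, so the lookup below always succeeds.
rules = []
for itemset, count in frequent.items():
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            conf = count / frequent[antecedent]
            if conf >= MIN_CONFIDENCE:
                rules.append((set(antecedent), set(itemset - antecedent), conf))

for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), f"confidence = {conf:.0%}")
```

Because every rule is generated from an itemset that already meets the minimum support, only the confidence needs to be checked in the second step.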

Classification of Association Rules

Boolean Association Rule 

  • This classification is based on the types of values handled in the rule. If a rule concerns associations between the presence or absence of items, it is a Boolean association rule.
  • Example: laptop => system_management_software [support = 2%, confidence = 60%]

Quantitative Association Rule

  • If a rule describes associations between quantitative items or attributes, then it is a quantitative association rule.
  • In these rules, quantitative values for items or attributes are partitioned into intervals. The following rule is an example of a quantitative association rule, where x is a variable representing a customer.
  • Example: age(x, “30..39”) ^ income(x, “42..48K”) => buys(x, bike)
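The partitioning of quantitative values into intervals can be illustrated with a small Python sketch. The interval boundaries, the customer record, and the helper to_interval are all assumptions made for this example. In practice the boundaries depend on the data.

```python
def to_interval(value, edges):
    """Return the label "lo..hi" of the interval containing value, or None."""
    for lo, hi in edges:
        if lo <= value <= hi:
            return f"{lo}..{hi}"
    return None


# Hypothetical interval boundaries; real boundaries depend on the data.
AGE_INTERVALS = [(20, 29), (30, 39), (40, 49)]
INCOME_INTERVALS_K = [(36, 41), (42, 48), (49, 55)]  # in thousands

# A hypothetical customer record.
customer = {"age": 34, "income_k": 45}

predicates = {
    "age": to_interval(customer["age"], AGE_INTERVALS),
    "income": to_interval(customer["income_k"], INCOME_INTERVALS_K),
}
print(predicates)
```

Once each quantitative attribute is mapped to an interval label, the labels can be treated like ordinary items when counting support and confidence.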

Single Dimension Association Rules

  • This classification is based on the dimensions of data involved in the rule. If the items or attributes in an association rule reference only one dimension, then it is a single-dimensional association rule.

Note that the rule

laptop => system_management_software [support = 2%, confidence = 60%]

is a single-dimensional association rule since it refers to only one dimension, buys. If a rule references two or more dimensions, such as buys, time_of_transaction, and customer_category, then it is a multidimensional association rule.
 

age(x, “30..39”) ^ income(x, “42..48K”) => buys(x, bike)
    

The above rule is considered a multidimensional association rule since it involves three dimensions, age, income, and buys.
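One way to make the distinction concrete is to count the distinct dimensions a rule references. In the sketch below, the representation of a rule as two lists of (dimension, value) pairs is an assumption made for illustration, as are the helper names dimensions and classify.

```python
# Each predicate is a (dimension, value) pair; a rule is a pair of
# predicate lists: (antecedent, consequent). This layout is an assumption
# made for the example.
laptop_rule = (
    [("buys", "laptop")],
    [("buys", "system_management_software")],
)
bike_rule = (
    [("age", "30..39"), ("income", "42..48K")],
    [("buys", "bike")],
)


def dimensions(rule):
    """Return the distinct dimensions (predicate names) used in a rule."""
    antecedent, consequent = rule
    return {name for name, _ in antecedent + consequent}


def classify(rule):
    """Label a rule by how many distinct dimensions it references."""
    if len(dimensions(rule)) == 1:
        return "single-dimensional"
    return "multidimensional"


print(classify(laptop_rule))
print(classify(bike_rule))
```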


Example

Let us consider the following set of transactions in a bookshop.

    t1 := {A,B,C,E}
    t2 := {B,D,E}
    t3 := {A,B,C,E}
    t4 := {A,B,D,E}
    t5 := {A,B,D,C,E}
    t6 := {B,D,C}

I := {A,B,D,C,E} and T := {t1,t2,t3,t4,t5,t6}


Given: a database of transactions, where each transaction is a list of items purchased by a customer during a visit.

Find: all rules that correlate the presence of one set of items with that of another set of items.

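Here, every transaction that contains both A and B (t1, t3, t4, and t5) also contains E, so the rule {A,B} => E has a confidence of 100% and a support of 4/6, or about 67%.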
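A rule such as {A,B} => E can be checked by counting over the six transactions listed above. The short Python sketch below does that count. The variable names are illustrative.

```python
# The six bookshop transactions listed above.
transactions = {
    "t1": {"A", "B", "C", "E"},
    "t2": {"B", "D", "E"},
    "t3": {"A", "B", "C", "E"},
    "t4": {"A", "B", "D", "E"},
    "t5": {"A", "B", "D", "C", "E"},
    "t6": {"B", "D", "C"},
}

antecedent = {"A", "B"}
consequent = {"E"}

# Transactions that contain both A and B.
with_antecedent = [name for name, t in transactions.items() if antecedent <= t]
# Of those, the ones that also contain E.
with_both = [name for name in with_antecedent if consequent <= transactions[name]]

support = len(with_both) / len(transactions)
confidence = len(with_both) / len(with_antecedent)
print("contain A and B:", with_antecedent)
print("also contain E: ", with_both)
print(f"support = {support:.1%}, confidence = {confidence:.0%}")
```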


Some of the key terms are:

Set of items: I = {I1, I2, …, Im}

Transactions: D = {t1, t2, …, tn} is a set of transactions, where each transaction t is a set of items from I.

Itemset: {Ii1, Ii2, …, Iik} ⊆ I

Support of an itemset: the percentage of transactions that contain that itemset.

Large (frequent) itemset: an itemset whose number of occurrences is above a threshold.



Summary

Association rule mining finds interesting associations or correlations among the items in a large data set.


The discovery of interesting association relationships among huge amounts of business transaction records can help in many business decision making processes, such as catalog design, cross-marketing, and loss-leader analysis.

