Attribute Oriented Induction In Data Mining - Data Characterization



Attribute-Oriented Induction

The Attribute-Oriented Induction (AOI) approach to data generalization and summarization-based characterization was first proposed in 1989 (at the KDD '89 workshop), a few years before the introduction of the data cube approach.

The data cube approach can be considered a data warehouse-based, precomputation-oriented, materialized-view approach.

It performs off-line aggregation before an OLAP or data mining query is submitted for processing. 

On the other hand, the attribute-oriented induction approach, at least in its initial proposal, is a relational database query-oriented, generalization-based, on-line data analysis technique.


However, there is no inherent barrier distinguishing the two approaches based on online aggregation versus offline precomputation.

Some aggregations in the data cube can be computed on-line, while off-line precomputation of multidimensional space can speed up attribute-oriented induction as well.



AOI is not confined to categorical data or to particular measures.





How is it done?

  • Collect the task-relevant data (the initial relation) using a relational database query.
  • Perform generalization by attribute removal or attribute generalization.
  • Apply aggregation by merging identical generalized tuples and accumulating their respective counts.
  • This merging reduces the size of the generalized data set.
  • Present the results interactively to users.
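The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the sample rows, the `CITY_TO_COUNTRY` concept hierarchy, and the `generalize` helper are all made up for the example.

```python
from collections import Counter

# Hypothetical concept hierarchy: maps a city to its country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Montreal": "Canada", "Seattle": "USA"}

# Hypothetical initial relation (the task-relevant data).
initial_relation = [
    {"name": "Ann", "gender": "F", "birth_place": "Vancouver"},
    {"name": "Bob", "gender": "M", "birth_place": "Seattle"},
    {"name": "Cy", "gender": "F", "birth_place": "Montreal"},
]

def generalize(row):
    """Drop `name` (attribute removal) and lift birth_place to a country
    (attribute generalization)."""
    return (row["gender"], CITY_TO_COUNTRY.get(row["birth_place"], "Other"))

# Merge identical generalized tuples and accumulate their counts.
generalized = Counter(generalize(row) for row in initial_relation)

for (gender, country), count in generalized.items():
    print(gender, country, count)
```

Three input tuples collapse into two generalized tuples, which is the size reduction the list above refers to.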

Basic Principles Of Attribute Oriented Induction

Data focusing: 

  • Analyze only the task-relevant data, including dimensions; the result is the initial relation.

Attribute-removal: 

  • Remove attribute A if there is a large set of distinct values for A but (1) there is no generalization operator on A, or (2) A’s higher-level concepts are expressed in terms of other attributes.

Attribute-generalization: 

  • If there is a large set of distinct values for A, and there exists a set of generalization operators on A, then select an operator and generalize A. 

Attribute-threshold control: 

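  • Limits the number of distinct values an attribute may have in the generalized relation. The threshold is typically 2-8 and is either user-specified or a default.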

Generalized relation threshold control (10-30):

  • Limits the number of tuples in the final generalized relation, which controls the final relation/rule size.
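Attribute generalization under an attribute threshold can be sketched as a loop that climbs a concept hierarchy until few enough distinct values remain. The `HIERARCHY` levels and the `generalize_to_threshold` function below are hypothetical names chosen for illustration, and the sketch assumes that "a large set of distinct values" means "more distinct values than the threshold".

```python
# Hypothetical concept hierarchy for birth_place:
# level 0 maps city -> country, level 1 maps country -> continent.
HIERARCHY = [
    {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA", "Mumbai": "India"},
    {"Canada": "North America", "USA": "North America", "India": "Asia"},
]

def generalize_to_threshold(values, hierarchy, threshold):
    """Climb the hierarchy one level at a time until at most `threshold`
    distinct values remain, or the hierarchy is exhausted.

    Returns the generalized values and the number of levels climbed."""
    level = 0
    while len(set(values)) > threshold and level < len(hierarchy):
        mapping = hierarchy[level]
        # Values missing from the mapping are left unchanged.
        values = [mapping.get(v, v) for v in values]
        level += 1
    return values, level

cities = ["Vancouver", "Toronto", "Seattle", "Mumbai"]
values, level = generalize_to_threshold(cities, HIERARCHY, threshold=2)
print(values, level)
```

With a threshold of 2 the cities are lifted two levels, to continents. With a threshold of 3 the loop would stop at countries. The same idea could plausibly be applied to the generalized relation threshold by counting distinct tuples instead of distinct attribute values.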

Algorithm for Attribute Oriented Induction

InitialRel: 

  • Process the query for task-relevant data and derive the initial relation.

PreGen: 

  • Analyze the number of distinct values in each attribute and determine the generalization plan for each attribute: whether to remove it, or how high to generalize it.

PrimeGen: 

  • Following the PreGen plan, perform the generalization to the right level to derive a “prime generalized relation”, accumulating the counts along the way.

Presentation: 

  • User interaction: (1) adjust levels by drilling, (2) pivoting, (3) mapping into rules, cross tabs, visualization presentations.
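The PreGen and PrimeGen steps above might look roughly like the following. This is a simplified sketch, not the published algorithm: it assumes a single-level hierarchy per attribute (a fuller version would climb several levels), and the function names `pre_gen` and `prime_gen`, the sample rows, and the `hierarchies` mapping are all hypothetical.

```python
from collections import Counter

def pre_gen(relation, hierarchies, threshold):
    """Plan for each attribute: 'keep', 'generalize', or 'remove'."""
    plan = {}
    for attr in relation[0]:
        distinct = {row[attr] for row in relation}
        if len(distinct) <= threshold:
            plan[attr] = "keep"        # already few distinct values
        elif attr in hierarchies:
            plan[attr] = "generalize"  # a generalization operator exists
        else:
            plan[attr] = "remove"      # many values, no operator
    return plan

def prime_gen(relation, plan, hierarchies):
    """Apply the plan, merging identical generalized tuples and counting them."""
    counts = Counter()
    for row in relation:
        key = []
        for attr, action in plan.items():
            if action == "remove":
                continue
            value = row[attr]
            if action == "generalize":
                value = hierarchies[attr].get(value, "other")
            key.append((attr, value))
        counts[tuple(key)] += 1
    return counts

relation = [
    {"name": "Ann", "gender": "F", "major": "Physics"},
    {"name": "Bob", "gender": "M", "major": "Chemistry"},
    {"name": "Cy", "gender": "F", "major": "Biology"},
    {"name": "Di", "gender": "F", "major": "History"},
]
hierarchies = {"major": {"Physics": "Science", "Chemistry": "Science",
                         "Biology": "Science", "History": "Arts"}}

plan = pre_gen(relation, hierarchies, threshold=2)
prime_relation = prime_gen(relation, plan, hierarchies)
print(plan)
print(prime_relation)
```

Here `name` is removed, `gender` is kept, and `major` is generalized, so four tuples merge into three generalized tuples with counts.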

Example

Suppose a university database is to be characterized. The corresponding DMQL query is:

use University_DB
mine characteristics as "Science_Students"
in relevance to name, gender, major, birth_place, birth_date, residence, phone_no, GPA
from student

Its corresponding SQL statement can be:

SELECT name, gender, major, birth_place, birth_date, residence, phone_no, GPA
FROM student
WHERE status IN ('Msc', 'MBA', 'Ph.D.');
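To try this query locally, one option is an in-memory SQLite database driven from Python's standard `sqlite3` module. The table layout and the three rows below are made up for illustration; only the column names and the status values come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, gender TEXT, major TEXT, birth_place TEXT, "
    "birth_date TEXT, residence TEXT, phone_no TEXT, GPA REAL, status TEXT)"
)

# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Ann", "F", "Physics", "Vancouver", "1998-03-01", "Burnaby", "555-0101", 3.7, "Msc"),
        ("Bob", "M", "History", "Seattle", "2001-07-15", "Richmond", "555-0102", 3.1, "Bsc"),
        ("Cy", "F", "Biology", "Montreal", "1995-11-30", "Vancouver", "555-0103", 3.9, "Ph.D."),
    ],
)

# Collect the task-relevant data: this result set is the initial relation.
initial_relation = conn.execute(
    "SELECT name, gender, major, birth_place, birth_date, residence, phone_no, GPA "
    "FROM student WHERE status IN (?, ?, ?)",
    ("Msc", "MBA", "Ph.D."),
).fetchall()
conn.close()

for row in initial_relation:
    print(row)
```

Only the graduate-level rows survive the filter, so the undergraduate row for Bob is excluded from the initial relation.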
 
Now let's create a characterized view for this database:
 

InitialRel:

  • From the student table, we query the task-relevant data.
  • We also remove a few attributes, such as name and phone_no, because they contribute nothing to the insights we want to draw.

PreGen

  • Next, we generalize these results by removing a few attributes and retaining the important ones.
  • We also generalize a few attributes, replacing "Birth_Place" with "Country", "Birth_date" with "Age Range", "Residence" with "City", and so on, as per the table given below.

[Figure: attribute oriented induction]


PrimeGen

  • Based on the PreGen plan, we perform generalization to the right level to derive a “prime generalized relation”, accumulating the counts along the way.

[Figure: attribute oriented induction]

Final Results 

  • Finally, we analyze the prime generalized relation and arrive at the final generalized results shown below.

[Figure: attribute oriented induction]

Presentation Of Results

Generalized relation:

  • Relations where some or all attributes are generalized, with counts or other aggregation values accumulated.

Cross-tabulation:

  • Mapping results into cross-tabulation form (similar to contingency tables). 
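As a rough illustration of the cross-tabulation form, the sketch below pivots a two-attribute generalized relation into a grid of counts. The `generalized` data and the `crosstab` helper are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical generalized relation: (gender, major_group) -> count.
generalized = {("F", "Science"): 2, ("M", "Science"): 1, ("F", "Arts"): 1}

def crosstab(counts):
    """Pivot {(row_key, col_key): count} into row labels, column labels, and cells."""
    table = defaultdict(dict)
    for (row_key, col_key), n in counts.items():
        table[row_key][col_key] = n
    cols = sorted({col_key for _, col_key in counts})
    rows = sorted(table)
    # Combinations that never occur get a count of 0.
    cells = [[table[r].get(c, 0) for c in cols] for r in rows]
    return rows, cols, cells

rows, cols, cells = crosstab(generalized)

print("gender", *cols, sep="\t")
for label, line in zip(rows, cells):
    print(label, *line, sep="\t")
```

Each row is one value of the first attribute and each column one value of the second, with missing combinations shown as zero.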

Visualization techniques:

  • Pie charts, bar charts, curves, cubes, and other visual forms.

Quantitative characteristic rules:

  • Mapping generalized results into characteristic rules with quantitative information associated with them.
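One common form of such quantitative information in the data mining literature is the t-weight: the count of a generalized tuple divided by the total count for the target class. The sketch below assumes that definition. The `t_weights` and `characteristic_rule` helpers, the rule layout, and the sample counts are hypothetical.

```python
def t_weights(counts):
    """t-weight of each generalized tuple: its count over the class total."""
    total = sum(counts.values())
    return {conditions: n / total for conditions, n in counts.items()}

def characteristic_rule(target, counts):
    """Render a generalized relation as one quantitative characteristic rule."""
    parts = []
    for conditions, weight in t_weights(counts).items():
        body = " AND ".join(f"{attr}(X) = {value!r}" for attr, value in conditions)
        parts.append(f"({body}) [t: {weight:.0%}]")
    return f"forall X, {target}(X) => " + " OR ".join(parts)

# Hypothetical generalized relation for the target class.
counts = {
    (("gender", "F"), ("major", "Science")): 3,
    (("gender", "M"), ("major", "Science")): 1,
}

rule = characteristic_rule("Science_Students", counts)
print(rule)
```

The t-weights of all disjuncts sum to 1, so each one can be read as the share of the target class covered by that generalized tuple.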

Summary

The Attribute-Oriented Induction (AOI) approach to data generalization and summarization-based characterization was first proposed in 1989 (at the KDD '89 workshop), a few years before the introduction of the data cube approach.
