Integration Of Data Mining Systems With Data Warehouse & Database



Integrating Data Mining With Database/Data Warehouse Systems

With the exponential growth of data, data mining systems must be efficient and scalable enough to build complex machine learning models. A wide variety of data mining systems is therefore expected to be designed and developed.

Comprehensive information processing and data analysis infrastructures will continue to be built systematically around databases and data warehouses.

Data Mining System Architecture

A critical design question is whether, and how closely, a data mining system should be integrated with a database or data warehouse system.

A Data Mining system can be integrated with a Database or Data Warehouse system using one of these coupling schemes:
  • No Coupling
  • Loose Coupling
  • Semi-Tight Coupling
  • Tight Coupling 

No Coupling

No coupling means that a Data Mining (DM) system will not utilize any function of a Database (DB) or Data Warehouse (DW) system.

It may fetch data from a particular source (such as a file system), process data using some data mining algorithms, and then store the mining results in another file.
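As a minimal sketch of this flow, assuming transactions are stored one per line in a CSV file, the function below reads a flat file, counts co-occurring item pairs, and writes the result to a second file. The function name, file layout, and the choice of pair counting as the mining step are illustrative assumptions, not something the original text prescribes.

```python
import csv
from collections import Counter
from itertools import combinations


def mine_frequent_pairs(in_path, out_path, min_support=2):
    """No coupling: read a flat file, mine it in memory, write a flat file.

    Hypothetical helper; no database or data warehouse facility is used.
    """
    pair_counts = Counter()
    with open(in_path, newline="") as f:
        for row in csv.reader(f):
            # All cleaning (trimming, de-duplicating) is the mining system's job.
            items = sorted({item.strip() for item in row if item.strip()})
            pair_counts.update(combinations(items, 2))

    # Keep only the pairs that appear in at least min_support transactions.
    frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for (a, b), n in sorted(frequent.items()):
            writer.writerow([a, b, n])
    return frequent
```

Note that the function has to scan and clean the whole file by itself, which is the kind of work a database would otherwise handle.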

Drawbacks:

First, a Database/Data Warehouse system provides a great deal of flexibility and efficiency in storing, organizing, accessing, and processing data.

Without using a Database/Data Warehouse system, a Data Mining system may spend a substantial amount of time finding, collecting, cleaning, and transforming data.

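Second, there are many tested, scalable algorithms and data structures implemented in Database and Data Warehouse systems. A Data Mining system with no coupling cannot reuse them and has to provide equivalent functionality on its own.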
 

Loose Coupling

Loose coupling means that a Data Mining system uses some facilities of a Database or Data Warehouse system: it fetches data from a data repository managed by these systems, performs data mining, and then stores the mining results either in a file or in a designated place in a Database or Data Warehouse.

Loose coupling is better than no coupling because it can fetch any portion of data stored in Databases or Data Warehouses by using query processing, indexing, and other system facilities.
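The following sketch illustrates the idea using Python's built-in sqlite3 module as a stand-in for a Database system. The `sales` and `mining_results` tables and the function name are hypothetical. The database selects the relevant portion of the data, while the mining step (a simple frequency count here) still runs in application memory.

```python
import sqlite3
from collections import Counter


def mine_top_items(conn, region, top_n=3):
    """Loose coupling: the DB fetches and stores, the application mines.

    Assumes a hypothetical table sales(region TEXT, item TEXT).
    """
    # The database handles selection (and could use an index on region).
    rows = conn.execute(
        "SELECT item FROM sales WHERE region = ?", (region,)
    ).fetchall()

    # The mining itself happens outside the database, in main memory.
    counts = Counter(item for (item,) in rows)
    top = counts.most_common(top_n)

    # Results go back to a designated place in the database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mining_results "
        "(region TEXT, item TEXT, freq INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mining_results VALUES (?, ?, ?)",
        [(region, item, n) for item, n in top],
    )
    conn.commit()
    return top
```

Every selected row is copied into the Python process before it is counted, so memory use grows with the size of the fetched data.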

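Drawbacks:

Many loosely coupled mining systems are main-memory based and do not exploit the data structures and query optimization methods that a Database or Data Warehouse system provides. As a result, it's difficult for loose coupling to achieve high scalability and good performance with large data sets.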

Semi-Tight Coupling - Enhanced Data Mining Performance

Semi-tight coupling means that besides linking a Data Mining system to a Database/Data Warehouse system, efficient implementations of a few essential data mining primitives (identified by the analysis of frequently encountered data mining functions) can be provided in the Database/Data Warehouse system.

These primitives can include sorting, indexing, aggregation, histogram analysis, multi-way join, and pre-computation of some essential statistical measures, such as sum, count, max, min, and standard deviation.
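As a rough illustration, the sketch below pushes the statistical primitives into the database instead of fetching raw rows. It again uses sqlite3 as a stand-in, and the function name, table, and column are hypothetical. SQLite has no built-in standard deviation, so the population standard deviation is derived from the count, sum, and sum of squares that the database computes.

```python
import math
import sqlite3


def summary_statistics(conn, table, column):
    """Compute count, sum, min, max, and std inside the database.

    table and column are interpolated into the SQL text, so this sketch
    is only suitable for trusted, hard-coded identifiers.
    """
    sql = (
        f"SELECT COUNT({column}), SUM({column}), MIN({column}), "
        f"MAX({column}), SUM({column} * {column}) FROM {table}"
    )
    n, total, lo, hi, sum_sq = conn.execute(sql).fetchone()
    if n == 0:
        return {"count": 0, "sum": None, "min": None, "max": None, "std": None}

    mean = total / n
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(sum_sq / n - mean * mean, 0.0)
    return {
        "count": n,
        "sum": total,
        "min": lo,
        "max": hi,
        "std": math.sqrt(variance),
    }
```

Only one row of aggregates crosses the boundary between the database and the mining code, regardless of how many rows the table holds.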

This design will enhance the performance of Data Mining systems.

Tight Coupling - A Uniform Information Processing Environment

Tight coupling means that a Data Mining system is smoothly integrated into the Database/Data Warehouse system. 

The data mining subsystem is treated as one functional component of the information system.

Data mining queries and functions are optimized based on mining query analysis, data structures, indexing schemes, and query processing methods of a Database or Data Warehouse system.
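A small analogy, not a real tightly coupled system: the sketch below registers a user-defined aggregate with SQLite so that a mining-style function can be called directly from SQL and run inside the query engine alongside GROUP BY. The aggregate name `most_frequent`, the class, and the `sales` table used in the usage note are hypothetical. True tight coupling would also let the query optimizer reason about the mining operation, which this example does not attempt.

```python
import sqlite3
from collections import Counter


class MostFrequent:
    """User-defined SQL aggregate returning the most frequent value."""

    def __init__(self):
        self.counts = Counter()

    def step(self, value):
        # Called by the query engine once per row in the current group.
        if value is not None:
            self.counts[value] += 1

    def finalize(self):
        # Called once per group; ties are broken by the smaller value.
        if not self.counts:
            return None
        return min(self.counts, key=lambda v: (-self.counts[v], v))


def register_mining_functions(conn):
    """Make most_frequent(x) callable from SQL on this connection."""
    conn.create_aggregate("most_frequent", 1, MostFrequent)
```

After registration, a query such as `SELECT region, most_frequent(item) FROM sales GROUP BY region` runs the function as part of ordinary query processing, with the database handling scanning and grouping.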


Summary

Data Mining Architecture Integrated With Database & Data Warehouse System
  • No Coupling
  • Loose Coupling
  • Semi-Tight Coupling
  • Tight Coupling
 