Major Issues In Data Mining

Issues in the data mining process fall broadly into three categories:
  • Mining Methodology
  • User Interaction
  • Applications & Social Impacts 

Mining Methodology 

These issues concern the mining techniques themselves and the factors that affect them.
  • Mining different kinds of knowledge from diverse data types, e.g., biological, stream, and Web data.
  • Handling noise and incomplete data: data cleaning and data analysis methods that can handle noise are required, as are outlier mining methods for discovering and analyzing exceptional cases.
  • Incorporation of background knowledge: domain knowledge is required to guide the discovery process and to express patterns in concise terms and at different levels of abstraction.
  • Pattern evaluation: the interestingness problem.
  • Performance (efficiency, effectiveness, and scalability): the running time of a data mining algorithm must be predictable and acceptable.
  • Parallel, distributed, and incremental mining methods.
  • Integration of the discovered knowledge with existing knowledge.

User Interaction

These issues concern how the end user directs the mining process and interprets the mined data.
  • Data mining query languages and ad hoc mining: a data mining query language needs to be developed so that users can describe ad hoc data mining tasks.
  • Expression and visualization of data mining results.
  • Interactive mining of knowledge at multiple levels of abstraction.

Applications and Social Impacts

These issues concern how the mined and interpreted data can be applied in real-world scenarios.
  • Performing domain-specific data mining and invisible data mining. For example, companies like Amazon keep track of customer profiles.
  • Protection of data security, integrity, and privacy. We need to respect data sensitivity and preserve people's privacy while still performing successful data mining.

A data mining system has the potential to generate thousands or even millions of patterns, insights, or rules. This raises the question: “are all of the patterns interesting?” Typically not. Only a small fraction of the patterns potentially generated would actually be of interest to any given user.

What makes a pattern interesting? 

A pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel.

A pattern is also interesting if it validates a concept that the user sought to confirm. An interesting pattern represents knowledge.

Several objective measures of pattern interestingness exist. 

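One objective measure for an association rule of the form X => Y is rule support, the percentage of transactions in a transaction database that the given rule satisfies, i.e., P(X ∪ Y). Here X ∪ Y denotes the union of the two itemsets, so P(X ∪ Y) is the probability that a transaction contains both X and Y.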

Another objective measure for association rules is confidence, which assesses the degree of certainty of the detected association. It is taken to be the conditional probability P(Y|X), i.e., the probability that a transaction containing X also contains Y.

Each interestingness measure is associated with a threshold, which may be controlled by the user.
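
The two measures above can be sketched in a few lines of Python. This is only an illustration: the toy transaction list, the function names `support` and `confidence`, and the threshold values are assumptions made for the example and do not come from any particular library.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction],
               x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Estimate P(Y|X) as support(X and Y together) / support(X)."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, x | y) / support_x


# Hypothetical toy transaction database (assumed for illustration).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

x = frozenset({"bread"})
y = frozenset({"milk"})

rule_support = support(transactions, x | y)        # 2 of 4 transactions
rule_confidence = confidence(transactions, x, y)   # 2 of the 3 with bread

# User-controlled thresholds (assumed values).
min_support = 0.4
min_confidence = 0.6
is_interesting = (rule_support >= min_support
                  and rule_confidence >= min_confidence)
```

For the rule bread => milk, two of the four transactions contain both items, so the support is 0.5, and two of the three transactions with bread also contain milk, so the confidence is about 0.67. Both values clear the assumed thresholds.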

Can a data mining system generate all of the interesting patterns?
  • The answer depends on the completeness of the data mining algorithm.
  • Generating every possible pattern is usually unrealistic and inefficient.
  • Instead, the search should be focused using user-provided constraints and interestingness measures.
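
A minimal sketch of such a focused search, assuming a small transaction list, is shown below. It is a brute-force enumeration, not an efficient algorithm such as Apriori, and the function name `mine_rules`, the `required_item` constraint, and the sample data are all hypothetical. It returns every rule X => Y that meets the user's support and confidence thresholds, so it is complete with respect to those thresholds.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...], float, float]


def mine_rules(transactions: List[FrozenSet[str]],
               min_support: float,
               min_confidence: float,
               required_item: Optional[str] = None) -> List[Rule]:
    """Enumerate all rules X => Y that meet both thresholds.

    If `required_item` is given, only rules whose right-hand side
    contains that item are kept (a simple user-provided constraint).
    """
    n = len(transactions)
    if n == 0:
        return []
    items = sorted(set().union(*transactions))

    def supp(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    rules: List[Rule] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = supp(itemset)
            if s == 0 or s < min_support:
                continue  # skip itemsets that are too rare
            # Split the itemset into every non-empty left/right pair.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    x = frozenset(lhs)
                    y = itemset - x
                    if required_item is not None and required_item not in y:
                        continue
                    c = s / supp(x)  # supp(x) >= s > 0 here
                    if c >= min_confidence:
                        rules.append(
                            (tuple(sorted(x)), tuple(sorted(y)), s, c))
    return rules


# Hypothetical toy transaction database (assumed for illustration).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

all_rules = mine_rules(transactions, min_support=0.5, min_confidence=0.6)
milk_rules = mine_rules(transactions, min_support=0.5, min_confidence=0.6,
                        required_item="milk")
```

The number of candidate itemsets grows exponentially with the number of items, which is why exhaustive generation does not scale and why thresholds and constraints are used to narrow the search.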

Can a data mining system generate only interesting patterns?
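  • This is an optimization problem.
  • It is highly desirable, because users would not have to search through the generated patterns to find the interesting ones.
  • It remains a challenging issue in data mining.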

Summary

Issues in the data mining process fall broadly into three categories:
  • Mining Methodology
  • User Interaction
  • Applications & Social Impacts 
