(DMQL)Data Mining Query Language - For Databases & Data Warehouses

data mining query language


Introduction To Data Mining Query Language

It was proposed by Han, Fu, Wang, et al. for the Database Miner data mining system. Although it was based on the structured Data Mining Query Language, These query languages are designed to support ad hoc and interactive data mining. Also, it provides commands for specifying primitives. 

We can use Data Mining Query Language(DMQL) to work with databases and data warehouses as well. We can also use it to define data mining tasks. Particularly we examine how to define data warehouses and data marts in DMQL. 

Data Mining Query Language(DMQL) adopts SQL-like syntax.

Hence, it can be easily integrated with relational query languages.

A Data Mining Query Language (DMQL) can be designed to incorporate these primitives, allowing users to flexibly interact with data mining systems.

It is defined in BNF grammer
  • [ ] represents 0 or one occurrence.
  • { } represents 0 or more occurrences.
  • Words in sans serif represent keywords.


Motivation To Data Mining Query Language

A Data Mining Query Language (DMQL) can provide the ability to support ad-hoc and interactive data mining.

By providing a standardized language like SQL
  • We can hope to achieve a similar effect like that SQL has on the relational database.
  • It helps in the foundation for system development and evolution.
  • It facilitates information exchange, technology transfer, commercialization, and wide acceptance.


Syntax For DMQL

DMQL is designed for Data Mining Task Primitives.

Its syntax goes for all of the primitives.

Syntax for the specification of
  • Task-relevant data
  • The kind of knowledge to be mined
  • Concept hierarchy specification
  • Interestingness measure
  • Pattern presentation and visualization

Putting it all together — a DMQL query is formed.

 

DMQL-Syntax for task-relevant data specification 

Names of the relevant database or data warehouse, conditions, and relevant attributes or dimensions must be specified:

use database ‹database_name› or use data warehouse ‹data_warehouse_name›      
from ‹relation(s)/cube(s)› [where condition] (data cubes and tables)
in relevance to ‹attribute_or_dimension_list› (attributes or dimension for exploration)
order by ‹order_list› (sorting order)
group by ‹grouping_list› (specifies criteria to group)
having ‹condition› (it represent which group of data are considered relevant)


Syntax for Specifying Kind of Knowledge to be Mined

Characterization:

 ‹Mine_Knowledge_Specification›::=  
  mine characteristics [as ‹pattern_name›]
  analyze ‹measure(s)›

Example: 

mine characteristics as customerPurchasing  analyze count%

 

Discrimination:

  ‹Mine_Knowledge_Specification› ::=    
  mine comparison [as ‹ pattern_name›]     
  for ‹target_class› where ‹target_condition {versus ‹contrast_class_i where ‹contrast_condition_i›} analyze ‹measure(s)›

Example: 
 
    Mine comparison as purchaseGroups
    for bigspenders where avg(I.price)  >= $100
    versus budgetspenders where avg(I.price) < $100
    analyze count 
 
     


Association: 

This specifies the mining of patterns of association

‹Mine_Knowledge_Specification›::=     
mine associations [as ‹pattern_name›] [matching ‹metapattern›]

Example: 

mine associations as buyingHabits matching  P(X: customer, W) ^ Q(X,Y) => buys (X,Z)


Classification:

‹Mine_Knowledge_Specification› ::= mine classification [as ‹pattern_name›] analyze ‹classifying_attribute_or_dimension›

Example: 

mine classification as classifyCustomerCreditRating analyze credit_rating


Syntax for concept hierarchy specification

More than one concept per attribute can be specified

Use hierarchy ‹hierarchy_name› for ‹attribute_or_dimension›

Examples:

Schema concept hierarchy for relation address as the total order (ordering is important).

define hierarchy location_hierarchy on address as [street,city,province_or_state,country]
 
Set-Grouping Concept Hierarchy:

define hierarchy age_hierarchy for age on customer as
     level1: {young, middle_aged, senior}< level0: all
     level2: {20, ..., 39} < level1: young
     level2: {40, ..., 59} < level1: middle_aged
     level2: {60, ..., 89} < level1: senior

Operation-Derived Concept Hierarchy:

define hierarchy age_hierarchy for age on customer as
    {age_category(1), ..., age_category(5)} := cluster (default, age, 5) < all(age)
 
Rule-Based Concept Hierarchy:

define hierarchy profit_margin_hierarchy on item  as
   level_1: low_profit_margin < level_0:  all
        if (price - cost)< $50
   level_1: medium-profit_margin < level_0:  all
       if ((price - cost) > $50)  and ((price - cost) <= $250)) 
   level_1: high_profit_margin < level_0:  all
       if (price - cost) > $250
 

Syntax for interestingness measure specification

The user can help control the number of uninteresting patterns returned by the data mining system by specifying measures of pattern interestingness and corresponding thresholds.

Interestingness measures include the confidence, support, noise, and novelty measures.

with [‹interest_measure_name›] threshold = ‹threshold_value›

Example: 

    with support threshold = 5%
    with confidence threshold = 70%


Syntax for pattern presentation and visualization specification

display as ‹result_form›

The result form can be rules, tables, cubes, crosstabs, pie or bar charts, decision trees, curves or surfaces.

To facilitate interactive viewing at different concept levels or different angles, the following syntax is defined:


‹Multilevel_Manipulation›::= roll up on ‹attribute_or_dimension› | drill down on ‹attribute_or_dimension› | add ‹attribute_or_dimension› | drop ‹attribute_or_dimension› 


Putting it all together: A DMQL query

Here is an example DMQL for AllElectronics Database
use database AllElectronics_db
use hierarchy location_hierarchy for B.address
mine characteristics as  customerPurchasing
analyze count%
in relevance to C.age, I.type, I.place_made
from customer C,  item I, purchases P, items_sold S, works_at W, branch
where I.item_ID = S.item_ID  and S.trans_ID = P.trans_ID
    and P.cust_ID = C.cust_ID and P.method_paid = ""AmEx"" 
    and P.empl_ID = W.empl_ID and W.branch_ID = B.branch_ID and B.address = ""Canada""  and I.price >= 100

with noise threshold = 0.05
display as table

This above query represents the whole Data Mining Query for a Database or Data Warehouse.  
  

Post a Comment

0 Comments