Star And Snowflake Schema In Data Warehousing

Star and Snowflake Schema In Data Mining

Before knowing Star and Snowflake Schema, check out about Multidimensional Data Model.

Star and Snowflake schemas are used to model the data warehouses.

Star Schema

It is a large central table (fact table) containing the bulk of the data, with no redundancy.

It is a set of smaller attendant tables (dimension tables) one for each dimension.

Here a fact table in the middle is connected to a set of dimension tables.

This schema graph resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.


Star Schema 


Defining Star Schema In DMQL

To define a star schema in Data Mining Query Language, first, we need to define the whole cube and next, we need to define the individual dimensions. 
Here let's define star schema for Sales Table. 

" "define cube sales_star [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars),
avg_sales = avg(sales_in_dollars),
units_sold = count(*)

define dimension time as (time_key, day, day_of_week, month, quarter, year)

define dimension item as (item_key, item_name, brand, type, supplier_type)

define dimension branch as (branch_key, branch_name, branch_type)

define dimension location as (location_key, street, city, province_or_state, country)" "

Snowflake Schema

This is a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.

Snowflake model is normalized to reduce redundancies.

The table is easy to maintain and saves storage space.

The snowflake structure can reduce the effectiveness of browsing since more joins will be needed to execute a query.

Although the snowflake schema reduces redundancy, it is not as the star schema in data warehouse design.


Snowflake schema 


Defining Snowflake In DMQL

This definition is similar to that of sales_star, except that, here, the item and location dimension tables are normalized.

" "define cube sales_snowflake [time, item, branch, location]:
   dollars_sold = sum(sales_in_dollars),
   avg_sales = avg(sales_in_dollars), units_sold = count(*)

define dimension time as (time_key, day, day_of_week, month, quarter, year)

define dimension item as (item_key, item_name, brand, type, supplier(supplier_key, supplier_type))

define dimension branch as (branch_key, branch_name, branch_type)

define dimension location as (location_key, street, city(city_key, province_or_state, country))" "
 

Summary

Star Schema -> It is a large central table (fact table) containing the bulk of the data, with no redundancy.

Snowflake Schema -> This is a refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.

Subscribe us for more content on Data.  


Post a Comment

0 Comments