Understanding Snowflake and Star Schema
Snowflake Schema
It splits dimension tables into multiple related tables to reduce redundancy and create a more normalized structure.
Star Schema
It consists of a single fact table connected to dimension tables, visualized as a star. A single link is established between the fact table and each dimension table.
Is the Snowflake Schema an Extension of the Star Schema?
- Large dimension tables are normalized into multiple sub-dimension tables.
- Every dimension table is associated with its sub-dimension tables and has multiple links.
- A Snowflake schema is a Star schema structure normalized through the use of outrigger tables, i.e., the dimension table hierarchies are broken into simpler tables.
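To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product_star, dim_product_snow, dim_category) and the sample rows are illustrative assumptions, not taken from any particular warehouse. The same question, total sales per category, needs one join in the Star layout and two joins in the Snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: one denormalized product dimension.
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT
);
-- Snowflake schema: the category hierarchy moves to an outrigger table.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
-- The fact table is identical in both designs.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER,
    amount REAL
);
INSERT INTO dim_category VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_product_snow VALUES (10, 'Novel', 1), (11, 'Atlas', 1), (12, 'Chess', 2);
INSERT INTO dim_product_star VALUES
    (10, 'Novel', 'Books'), (11, 'Atlas', 'Books'), (12, 'Chess', 'Games');
INSERT INTO fact_sales VALUES (1, 10, 20.0), (2, 11, 35.0), (3, 12, 15.0), (4, 10, 20.0);
""")

# Star: a single join from the fact table to the dimension.
star_query = """
SELECT p.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_star p ON p.product_id = f.product_id
GROUP BY p.category_name ORDER BY p.category_name
"""

# Snowflake: an extra join to reach the outrigger table.
snowflake_query = """
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_snow p ON p.product_id = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.category_name ORDER BY c.category_name
"""

star_rows = conn.execute(star_query).fetchall()
snowflake_rows = conn.execute(snowflake_query).fetchall()
print(star_rows)
print(snowflake_rows)
```

Both queries return the same totals. The Snowflake version stores each category name once but pays for it with the additional join.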
Challenges in Implementing a Storage and Query Platform
In data warehousing, storage and query performance optimization are significant concerns. When a data warehouse is built on a Snowflake schema, queries against it require many joins. The major challenges include:
- Understand the schema and then denormalize the lookup tables to reduce the number of joins in the query.
- Migrate the data into a new data warehouse that can respond in seconds, because aggregation queries on the existing one are slow.
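As a rough illustration of denormalizing lookup tables, the sketch below folds a chain of two lookup tables into one wide dimension table, again with Python's sqlite3 module. All names here (lookup_country, lookup_city, dim_customer, dim_customer_flat) are hypothetical, chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup_country (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE lookup_city (city_id INTEGER PRIMARY KEY, city_name TEXT, country_id INTEGER);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT, city_id INTEGER);

INSERT INTO lookup_country VALUES (1, 'India'), (2, 'France');
INSERT INTO lookup_city VALUES (1, 'Pune', 1), (2, 'Lyon', 2);
INSERT INTO dim_customer VALUES (100, 'Asha', 1), (101, 'Luc', 2);

-- Denormalize: resolve both lookups once and store the result,
-- so later queries read city and country without any joins.
CREATE TABLE dim_customer_flat AS
SELECT cu.customer_id, cu.customer_name, ci.city_name, co.country_name
FROM dim_customer cu
JOIN lookup_city ci ON ci.city_id = cu.city_id
JOIN lookup_country co ON co.country_id = ci.country_id;
""")

flat_rows = conn.execute(
    "SELECT customer_id, customer_name, city_name, country_name "
    "FROM dim_customer_flat ORDER BY customer_id"
).fetchall()
print(flat_rows)
```

The trade-off is the usual one: the flat table repeats city and country names on every row, so it takes more storage and must be rebuilt or updated when a lookup value changes.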
Solution Offerings for Data Warehouse and Data Migration
- Denormalization - Understand the schema, working from the conceptual model to the physical model and then the granularity of the Snowflake schema, before denormalizing.
- Data Migration - To reduce the latency of aggregate queries, migrate the data to a Druid data warehouse and define an aggregation policy for every table.
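The idea behind a per-table aggregation policy can be sketched in plain Python: rows are pre-aggregated into time buckets at load time, so aggregate queries scan far fewer rows. This is not Druid's API or ingestion format. The policy dictionary, its keys, and the roll_up function are hypothetical and exist only to illustrate the concept.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical aggregation policy for one table: which columns to keep as
# dimensions, which metric to sum, and the time bucket to roll up to.
policy = {
    "dimensions": ["country"],
    "metric": "amount",
    "granularity": "%Y-%m-%d %H:00",  # hourly buckets
}

events = [
    {"ts": "2024-01-01 10:05:00", "country": "IN", "amount": 5.0},
    {"ts": "2024-01-01 10:40:00", "country": "IN", "amount": 7.0},
    {"ts": "2024-01-01 10:59:00", "country": "FR", "amount": 3.0},
    {"ts": "2024-01-01 11:01:00", "country": "IN", "amount": 1.0},
]

def roll_up(rows, policy):
    """Pre-aggregate raw rows into one row per (time bucket, dimensions)."""
    totals = defaultdict(float)
    for row in rows:
        ts = datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S")
        bucket = ts.strftime(policy["granularity"])
        key = (bucket,) + tuple(row[d] for d in policy["dimensions"])
        totals[key] += row[policy["metric"]]
    return dict(totals)

rolled = roll_up(events, policy)
print(rolled)
```

Four raw events collapse into three stored rows here. On real data with many events per bucket, the reduction is what makes aggregate queries fast, at the cost of losing detail finer than the chosen granularity.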
Getting Started with Data Warehousing
A data warehouse stores massive amounts of data. It is a central repository of information that is analysed to make better-informed decisions. Recent data warehouse offerings include:
- Amazon Redshift
- Google BigQuery
- Panoply
Data Warehouse Characteristics
- Highly Reliable
- Data Integrity
- Better Storage Performance
- Faster Sequential Reads
- Elimination of Physical Hardware
- Massive Parallel Processing
Data Warehouse Classification
- ETL (Extract, Transform, Load) Processes - Tune ETL processing to increase performance and reduce load time.
- Query Processing - Optimize queries by understanding how the database executes them and by applying aggregate tables, index usage, vertical and horizontal partitioning, denormalization, and server tuning.
- Delivering Reports - Network traffic and server setup can delay the delivery of reports. Apply performance tuning to avoid this.
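Of the query-processing techniques listed above, index usage is the easiest to observe directly. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement to compare the plan for the same query before and after an index is created. The fact_sales table, the idx_sales_region index, and the plan helper are assumptions made for the example. The exact wording of the plan text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO fact_sales (region, amount) VALUES (?, ?)",
    [("north" if i % 2 else "south", float(i)) for i in range(1000)],
)

query = "SELECT SUM(amount) FROM fact_sales WHERE region = 'north'"

def plan(sql):
    """Return SQLite's query plan details for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

plan_before = plan(query)  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_sales_region ON fact_sales (region)")
plan_after = plan(query)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

Reading the plan before and after a change is a simple way to confirm that a tuning step actually altered how the query executes, rather than assuming it did.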