You have 3 free guides left 😟
Unlock your guides
You have 3 free guides left 😟
Unlock your guides

2.2 Dimensional Modeling and Star Schema

3 min readjuly 18, 2024

Dimensional modeling and star schemas are crucial tools in data warehousing. They organize data into facts and dimensions, making it easier to analyze and query large datasets. This approach optimizes read-heavy workloads, enabling fast performance for business intelligence tasks.

The , with its central and surrounding dimension tables, is the most common implementation. It denormalizes data for speed, sacrificing some storage efficiency. This design allows for intuitive data exploration and quick analysis across multiple dimensions.

Dimensional Modeling and Star Schema

Concepts of dimensional modeling

Top images from around the web for Concepts of dimensional modeling
Top images from around the web for Concepts of dimensional modeling
  • Technique used to design data warehouses for efficient querying and analysis
    • Organizes data into facts (measurable, quantitative data) and dimensions (context and descriptive )
    • Optimized for read-heavy workloads and business intelligence (reporting, dashboards, ad-hoc queries)
  • Enables fast query performance by denormalizing data (redundant data in dimension tables to avoid joins)
  • Provides an intuitive and business-friendly data model (easy for users to understand and navigate)
  • Allows for easy slicing and dicing of data across multiple dimensions (time, product, location, customer)

Components of star schema

  • Fact table
    • Central table contains measurable, quantitative data about a business process (sales, inventory, website clicks)
    • Includes foreign keys to dimension tables and numeric (sales_amount, quantity_sold)
    • Example: sales_fact table with measures (sales_amount, quantity_sold) and foreign keys (time_key, product_key, store_key)
  • Dimension tables
    • Provide context and descriptive attributes for the facts (product name, store location, customer demographics)
    • Connected to the fact table through foreign keys
    • Denormalized and contain redundant data to avoid joins (duplicate data in dimension tables for faster queries)
    • Common dimension tables:
      • Time dimension represents different hierarchical levels of time (day, month, quarter, year)
      • Product dimension includes attributes (product name, category, brand, price)
      • Location dimension represents geographical (store, city, state, country)
      • Customer dimension contains customer attributes (name, age, gender, segment)

Star schema design process

  1. Identify the business process to be modeled (sales, inventory, marketing)
  2. Determine the grain (level of detail) of the fact table (transaction level, daily aggregation)
  3. Identify the measures for the fact table (sales_amount, quantity_sold, discount_amount)
  4. Identify the dimensions and their attributes (time, product, store, customer)
  5. Create the fact table with measures and foreign keys to dimension tables
  6. Create the dimension tables with their attributes and primary keys
  • Example scenario: Designing a for retail sales analysis
    • Fact table: sales_fact with measures (sales_amount, quantity_sold, discount_amount) and foreign keys (time_key, product_key, store_key, customer_key)
    • Dimension tables:
      • time_dim (date, day, month, quarter, year)
      • product_dim (product_key, product_name, category, brand, price)
      • store_dim (store_key, store_name, city, state, country)
      • customer_dim (customer_key, customer_name, age, gender, segment)

Star schema vs other techniques

  • Star schema
    • Most common and simplest dimensional modeling technique
    • Consists of a fact table surrounded by dimension tables
    • Denormalized structure with redundant data in dimension tables (faster queries but more storage)
    • Variation of star schema with normalized dimension tables
    • Dimension tables are further divided into subdimensions (product category and subcategory tables)
    • Reduces data redundancy but increases query complexity (more joins needed)
  • (galaxy schema)
    • Multiple fact tables sharing dimension tables
    • Useful for complex business scenarios with multiple business processes (sales and inventory facts sharing product and store dimensions)
    • Allows for complex analysis across different fact tables
  • Comparison:
    • Star schema: simpler, easier to understand, better query performance
    • Snowflake schema: more normalized, reduces redundancy, increases query complexity
    • Constellation schema: suitable for complex scenarios with multiple fact tables and shared dimensions
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
Glossary
Glossary