Cloud Data Warehouse Design and Optimization Questions

Covers the design and optimization of analytical systems and data warehouses on cloud platforms. Topics include schema design patterns for analytics (star schema and snowflake schema), purposeful denormalization for query performance, column-oriented storage characteristics, distribution and sort key selection, partitioning and clustering strategies, incremental loading patterns, slowly changing dimensions, time series data modeling, cost and performance trade-offs in cloud-managed warehouses, and platform-specific features that affect query performance and storage layout. Candidates should be able to discuss end-to-end design considerations for large-scale analytic workloads and the trade-offs between latency, cost, and maintainability.

Easy | Technical

Describe the primary differences between OLTP and OLAP systems. In the context of a cloud data warehouse, explain why design choices such as indexing, normalization, and transaction optimization differ from those in online transactional databases.

Hard | Technical

Implement an incremental merge pattern using Spark Structured Streaming and Delta Lake (pseudocode acceptable). Given schema: events(id STRING, user_id STRING, event_time TIMESTAMP, value DOUBLE). Describe how you deduplicate, handle late-arriving events, and maintain exactly-once semantics for upserts into a Delta table.
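
As a starting point for an answer, the per-batch logic can be modeled in plain Python. This is only a sketch of the idea, not Spark or Delta Lake code: the function `merge_batch`, the `state` dictionary, and the one-hour lateness bound are all hypothetical names and values chosen for illustration. In a real pipeline the same steps would typically run inside a `foreachBatch` handler that issues a Delta `MERGE`, with the batch id used to make replays idempotent.

```python
from datetime import datetime, timedelta

def merge_batch(table, batch, batch_id, state, lateness=timedelta(hours=1)):
    """Upsert one micro-batch into `table` (a dict keyed by event id).

    `state` tracks the last committed batch id and the max event_time seen.
    All names here are illustrative assumptions, not a library API.
    """
    # Idempotency: a replayed batch (same or older id) must change nothing.
    if batch_id <= state["last_batch_id"]:
        return table

    # Watermark is derived from earlier batches; None means no bound yet.
    prev_max = state["max_event_time"]
    watermark = prev_max - lateness if prev_max is not None else None

    # Deduplicate inside the batch: keep the newest event per id,
    # skipping events older than the watermark (too late to accept).
    latest = {}
    for event in batch:
        if watermark is not None and event["event_time"] < watermark:
            continue
        current = latest.get(event["id"])
        if current is None or event["event_time"] > current["event_time"]:
            latest[event["id"]] = event

    # Merge: insert unseen ids, update only when the incoming row is not older.
    for event_id, event in latest.items():
        existing = table.get(event_id)
        if existing is None or event["event_time"] >= existing["event_time"]:
            table[event_id] = event

    # Commit progress so a retry of this batch becomes a no-op.
    times = [e["event_time"] for e in batch]
    if prev_max is not None:
        times.append(prev_max)
    state["max_event_time"] = max(times) if times else None
    state["last_batch_id"] = batch_id
    return table
```

The model separates three concerns the question asks about: in-batch deduplication, a lateness bound for late-arriving events, and a committed batch id that makes retries safe. A full answer would still need to cover checkpointing and how the batch id is persisted atomically with the write.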

Hard | Technical

A fact table shows severe join skew because a few hot keys dominate join cardinality, causing massive shuffles and long runtimes. Propose approaches to mitigate join skew at the warehouse or ETL level, including rekeying, salting, pre-aggregation, and use of broadcast joins or colocated joins.
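
Of the techniques listed, salting is the easiest to show in a few lines. The sketch below is a plain-Python model rather than warehouse or Spark code: `NUM_SALTS`, `HOT_KEYS`, and the three helper functions are hypothetical, and the hot key set is assumed to be known in advance (for example from a prior frequency count). Fact rows for a hot key are spread across several salt values, and the matching dimension row is replicated once per salt so the join result is unchanged.

```python
from collections import Counter

NUM_SALTS = 4          # assumed fan-out for hot keys
HOT_KEYS = {"k_hot"}   # assumed to be identified beforehand

def salt_fact(rows):
    """Attach a salt to each (key, value) fact row; hot keys rotate salts."""
    salted = []
    for i, (key, value) in enumerate(rows):
        salt = i % NUM_SALTS if key in HOT_KEYS else 0
        salted.append(((key, salt), value))
    return salted

def explode_dim(rows):
    """Replicate each hot dimension row once per salt value."""
    exploded = []
    for key, attr in rows:
        salts = range(NUM_SALTS) if key in HOT_KEYS else range(1)
        for salt in salts:
            exploded.append(((key, salt), attr))
    return exploded

def hash_join(fact, dim):
    """Inner join on the composite (key, salt) join key."""
    lookup = dict(dim)
    return [(k, v, lookup[k]) for k, v in fact if k in lookup]

fact = [("k_hot", i) for i in range(8)] + [("k_cold", 100)]
dim = [("k_hot", "H"), ("k_cold", "C")]

salted_fact = salt_fact(fact)
joined = hash_join(salted_fact, explode_dim(dim))

# Rows per join key before and after salting; the largest group shrinks.
rows_per_key_before = Counter(k for k, _ in fact)
rows_per_key_after = Counter(k for k, _ in salted_fact)
```

The trade-off to mention in an answer is that the dimension side grows by a factor of `NUM_SALTS` for hot keys, so salting pays off only when the skewed side is much larger than the replicated side.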

Medium | Technical

Write a SQL query (specify dialect) to compute Monthly Active Users (MAU) from an events table partitioned by event_date. The table schema: events(user_id STRING, event_time TIMESTAMP, event_date DATE). Compute unique users per calendar month for the past 6 months and ensure the query takes advantage of partition pruning.
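
One possible shape for the query is sketched below, using SQLite through Python's `sqlite3` module purely so the example runs anywhere. SQLite has no partitions, so this shows only the query structure: the filter is a half-open range on the raw `event_date` column, which is the form that partitioned engines can generally use for pruning, while the month bucketing is applied only in the grouping. The helper names `month_start` and `mau` are hypothetical, and "past 6 months" is assumed to mean the current calendar month plus the five before it.

```python
import sqlite3
from datetime import date

def month_start(d, months_back):
    """First day of the month `months_back` months before d's month.

    A negative `months_back` moves forward in time.
    """
    idx = d.year * 12 + (d.month - 1) - months_back
    return date(idx // 12, idx % 12 + 1, 1)

def mau(conn, today, months=6):
    """Distinct users per calendar month over the last `months` months."""
    start = month_start(today, months - 1)  # first day of the oldest month
    end = month_start(today, -1)            # first day of next month (exclusive)
    sql = """
        SELECT strftime('%Y-%m', event_date) AS month,
               COUNT(DISTINCT user_id)       AS mau
        FROM events
        WHERE event_date >= ? AND event_date < ?   -- bare partition column
        GROUP BY strftime('%Y-%m', event_date)
        ORDER BY month
    """
    return conn.execute(sql, (start.isoformat(), end.isoformat())).fetchall()
```

In a warehouse dialect the date bounds would usually be computed with that engine's own date functions and the bucketing done with its month-truncation function. The point to preserve is that no function wraps `event_date` in the `WHERE` clause.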

Medium | System Design

You store daily Parquet files partitioned by dt in S3 and query them with Athena/Glue. Describe an optimal partitioning and file-sizing strategy to minimize query latency and cost. Discuss use of partition pruning, Glue catalog partitions, and cost of too many small files versus too-large files.
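
The small-files side of the trade-off can be made concrete with a compaction plan for one `dt` partition. The sketch below is a hypothetical helper, not part of any AWS tooling: `plan_compaction` and the 256 MiB `TARGET_BYTES` value are assumptions picked for illustration, and the right target depends on the engine and workload. It greedily groups existing file sizes so that each rewritten output file lands near the target.

```python
TARGET_BYTES = 256 * 1024**2  # assumed target output size, tune per workload

def plan_compaction(file_sizes, target=TARGET_BYTES):
    """Group file sizes (bytes) so each group totals at most `target`.

    A single file larger than `target` ends up in a group of its own.
    Returns a list of groups; each group would be rewritten as one file.
    """
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current group before it would exceed the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# Example: 64 files of 8 MiB in one daily partition.
MIB = 1024**2
plan = plan_compaction([8 * MIB] * 64)
```

Here the plan collapses 64 small objects into 2 outputs, which cuts per-file open and listing overhead. An answer should weigh this against files that grow so large that they limit read parallelism.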
