Data Science & Analytics Topics
Statistical analysis, data analytics, big data technologies, and data visualization. Covers statistical methods, exploratory analysis, and data storytelling.
Exploratory Data Analysis
Exploratory Data Analysis is the systematic process of investigating and validating a dataset to understand its structure, content, and quality before modelling or reporting. Core activities include examining schema and data types, computing descriptive statistics such as counts, means, medians, standard deviations and quartiles, and measuring class balance and unique value counts. It covers distribution analysis, outlier detection, correlation and relationship exploration, and trend or seasonality checks for time series. Data validation and quality checks include identifying missing values, anomalies, inconsistent encodings, duplicates, and other data integrity issues. Practical techniques span SQL profiling and aggregation queries using GROUP BY, COUNT and DISTINCT; interactive data exploration with pandas and similar libraries; and visualization with histograms, box plots, scatter plots, heatmaps and time series charts to reveal patterns and issues. The process also includes feature summary and basic metric computation, sampling strategies, forming and documenting hypotheses, and recommending cleaning or transformation steps. Good Exploratory Data Analysis produces a clear record of findings, assumptions to validate, and next steps for cleaning, feature engineering, or modelling.
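A first profiling pass of the kind described above can be sketched in pandas. This is a minimal illustration, not a full workflow: the toy DataFrame, its column names, and the profile helper are all hypothetical, and the outlier check uses Tukey's 1.5 * IQR rule as one common choice among several.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4, 5],
    "plan": ["basic", "pro", "pro", "Basic", None, "basic"],
    "monthly_spend": [10.0, 25.0, 25.0, 12.0, 11.0, 400.0],
})

def profile(frame: pd.DataFrame) -> dict:
    """Return a small dictionary of schema, quality, and summary checks."""
    spend = frame["monthly_spend"]
    q1, q3 = spend.quantile(0.25), spend.quantile(0.75)
    iqr = q3 - q1
    # Tukey's rule: flag values more than 1.5 * IQR outside the quartiles.
    outliers = frame[(spend < q1 - 1.5 * iqr) | (spend > q3 + 1.5 * iqr)]
    return {
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "missing_per_column": frame.isna().sum().to_dict(),
        "duplicate_rows": int(frame.duplicated().sum()),
        # Category counts expose inconsistent encodings such as "basic" vs "Basic".
        "plan_counts": frame["plan"].value_counts(dropna=False).to_dict(),
        "spend_summary": spend.describe().to_dict(),
        "outlier_user_ids": outliers["user_id"].tolist(),
    }

report = profile(df)
for check, result in report.items():
    print(check, result)
```

Each entry in the report maps to one finding worth recording: a missing plan value, an exact duplicate row, a casing inconsistency in the plan column, and one extreme spend value to investigate before modelling.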
Analytical Background
The candidate's analytical skills and experience with data-driven problem solving, including statistics, data analysis projects, tools and languages used, and examples of insights that influenced product or business decisions. This covers academic projects, internships, or professional analytics work and the end-to-end approach from hypothesis to measured result.
Attribution Modeling and Multi-Touch Attribution
Covers the theory and practice of assigning credit for conversions across marketing touchpoints. Candidates should know single-touch models such as first touch and last touch, deterministic (rule-based) multi-touch models like linear and time decay, and algorithmic or data-driven models that use statistical or machine learning techniques. Discuss the pros and cons of each approach, including bias introduced by simple models, the data and engineering requirements for algorithmic models, and trade-offs between interpretability and accuracy. Topics include model selection aligned to business questions, dealing with long purchase cycles, cross-device and cross-channel journeys, limitations of deterministic attribution, approaches to model validation, and how attribution differs from causal incrementality testing.
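The rule-based models named above can be compared on a single journey. The sketch below is illustrative only: the journey format (channel, days before conversion), the function names, and the 7-day half-life are assumptions, and real systems differ in how they define decay and handle repeated touches.

```python
from collections import defaultdict

def first_touch(journey):
    """All credit goes to the earliest touchpoint."""
    return {journey[0][0]: 1.0}

def last_touch(journey):
    """All credit goes to the touchpoint closest to conversion."""
    return {journey[-1][0]: 1.0}

def linear(journey):
    """Credit is split equally across every touchpoint."""
    credit = defaultdict(float)
    share = 1.0 / len(journey)
    for channel, _ in journey:
        credit[channel] += share
    return dict(credit)

def time_decay(journey, half_life_days=7.0):
    """Weight halves for every half_life_days between touch and conversion."""
    weights = [0.5 ** (days / half_life_days) for _, days in journey]
    total = sum(weights)
    credit = defaultdict(float)
    for (channel, _), weight in zip(journey, weights):
        credit[channel] += weight / total
    return dict(credit)

# Hypothetical journey, ordered oldest to newest:
# (channel, days between the touch and the conversion).
journey = [("display", 14), ("email", 7), ("search", 0)]

for model in (first_touch, last_touch, linear, time_decay):
    print(model.__name__, model(journey))
```

The same journey yields four different credit splits, which is the bias the simple models introduce: none of these numbers says what would have happened without a given channel, which is the question incrementality testing addresses.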
Audience Segmentation and Cohorts
Covers methods for dividing users or consumers into meaningful segments and analyzing their behavior over time using cohort analysis. Candidates should be able to choose segmentation dimensions such as demographics, acquisition channel, product usage, geography, device, or behavioral attributes, and justify those choices for a given business question. They should know how to design cohort analyses to measure retention, churn, lifetime value, and conversion funnels, and how to avoid common pitfalls such as Simpson's paradox and survivorship bias. This topic also includes deriving behavioral insights to inform personalization, content and product strategy, marketing targeting, and persona development, as well as identifying underserved or high-value segments. Expect discussion of relevant metrics, data requirements and quality considerations, approaches to visualization and interpretation, and typical tools and techniques used in analytics and experimentation to validate segment-driven hypotheses.
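A cohort retention table of the kind described above can be sketched in a few lines. The activity log, its (user, signup month, active month) layout, and the retention_table helper are hypothetical; the point of the sketch is the denominator, which is the original cohort size rather than the users still active, one simple guard against survivorship bias.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, signup_month, active_month),
# with months encoded as integers for simplicity.
activity = [
    ("a", 0, 0), ("a", 0, 1), ("a", 0, 2),
    ("b", 0, 0), ("b", 0, 1),
    ("c", 1, 1), ("c", 1, 2),
    ("d", 1, 1),
]

def retention_table(rows):
    """Map cohort -> {months since signup -> share of the cohort active}."""
    cohort_users = defaultdict(set)
    active = defaultdict(set)
    for user, signup, month in rows:
        cohort_users[signup].add(user)
        active[(signup, month - signup)].add(user)
    table = {}
    for (cohort, offset), users in active.items():
        # Divide by the full original cohort, not by current survivors.
        table.setdefault(cohort, {})[offset] = len(users) / len(cohort_users[cohort])
    return table

table = retention_table(activity)
for cohort in sorted(table):
    print(cohort, dict(sorted(table[cohort].items())))
```

Cohort 1 has no value at offset 2 because that month has not been observed yet; treating the gap as zero retention would be a misreading of incomplete data.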
Dashboard and Data Visualization Design
Principles and practices for designing, prototyping, and implementing visual artifacts and interactive dashboards that surface insights and support decision making. Topics include information architecture and layout; chart and visual encoding selection for comparisons, trends, distributions, and relationships; annotation and labeling; effective use of color and white space; and trade-offs between overview and detail. The topic covers interactive patterns such as filters, drill-downs, tooltips, and bookmarks, and decision frameworks for when interactivity adds user value versus complexity. It also encompasses translating analytic questions into metrics, grouping related measures, wireframing and prototyping, performance and data latency considerations for large data sets, accessibility and mobile responsiveness, data integrity and maintenance, and how statistical concepts such as statistical significance, confidence intervals, and effect sizes influence visualization choices.
Netflix-Specific Data Analysis Scenarios
Netflix-specific data analysis scenarios covering streaming metrics, user engagement and retention analysis, content consumption patterns, evaluation of recommendation systems, A/B test design and analysis, cohort analysis, data visualization, and storytelling with data in the streaming domain.
Central Limit Theorem (CLT) and Normal Distribution
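Understand the CLT: when you draw many independent random samples of a sufficiently large size from a population with finite variance and calculate each sample's mean, those sample means are approximately normally distributed (bell-shaped) even if the underlying data isn't. Their distribution is centered on the population mean, with a standard deviation (the standard error) equal to the population standard deviation divided by the square root of the sample size. Know that the normal distribution is parameterized by its mean and standard deviation. Appreciate why this matters: it allows you to estimate population characteristics from samples and construct confidence intervals.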
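The CLT can be checked with a short simulation. This sketch assumes an exponential population with rate 1 (mean 1, standard deviation 1), a sample size of 50, and 2000 repeated samples; these values, the fixed seed, and the function name are illustrative choices only.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(sample_size, n_samples=2000):
    """Means of repeated samples from a right-skewed Exponential(1) population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

sample_size = 50
means = sample_means(sample_size)

center = statistics.fmean(means)           # should sit near the population mean, 1.0
spread = statistics.stdev(means)           # should sit near the standard error
theoretical_se = 1.0 / sample_size ** 0.5  # sigma / sqrt(n), with sigma = 1

# Under approximate normality, about 95% of sample means fall
# within 1.96 standard errors of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"std of sample means:  {spread:.3f} (theory: {theoretical_se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

The same 1.96 standard error band is what a 95% confidence interval for a mean rests on, which is why the theorem matters for estimation from samples.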
Real World Experimental Challenges and Solutions
Discuss practical complications in running experiments at scale: user heterogeneity, segment-specific effects, long-term vs. short-term metrics, novelty effects, network effects, and infrastructure constraints. Know techniques for variance reduction such as CUPED (Controlled-experiment Using Pre-Experiment Data), which adjusts a metric using a correlated pre-experiment covariate, as well as segmentation strategies and how to detect and correct for data quality issues during experiments.
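The CUPED adjustment can be sketched on simulated data. The standard form subtracts theta * (X - mean(X)) from each metric value, where X is a pre-experiment covariate and theta = cov(X, Y) / var(X). The data-generating values, the seed, and the function name below are assumptions for illustration; in a real experiment theta is typically estimated on data pooled across arms and the adjusted means are then compared between arms.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

def cuped_adjust(metric, covariate):
    """Return metric values adjusted by a pre-experiment covariate (CUPED)."""
    mean_x = statistics.fmean(covariate)
    mean_y = statistics.fmean(metric)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric))
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    theta = cov_xy / var_x  # regression slope of the metric on the covariate
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]

# Simulated users: the in-experiment metric is strongly tied to
# each user's pre-experiment value, plus independent noise.
pre = [random.gauss(100, 20) for _ in range(5000)]
post = [x + random.gauss(0, 5) for x in pre]

adjusted = cuped_adjust(post, pre)

print(f"variance before: {statistics.variance(post):.1f}")
print(f"variance after:  {statistics.variance(adjusted):.1f}")
print(f"mean before: {statistics.fmean(post):.3f}")
print(f"mean after:  {statistics.fmean(adjusted):.3f}")
```

The adjustment leaves the mean unchanged while removing the variance explained by the covariate, so the benefit depends entirely on how strongly pre-experiment behavior correlates with the metric.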
Metrics, Guardrails, and Evaluation Criteria
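Design appropriate success metrics for experiments. Understand the roles of primary metrics (the outcome the change is intended to move), secondary metrics (supporting or diagnostic signals), and guardrail metrics (measures that must not degrade even if the primary metric improves). Know how to choose metrics that align with business goals while avoiding unintended consequences.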