Audience Segmentation and Cohorts Questions

Covers methods for dividing users or consumers into meaningful segments and analyzing their behavior over time using cohort analysis. Candidates should be able to choose segmentation dimensions such as demographics, acquisition channel, product usage, geography, device, or behavioral attributes, and justify those choices for a given business question. They should know how to design cohort analyses to measure retention, churn, lifetime value, and funnel conversion, and how to avoid common pitfalls such as Simpson's Paradox and survivorship bias. This topic also includes deriving behavioral insights to inform personalization, content and product strategy, marketing targeting, and persona development, as well as identifying underserved or high-value segments. Expect discussion of relevant metrics, data requirements and quality considerations, approaches to visualization and interpretation, and typical analytics and experimentation tools and techniques used to validate segment-driven hypotheses.

Medium · Technical

Explain Simpson's Paradox with a concrete example in the context of segmentation, e.g., overall conversion rate increases while conversion declines within major acquisition channels. How would you detect Simpson's Paradox in your analyses and what practices would you adopt to avoid misleading aggregated reporting?
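
One way to make the example concrete is a small detection sketch. The numbers below are invented for illustration, and the channel names, `DATA` layout, and helper functions are assumptions rather than part of the question. Traffic shifts toward the higher-converting organic channel, so the blended rate rises even though both channels convert worse.

```python
# Hypothetical data: (period, channel) -> (visitors, conversions)
DATA = {
    ("before", "paid"): (1000, 20),      # 2.0%
    ("before", "organic"): (1000, 100),  # 10.0%
    ("after", "paid"): (400, 6),         # 1.5%
    ("after", "organic"): (1600, 144),   # 9.0%
}


def segment_rates(data, period):
    """Conversion rate per channel for one period."""
    return {ch: c / v for (p, ch), (v, c) in data.items() if p == period}


def overall_rate(data, period):
    """Blended conversion rate across all channels for one period."""
    visitors = sum(v for (p, _), (v, _) in data.items() if p == period)
    conversions = sum(c for (p, _), (_, c) in data.items() if p == period)
    return conversions / visitors


def has_simpson_reversal(data, a="before", b="after"):
    """True when the aggregate moves one way and every segment moves the other."""
    overall_delta = overall_rate(data, b) - overall_rate(data, a)
    rates_a, rates_b = segment_rates(data, a), segment_rates(data, b)
    deltas = [rates_b[ch] - rates_a[ch] for ch in rates_a]
    all_down = all(d < 0 for d in deltas)
    all_up = all(d > 0 for d in deltas)
    return (overall_delta > 0 and all_down) or (overall_delta < 0 and all_up)


print(overall_rate(DATA, "before"), overall_rate(DATA, "after"))
print(has_simpson_reversal(DATA))
```

A check like this only flags the strict case where every segment reverses. In practice it is probably more useful to report segment-level rates and segment mix alongside every aggregate.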

Easy · Technical

List the minimum data elements and schema requirements you need to run a reliable cohort analysis. Include fields like primary keys, timestamps, event names, identity resolution keys, and metadata (device, channel). Explain why each element is necessary and how you would handle missing or ambiguous user identifiers.
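
A minimal sketch of what such a schema might look like as a Python record, together with one possible fallback rule for missing identifiers. The field names (`event_id`, `anonymous_id`), the `alias_map` lookup, and the `anon:` prefix are all hypothetical choices for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    event_id: str                        # primary key, used to de-duplicate replays
    event_name: str                      # e.g. "signup", "page_view"
    event_timestamp: datetime            # timezone-aware, assumed stored in UTC
    user_id: Optional[str] = None        # authenticated identity, if known
    anonymous_id: Optional[str] = None   # device or cookie id for pre-login events
    device: Optional[str] = None
    acquisition_channel: Optional[str] = None


def resolve_identity(event: Event, alias_map: Dict[str, str]) -> Optional[str]:
    """Pick a stable analysis key for an event.

    Prefers user_id, then an anonymous_id stitched to a user via alias_map,
    then the raw anonymous_id (prefixed so it cannot collide with user ids).
    Returns None when the event cannot be attributed and should be excluded.
    """
    if event.user_id:
        return event.user_id
    if event.anonymous_id:
        return alias_map.get(event.anonymous_id, f"anon:{event.anonymous_id}")
    return None


# Usage: an anonymous page view later stitched to user "u1"
ts = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
view = Event("e1", "page_view", ts, anonymous_id="cookie-9", device="ios")
print(resolve_identity(view, {"cookie-9": "u1"}))
```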

Medium · Technical

Given table events(user_id, event_name, event_timestamp, acquisition_channel) in Postgres, write a SQL query that builds a weekly cohort retention matrix. Rows should be cohort_week (week of user's first 'signup'), columns should be week_index since signup (0,1,2,...), and cells should show percent of cohort users active in that subsequent week. Explain assumptions about counting "active" and handling timezone normalization.
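
The two assumptions that matter most in any answer, timezone normalization and how `week_index` is bucketed, can be pinned down in a few lines. The sketch below is a Python reference for the logic, not the SQL answer itself. It assumes all timestamps are converted to UTC and that weeks start on Monday, which matches how Postgres `date_trunc('week', ...)` truncates. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def week_start_utc(ts: datetime) -> datetime:
    """Convert to UTC, then truncate to Monday 00:00 of that week."""
    if ts.tzinfo is None:
        # Refuse to guess: a naive timestamp has no defined UTC instant.
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=midnight.weekday())


def week_index(signup_ts: datetime, event_ts: datetime) -> int:
    """Whole calendar weeks between the signup week and the event week."""
    return (week_start_utc(event_ts) - week_start_utc(signup_ts)).days // 7


# Usage: a Wednesday-evening signup recorded at UTC-5 lands on Thursday in UTC
est = timezone(timedelta(hours=-5))
signup = datetime(2024, 1, 3, 23, 30, tzinfo=est)
later = datetime(2024, 1, 8, 0, 10, tzinfo=timezone.utc)
print(week_start_utc(signup), week_index(signup, later))
```

Bucketing by calendar week means a user who signs up on a Sunday reaches week 1 after one day. Measuring weeks as 7-day offsets from each user's own signup time is the usual alternative, and either choice should be stated explicitly.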

Medium · Technical

Describe an approach (SQL or Python) to turn raw event logs into user sessions using a 30-minute inactivity timeout. Given events(user_id, event_timestamp, event_name), explain the algorithm, provide a SQL window-function or pandas code sketch, discuss edge cases such as out-of-order events or missing timestamps, and show how to compute session-level metrics like session_length and page_views_per_session.
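
The core of the algorithm is: sort each user's events by time, then start a new session whenever the gap to the previous event exceeds the timeout. A minimal standard-library sketch follows. It assumes events arrive as `(user_id, event_timestamp, event_name)` tuples, that rows with a missing timestamp are dropped, and that page views carry the hypothetical event name `"page_view"`. The same logic maps to `LAG()` plus a running `SUM()` in SQL, or to `groupby().diff()` plus `cumsum()` in pandas.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=TIMEOUT):
    """Group (user_id, event_timestamp, event_name) tuples into sessions.

    Returns one dict per session, ordered by user and then start time.
    """
    # Edge case: rows without a timestamp cannot be ordered, so drop them.
    clean = [e for e in events if e[1] is not None]
    # Edge case: sorting by (user, time) repairs out-of-order delivery.
    clean.sort(key=itemgetter(0, 1))

    sessions = []
    for user_id, user_events in groupby(clean, key=itemgetter(0)):
        current = None
        for _, ts, name in user_events:
            # A gap strictly longer than the timeout opens a new session.
            if current is None or ts - current["end"] > timeout:
                current = {"user_id": user_id, "start": ts, "end": ts,
                           "events": 0, "page_views": 0}
                sessions.append(current)
            current["end"] = ts
            current["events"] += 1
            if name == "page_view":
                current["page_views"] += 1

    for s in sessions:
        s["session_length"] = s["end"] - s["start"]
    return sessions


# Usage: deliberately out-of-order input for one user
raw = [
    ("u1", datetime(2024, 1, 1, 10, 50), "page_view"),
    ("u1", datetime(2024, 1, 1, 10, 0), "page_view"),
    ("u1", datetime(2024, 1, 1, 10, 10), "click"),
]
for s in sessionize(raw):
    print(s["user_id"], s["start"], s["session_length"], s["page_views"])
```

Note that a single-event session has a `session_length` of zero under this definition. Some teams add a fixed allowance for the last event instead.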

Hard · Technical

Implement in Python (pandas) a function that computes weekly rolling retention for cohorts. Input: DataFrame events(user_id, event_timestamp ISO string, event_name). Output: DataFrame with cohort_week_start, week_index, retention_rate. In your description mention timezone normalization choices, reference date selection, and performance considerations for large datasets (e.g., chunking or out-of-core libraries).
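
One possible shape for such a function is sketched below, under several stated assumptions: "rolling" is read as "active in week k or any later week" (the classic variant would require activity in week k itself), the cohort is defined by the user's first `signup` event, timestamps are normalized to UTC, and weeks start on Monday. The function name and the `cohort_event` parameter are hypothetical.

```python
import pandas as pd


def weekly_rolling_retention(events: pd.DataFrame,
                             cohort_event: str = "signup") -> pd.DataFrame:
    """Rolling retention per weekly cohort.

    A user counts as retained at week k if their last observed activity
    falls in week k or later (assumed definition of "rolling").
    """
    df = events[["user_id", "event_timestamp", "event_name"]].copy()
    # Timezone choice: parse ISO strings and convert everything to UTC.
    df["ts"] = pd.to_datetime(df["event_timestamp"], utc=True)
    # Reference date choice: Monday 00:00 UTC of the event's week.
    day = df["ts"].dt.normalize()
    df["week_start"] = day - pd.to_timedelta(day.dt.weekday, unit="D")

    # Cohort week = week of each user's first cohort_event.
    first = (df.loc[df["event_name"] == cohort_event]
               .groupby("user_id")["week_start"].min()
               .rename("cohort_week_start"))
    df = df.join(first, on="user_id", how="inner")  # drops users with no signup

    df["week_index"] = (df["week_start"] - df["cohort_week_start"]).dt.days // 7
    df = df[df["week_index"] >= 0]  # ignore activity before the signup week

    per_user = df.groupby("user_id").agg(
        cohort_week_start=("cohort_week_start", "first"),
        last_week=("week_index", "max"),
    )

    rows = []
    for cohort, grp in per_user.groupby("cohort_week_start"):
        for k in range(int(grp["last_week"].max()) + 1):
            rows.append({
                "cohort_week_start": cohort,
                "week_index": k,
                "retention_rate": float((grp["last_week"] >= k).mean()),
            })
    return pd.DataFrame(
        rows, columns=["cohort_week_start", "week_index", "retention_rate"])
```

Because only each user's cohort week and last active week are needed, large inputs could plausibly be reduced chunk by chunk to a per-user `(min signup week, max active week)` table before the final aggregation. That is an untested suggestion rather than a benchmarked design.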
