InterviewStack.io

Scikit Learn, Pandas, and NumPy Usage Questions

Practical proficiency with these core libraries. Pandas: DataFrames, data manipulation, handling missing values. NumPy: arrays, vectorized operations, mathematical functions. Scikit-learn: preprocessing, model fitting, evaluation metrics, pipelines. Emphasis on knowing the standard patterns and APIs and on writing efficient, readable code with these libraries.

Medium · Technical
Describe how you'd build an automated daily Excel report from a pandas DataFrame that includes multiple pivot tables on different sheets, styling (bold headers, number formatting), and email delivery. Provide sample code using pd.ExcelWriter and explain scheduling considerations.
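One possible shape for an answer is sketched below; it is not a definitive implementation. The column names (`region`, `product`, `month`, `amount`), the sheet names, and the SMTP host, sender, and recipient are all hypothetical. The sketch assumes the `openpyxl` engine is installed; it is imported inside `write_report` so the pivot logic can be exercised without it.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path

import pandas as pd


def build_pivots(df: pd.DataFrame) -> dict:
    """Return one pivot table per output sheet (sheet names are illustrative)."""
    return {
        "by_region": pd.pivot_table(
            df, index="region", columns="product",
            values="amount", aggfunc="sum", fill_value=0,
        ),
        "by_month": pd.pivot_table(
            df, index="month", columns="region",
            values="amount", aggfunc="sum", fill_value=0,
        ),
    }


def write_report(pivots: dict, path: str) -> None:
    """Write each pivot to its own sheet, then style headers and numbers."""
    from openpyxl.styles import Font  # lazy import: only needed when writing

    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for sheet_name, pivot in pivots.items():
            pivot.to_excel(writer, sheet_name=sheet_name)
            ws = writer.sheets[sheet_name]
            for cell in ws[1]:  # first row holds the column headers
                cell.font = Font(bold=True)
            # Data cells start at row 2, column 2 (column 1 is the index).
            for row in ws.iter_rows(min_row=2, min_col=2):
                for cell in row:
                    cell.number_format = "#,##0.00"


def send_report(path, sender, recipient, smtp_host, smtp_port=587, password=None):
    """Email the workbook as an attachment (host and addresses are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = f"Daily report: {Path(path).name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The daily report is attached.")
    msg.add_attachment(
        Path(path).read_bytes(),
        maintype="application",
        subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        filename=Path(path).name,
    )
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.starttls()
        if password:
            smtp.login(sender, password)
        smtp.send_message(msg)
```

For scheduling, a cron entry (or an equivalent job scheduler) that runs the script once a day is usually enough. Points worth raising in an answer: date-stamp the output file so reruns do not overwrite earlier reports, fix the time zone the job runs in, keep credentials out of the script, and log or alert on failures so a missed report is noticed.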
Easy · Technical
You have a DataFrame in wide format with quarterly sales columns: ['region', 'Q1_sales', 'Q2_sales', 'Q3_sales', 'Q4_sales']. Write pandas code to convert this to long format with columns ['region', 'quarter', 'sales'] using pd.melt. Explain id_vars and value_vars and when melt is preferred over stack/pivot.
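A minimal sketch of the reshape, using made-up sales figures. Stripping the `_sales` suffix from the quarter labels is an extra step the question does not ask for; it is included here on the assumption that `Q1` reads better than `Q1_sales` in the `quarter` column.

```python
import pandas as pd

# Illustrative data; the numbers are invented for the example.
wide = pd.DataFrame({
    "region": ["North", "South"],
    "Q1_sales": [100, 80],
    "Q2_sales": [120, 90],
    "Q3_sales": [130, 95],
    "Q4_sales": [150, 110],
})

quarter_cols = ["Q1_sales", "Q2_sales", "Q3_sales", "Q4_sales"]

long_df = pd.melt(
    wide,
    id_vars=["region"],       # columns kept as identifiers, repeated per row
    value_vars=quarter_cols,  # columns unpivoted into (quarter, sales) pairs
    var_name="quarter",
    value_name="sales",
)

# Optional cleanup: turn "Q1_sales" into "Q1".
long_df["quarter"] = long_df["quarter"].str.replace("_sales", "", regex=False)
```

`id_vars` names the columns that stay as they are and identify each observation; `value_vars` names the columns whose headers become values in the new `quarter` column. `melt` tends to be the convenient choice when the columns to unpivot are ordinary columns and a flat DataFrame is wanted. `stack` moves column labels into the index and returns a MultiIndexed result, and `pivot` goes the other way, from long to wide.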
Medium · Technical
Design a scikit-learn pipeline using ColumnTransformer to handle a dataset with mixed features: numeric_cols = ['age','income'], categorical_cols = ['region','plan'], with missing values. Pipeline should:
- Impute numeric with median and scale
- Impute categorical with 'missing' and OneHotEncode (handle_unknown='ignore')
- Fit a RandomForestClassifier on processed features
Provide complete Python code for the pipeline.
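The requirements above can be sketched as follows. The toy data, the target `y`, the step names, and the hyperparameters (`n_estimators`, `random_state`) are assumptions made for illustration; only the column lists and the preprocessing choices come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "income"]
categorical_cols = ["region", "plan"]


def build_pipeline(random_state: int = 0) -> Pipeline:
    # Numeric branch: median imputation, then standardization.
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    # Categorical branch: constant 'missing' label, then one-hot encoding.
    # handle_unknown="ignore" encodes unseen categories as all zeros.
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(n_estimators=50, random_state=random_state)),
    ])


# Toy data with missing values in every column (values are invented).
X = pd.DataFrame({
    "age": [25, np.nan, 40, 33, 51, 29],
    "income": [40000, 52000, np.nan, 61000, 75000, 38000],
    "region": ["north", "south", np.nan, "north", "east", "south"],
    "plan": ["basic", "pro", "pro", np.nan, "basic", "basic"],
})
y = [0, 1, 1, 0, 1, 0]

model = build_pipeline().fit(X, y)

# "west" never appeared during fit; the encoder ignores it instead of raising.
X_new = pd.DataFrame(
    {"age": [30.0], "income": [np.nan], "region": ["west"], "plan": ["pro"]}
)
pred = model.predict(X_new)
```

Keeping the imputers and encoder inside the pipeline means they are fit only on training data during cross-validation, which avoids leakage. Scaling is not needed by a random forest; it is included because the question asks for it and it keeps the preprocessing reusable with other estimators.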
Easy · Technical
Using Python and pandas, write concise code to load a CSV file named 'sales.csv' with columns: order_id, customer_id, amount, order_date (format YYYY-MM-DD). Requirements:
1) Read the CSV parsing order_date as datetime and minimizing memory usage where reasonable.
2) Filter orders placed in 2024 with amount > 100.
3) Select columns customer_id and amount and compute total and average amount per customer.
4) Return the top 5 customers by total amount.
Include short explanation of your choices.
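A minimal sketch of the four steps, wrapped in a function so it can take either the path `'sales.csv'` or an in-memory buffer. It assumes `customer_id` holds integers that fit in 32 bits and that `float32` precision is acceptable for `amount`; neither is stated in the question. The sample rows in the demo are invented.

```python
import io

import pandas as pd


def top_customers(source, year: int = 2024, min_amount: float = 100, n: int = 5):
    """Return the top-n customers by total amount for the given year."""
    df = pd.read_csv(
        source,
        # order_id is never used, so skip it to save memory.
        usecols=["customer_id", "amount", "order_date"],
        # Assumption: integer IDs; float32 halves the size of the amount column.
        dtype={"customer_id": "int32", "amount": "float32"},
        parse_dates=["order_date"],
    )
    mask = (df["order_date"].dt.year == year) & (df["amount"] > min_amount)
    return (
        df.loc[mask, ["customer_id", "amount"]]
        .groupby("customer_id")["amount"]
        .agg(total="sum", average="mean")
        .nlargest(n, "total")
    )


# Demo with an in-memory CSV; pass "sales.csv" to read the real file.
sample = io.StringIO(
    "order_id,customer_id,amount,order_date\n"
    "1,1,150.0,2024-01-05\n"
    "2,1,250.0,2024-03-10\n"
    "3,2,500.0,2024-02-01\n"
    "4,3,90.0,2024-02-02\n"
    "5,3,120.0,2023-12-31\n"
    "6,4,110.0,2024-06-01\n"
    "7,5,300.0,2024-07-01\n"
    "8,6,101.0,2024-08-01\n"
    "9,7,1000.0,2024-09-01\n"
)
result = top_customers(sample)
```

The choices to explain: `usecols` and narrower dtypes reduce memory at read time, `parse_dates` avoids a second conversion pass, the boolean mask applies both filters in one step, and `nlargest` avoids a full sort when only the top few rows are needed.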
Easy · Technical
You have a pandas DataFrame 'df' with columns: user_id, event_time (datetime), metric (float) and some NaNs in metric. Write Python code to:
1) Impute missing metric values with the median metric for that user's historical values.
2) If a user has no historical non-null values, fill with the global median.
3) After imputation, forward-fill within each user when ordering by event_time for any remaining NaNs.
Explain edge-case handling.
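The three steps can be sketched as below, under one reading of the question: "historical values" is taken to mean all of a user's non-null metric values, not only those recorded before each event. A stricter reading would use an expanding median over earlier rows. The demo data is invented.

```python
import numpy as np
import pandas as pd


def impute_metric(df: pd.DataFrame) -> pd.DataFrame:
    """Per-user median, then global median, then per-user forward fill."""
    out = df.sort_values(["user_id", "event_time"]).copy()

    # Compute the global median from observed values only, before any imputation.
    global_median = out["metric"].median()

    # Step 1: per-user median (NaN for users with no observed values).
    user_median = out.groupby("user_id")["metric"].transform("median")
    out["metric"] = out["metric"].fillna(user_median)

    # Step 2: users with no observed values fall back to the global median.
    out["metric"] = out["metric"].fillna(global_median)

    # Step 3: forward-fill within each user, in event_time order.
    out["metric"] = out.groupby("user_id")["metric"].ffill()
    return out


df = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "c", "c"],
    "event_time": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03",
        "2024-01-01", "2024-01-02",
        "2024-01-01", "2024-01-02",
    ]),
    "metric": [1.0, np.nan, 3.0, np.nan, np.nan, 10.0, 20.0],
})
result = impute_metric(df)
```

Edge cases worth mentioning: a user whose metric is entirely NaN gets the global median (user `b` here); if every metric in the frame is NaN, the global median is itself NaN and nothing can be filled; and a forward fill can never fill a user's leading NaN, so under this reading step 3 only acts as a safety net after steps 1 and 2.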
