InterviewStack.io

Python Programming & ML Libraries Questions

Python programming language fundamentals (syntax, data structures, control flow, error handling) with practical usage of machine learning libraries such as NumPy, pandas, scikit-learn, TensorFlow, and PyTorch for data manipulation, model development, training, evaluation, and lightweight ML tasks.

Hard | Technical
Explain numerical stability issues when implementing softmax and cross-entropy from scratch in NumPy for large logits. Provide stable implementations of softmax and log-softmax and show how to compute cross-entropy loss in a numerically stable way. Briefly discuss the Jacobian of softmax and common operations to avoid overflow/underflow.
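One possible answer sketch, assuming 2-D logits of shape (N, C) and integer class targets. The function names are illustrative rather than taken from any library. The key step is subtracting the row maximum before exponentiating, which leaves softmax unchanged but keeps every exponent at or below zero.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Shifting by the max does not change the result but prevents exp() overflow.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)


def log_softmax(logits, axis=-1):
    # log-sum-exp trick: never form softmax first and then take its log,
    # because tiny probabilities would underflow to 0 and give -inf.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))


def cross_entropy(logits, targets):
    # logits: (N, C) floats, targets: (N,) integer class indices.
    log_probs = log_softmax(logits, axis=1)
    return -np.mean(log_probs[np.arange(len(targets)), targets])


def softmax_jacobian(p):
    # For a single probability vector p: J[i, j] = p[i] * (delta_ij - p[j]).
    return np.diag(p) - np.outer(p, p)
```

With logits around 1000, a naive `np.exp(logits)` overflows to `inf`, while the shifted versions above stay finite. Each row of the Jacobian sums to zero because the probabilities are constrained to sum to one.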
Easy | Technical
Describe a recommended procedure to set random seeds for reproducibility across Python's random, NumPy, PyTorch, and TensorFlow. Include device-specific concerns (GPU), cuDNN deterministic flags, and discuss trade-offs between reproducibility and performance or parallelism.
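A minimal sketch of such a procedure, assuming PyTorch and TensorFlow 2 may or may not be installed. The helper name `set_seed` is hypothetical. It covers the main seeds and the cuDNN flags, but full determinism can also depend on library versions, hardware, and data-loader worker seeding, which are not handled here.

```python
import random

import numpy as np


def set_seed(seed=0, deterministic=True):
    """Seed the common RNGs; frameworks that are not installed are skipped."""
    random.seed(seed)
    np.random.seed(seed)  # legacy global NumPy RNG

    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        if deterministic:
            # Trade-off: deterministic kernels can be slower, and disabling
            # benchmark mode turns off cuDNN's autotuner.
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False

    try:
        import tensorflow as tf
    except ImportError:
        tf = None
    if tf is not None:
        tf.random.set_seed(seed)

    # Note: PYTHONHASHSEED only takes effect if it is set in the environment
    # before the interpreter starts, so it is not set here.
```

Calling `set_seed(123)` once at program start, before building models or data loaders, is the usual pattern. For new NumPy code, passing an explicit `np.random.default_rng(seed)` generator around is often preferable to relying on global state.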
Medium | Technical
You must ingest a 50GB CSV on a machine with 8GB RAM and produce a columnar Parquet dataset ready for model training. Describe a Python ingestion pipeline using pandas (chunking), efficient dtype assignment, categorical casting, streaming aggregations, and writing partitioned Parquet files. Provide code sketches for chunked read, per-chunk transforms, and atomic writes, and explain how you would ensure schema consistency and handle partial failures.
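A rough sketch of one way to structure such a pipeline. The column names, dtypes, category levels, and file naming are all hypothetical, and writing Parquet assumes `pyarrow` or `fastparquet` is installed. Each chunk becomes one part file in an output directory. Schema consistency comes from a fixed column list and a fixed categorical vocabulary, and partial failures are handled by writing to a temporary name and renaming.

```python
import os

import pandas as pd

# Hypothetical schema: names, dtypes, and category levels are assumptions.
COLUMNS = ["user_id", "amount", "country"]
DTYPES = {"user_id": "int32", "amount": "float32"}
COUNTRIES = ["DE", "FR", "US"]


def transform(chunk):
    """Enforce a fixed column order and a fixed categorical vocabulary."""
    chunk = chunk[COLUMNS].copy()
    # Values outside COUNTRIES become NaN instead of creating new levels,
    # so every part file carries the same schema.
    chunk["country"] = pd.Categorical(chunk["country"], categories=COUNTRIES)
    return chunk


def update_totals(totals, chunk):
    """Streaming aggregation: running sum of amount per country."""
    sums = chunk.groupby("country", observed=True)["amount"].sum()
    for country, value in sums.items():
        totals[country] = totals.get(country, 0.0) + float(value)
    return totals


def atomic_write_parquet(df, final_path):
    """Write under a temporary name, then rename, so no partial file is visible."""
    tmp_path = final_path + ".tmp"
    df.to_parquet(tmp_path, index=False)  # requires pyarrow or fastparquet
    os.replace(tmp_path, final_path)      # atomic on a single filesystem


def ingest(csv_path, out_dir, chunksize=1_000_000):
    os.makedirs(out_dir, exist_ok=True)
    totals = {}
    reader = pd.read_csv(csv_path, usecols=COLUMNS, dtype=DTYPES, chunksize=chunksize)
    for i, raw in enumerate(reader):
        chunk = transform(raw)
        update_totals(totals, chunk)
        final_path = os.path.join(out_dir, f"part-{i:05d}.parquet")
        if not os.path.exists(final_path):  # a rerun skips completed parts
            atomic_write_parquet(chunk, final_path)
    return totals
```

Because part files are named by chunk index, a rerun after a crash is only safe if the input file and `chunksize` are unchanged. A leftover `.tmp` file marks an interrupted write and can be deleted before restarting.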
Easy | Technical
Explain the difference between shallow copy and deep copy in Python. In machine-learning contexts, discuss consequences for large NumPy arrays, pandas objects, and model parameter references. When might copy-on-write behavior or view vs copy nuances matter in preprocessing pipelines or when sharing data across threads/processes?
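A small illustration of the distinction, using a hypothetical parameter dictionary. It shows that a shallow copy shares the underlying NumPy array while a deep copy does not, and that basic slicing returns a view whereas integer-array ("fancy") indexing returns a copy.

```python
import copy

import numpy as np

# Hypothetical "model parameters": a dict holding one NumPy array.
params = {"w": np.zeros(3)}

shallow = copy.copy(params)    # new dict, but the same array object inside
deep = copy.deepcopy(params)   # new dict and a new array

shallow["w"][0] = 1.0          # also changes params["w"], not deep["w"]

arr = np.arange(6)
view = arr[:3]                 # basic slicing: a view onto arr's memory
fancy = arr[[0, 1, 2]]         # fancy indexing: an independent copy

view[0] = 99                   # writes through to arr; fancy is unaffected

print(params["w"], deep["w"])
print(np.shares_memory(arr, view), np.shares_memory(arr, fancy))
```

For large arrays, a deep copy doubles memory use, so pipelines often rely on views deliberately. The risk is an in-place preprocessing step silently modifying data that another component still references.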
Easy | Technical
Write a concise pandas expression to one-hot encode the categorical column df['color'] and drop the first level to avoid multicollinearity. Show how to keep the result in df (i.e., drop original column and add dummies). Also mention how to persist the encoding mapping for production inference to handle unseen categories.
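One way this could look, using a small made-up DataFrame. The training step uses `pd.get_dummies` with `drop_first=True`. The inference step is an assumption about how the mapping might be persisted: the training column list is stored as JSON and new data is reindexed against it.

```python
import json

import pandas as pd

# Made-up training data for illustration.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"], "size": [1, 2, 3, 4]})

# Replaces 'color' with dummy columns; the first level ('blue') is dropped.
df = pd.get_dummies(df, columns=["color"], drop_first=True)

# Persist the resulting column layout, for example as JSON next to the model.
train_columns = list(df.columns)
saved = json.dumps(train_columns)

# At inference: encode without drop_first, then align to the training layout.
new = pd.DataFrame({"color": ["purple", "red"], "size": [5, 6]})
encoded = pd.get_dummies(new, columns=["color"]).reindex(
    columns=json.loads(saved), fill_value=0
)
print(encoded)
```

With this approach an unseen category such as `purple` ends up with all dummy columns at zero, which makes it indistinguishable from the dropped baseline level. If that matters, scikit-learn's `OneHotEncoder` with `handle_unknown="ignore"` and no dropped level is a common alternative.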
