InterviewStack.io

Machine Learning Algorithms and Theory Questions

Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance, computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.

Medium · System Design
Design a low-latency, highly available model serving architecture for ensemble models (random forest or gradient-boosted trees) to support 10,000 requests/sec with 95th percentile latency <100ms. Describe components for model storage, inference servers, autoscaling, caching, feature retrieval, batching tradeoffs, model versioning, and CI/CD considerations for deployments.
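One component of this design, the batching tradeoff, can be sketched in code. The following is an illustrative micro-batching queue, not the expected reference answer; production serving stacks (e.g. Triton, TorchServe) implement this natively, and the class name and parameters here are my own assumptions. It trades a small bounded wait (`max_wait_ms`) for larger, more efficient vectorized ensemble calls:

```python
import queue
import threading
import time

class MicroBatcher:
    """Collect concurrent requests into batches for one vectorized predict call.

    A batch is flushed when it reaches max_batch requests or when the oldest
    request has waited max_wait_ms, bounding the latency added by batching.
    """

    def __init__(self, predict_fn, max_batch=32, max_wait_ms=5):
        self.predict_fn = predict_fn
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000.0
        self.q = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, x):
        ev, out = threading.Event(), {}
        self.q.put((x, ev, out))
        ev.wait()  # caller blocks until its batch is served
        return out["y"]

    def _loop(self):
        while True:
            batch = [self.q.get()]  # block until the first request arrives
            deadline = time.monotonic() + self.max_wait
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=remaining))
                except queue.Empty:
                    break
            xs = [b[0] for b in batch]
            ys = self.predict_fn(xs)  # one vectorized ensemble inference
            for (_, ev, out), y in zip(batch, ys):
                out["y"] = y
                ev.set()
```

Raising `max_wait_ms` improves throughput at the cost of p95 latency, which is exactly the tension the question asks you to discuss.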
Easy · Technical
Implement a closed-form linear regression solver in Python. Create a function `fit_closed_form(X, y, l2=0.0)` where X is a 2D numpy array of shape (n_samples, n_features) and y is a 1D numpy array. Return tuple `(coef, intercept)`. Support optional L2 regularization (ridge) using `l2` parameter and handle singular or near-singular X'X matrices robustly. Explain algorithmic complexity and when this approach is appropriate in production.
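A sketch of one acceptable answer (the function signature follows the prompt; centering the data so the intercept is unpenalized, and the pseudo-inverse fallback, are my own choices). The normal-equations solve costs roughly O(nd² + d³) for n samples and d features, which is why this approach suits small-to-medium d but not very wide data:

```python
import numpy as np

def fit_closed_form(X, y, l2=0.0):
    """Closed-form (ridge) linear regression: solve (X'X + l2*I) w = X'y.

    Centers X and y so the intercept is estimated separately and is not
    regularized. Falls back to the pseudo-inverse when X'X is singular.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - X_mean
    yc = y - y_mean
    A = Xc.T @ Xc + l2 * np.eye(X.shape[1])
    b = Xc.T @ yc
    try:
        coef = np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        coef = np.linalg.pinv(A) @ b  # least-norm solution for singular A
    intercept = y_mean - X_mean @ coef
    return coef, intercept
```

With `l2 > 0` the system is positive definite and the fallback is rarely needed; with `l2 = 0` and collinear features, `pinv` returns the minimum-norm solution.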
Hard · Technical
Describe methods to calibrate probabilistic outputs of classifiers: Platt scaling, isotonic regression, and temperature scaling for neural nets. Given non-stationary data that drifts over time, propose a deployable strategy to maintain well-calibrated probabilities in production.
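Of the three methods named, Platt scaling is the easiest to sketch from scratch: fit p = sigmoid(a·s + b) to held-out labels by minimizing log loss. The plain gradient-descent fit below is my own minimal illustration (libraries typically use a Newton-style solver), and the learning rate and iteration count are arbitrary assumptions:

```python
import numpy as np

def platt_scale_fit(scores, y, lr=0.01, n_iter=5000):
    """Fit Platt scaling p = sigmoid(a*s + b) by gradient descent on log loss.

    scores: raw classifier scores on a held-out calibration set.
    y: binary labels (0/1) for that set.
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        grad = p - y  # derivative of log loss w.r.t. the logit
        a -= lr * np.mean(grad * s)
        b -= lr * np.mean(grad)
    return a, b
```

For the drift part of the question, the same fit can simply be rerun on a sliding window of recent labeled data, which is one deployable (if basic) recalibration strategy.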
Medium · Technical
Implement functions in Python to compute ROC and Precision–Recall (PR) curves from `y_true` and `y_scores` without using sklearn. Return arrays of thresholds and corresponding (TPR, FPR) for ROC and (precision, recall) for PR, and compute AUCs. Discuss how ties in scores affect both curves and numerical stability considerations.
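A compact solution sketch (the single combined function and its return shape are my choices, not a prescribed interface). The key tie-handling idea is to collapse equal scores into one threshold, so tied groups move the curve along a diagonal instead of an order-dependent staircase:

```python
import numpy as np

def roc_pr_curves(y_true, y_scores):
    """Compute ROC and PR curve points plus AUCs, without sklearn.

    Ties in y_scores are grouped into a single threshold. AUCs use the
    trapezoidal rule (one of several conventions for PR AUC).
    """
    y_true = np.asarray(y_true)
    y_scores = np.asarray(y_scores, dtype=float)
    order = np.argsort(-y_scores, kind="stable")
    y_sorted = y_true[order]
    s_sorted = y_scores[order]
    # Keep only the last index of each group of tied scores.
    distinct = np.where(np.diff(s_sorted))[0]
    idx = np.r_[distinct, y_true.size - 1]
    tps = np.cumsum(y_sorted)[idx].astype(float)
    fps = (idx + 1) - tps
    P, N = tps[-1], fps[-1]
    tpr = np.r_[0.0, tps / P]
    fpr = np.r_[0.0, fps / N]
    roc_auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    precision = np.r_[1.0, tps / (tps + fps)]  # (recall=0, precision=1) convention
    recall = np.r_[0.0, tps / P]
    pr_auc = float(np.sum(np.diff(recall) * (precision[1:] + precision[:-1]) / 2))
    thresholds = s_sorted[idx]
    return thresholds, fpr, tpr, roc_auc, recall, precision, pr_auc
```

Numerical-stability points worth raising in discussion: sort descending once rather than thresholding repeatedly, and guard the precision denominator (here `tps + fps >= 1` by construction).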
Medium · Technical
Explain why bagging (e.g., random forest) tends to reduce variance whereas boosting (e.g., gradient boosting) focuses on reducing bias. Provide a concise mathematical intuition and discuss practical implications in production, including latency, model complexity, and risk of overfitting.
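The standard mathematical intuition for the bagging half: for n identically distributed predictors with variance σ² and pairwise correlation ρ, the variance of their average is ρσ² + (1-ρ)σ²/n. Averaging shrinks only the second term, and random forests additionally attack ρ via feature subsampling. A quick simulation (my own illustration, with arbitrary ρ and n) confirms the formula:

```python
import numpy as np

# Simulate n_trees correlated "tree predictions" and check that the
# variance of their average matches rho*sigma^2 + (1-rho)*sigma^2/n.
rng = np.random.default_rng(0)
n_trees, n_trials = 25, 200_000
rho, sigma = 0.3, 1.0

# Correlated unit-variance Gaussians built from a shared component:
# each column ~ N(0, 1), pairwise correlation = rho.
shared = rng.normal(size=(n_trials, 1))
noise = rng.normal(size=(n_trials, n_trees))
preds = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise

ensemble = preds.mean(axis=1)
empirical = ensemble.var()
theory = rho * sigma**2 + (1 - rho) * sigma**2 / n_trees  # = 0.328 here
```

The floor at ρσ² is the practical point: past a few dozen trees, adding more barely helps, whereas decorrelating them does. Boosting is different in kind: it fits each new learner to the residual of the current ensemble, directly reducing bias, which is why it can overfit noisy data in a way bagging rarely does.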
