Core supervised and unsupervised machine learning algorithms and the theoretical principles that guide their selection and use. Covers linear regression, logistic regression, decision trees, random forests, gradient boosting, support vector machines, k-means clustering, hierarchical clustering, principal component analysis, and anomaly detection. Topics include model selection, the bias-variance trade-off, regularization, overfitting and underfitting, ensemble methods and why they reduce variance, computational complexity and scaling considerations, interpretability versus predictive power, common hyperparameters and tuning strategies, and practical guidance on when each algorithm is appropriate given data size, feature types, noise, and explainability requirements.
Easy · Technical
Explain Principal Component Analysis (PCA): derivation via covariance eigendecomposition and SVD, how PCA reduces dimensionality, how to compute explained variance ratio, when to center/scale features, and the difference between PCA (projection) and feature selection methods in terms of interpretability and information retention.
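A minimal sketch of the SVD route the question asks about. Centering, the projection, and the explained-variance ratio follow directly from the fact that the singular values of the centered data matrix relate to the covariance eigenvalues via S² / (n − 1); the function name and its exact return values are illustrative choices, not a fixed API.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via SVD of the centered data matrix.

    Returns the projected data, the principal axes (rows of Vt),
    and the explained-variance ratio of each retained component.
    """
    X_centered = X - X.mean(axis=0)              # centering is required
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Covariance eigenvalues are S**2 / (n - 1)
    explained_var = S**2 / (X.shape[0] - 1)
    ratio = explained_var / explained_var.sum()
    components = Vt[:n_components]               # principal axes
    X_proj = X_centered @ components.T           # projection, not selection:
                                                 # each output is a mix of all inputs
    return X_proj, components, ratio[:n_components]
```

The last comment is the interpretability point the question raises: unlike feature selection, each projected coordinate is a linear combination of every original feature.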
Easy · Technical
Compare L1 (Lasso) and L2 (Ridge) regularization for linear models. Explain how each penalty affects coefficients (sparsity, shrinkage), behavior with correlated features, numerical/optimization considerations, and give guidance for choosing between L1, L2, and Elastic Net on a high-dimensional correlated dataset in production.
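The sparsity-versus-shrinkage contrast can be demonstrated numerically. Below is a hedged sketch: a closed-form ridge solver and a bare-bones coordinate-descent Lasso built on the L1 proximal operator (soft-thresholding). The objective scalings and iteration count are illustrative assumptions, not tuned for production.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """L2: solve (X^T X + lam*I) w = X^T y. Shrinks all coefficients
    smoothly toward zero but never to exactly zero."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def soft_threshold(z, lam):
    """L1 proximal operator: zeroes out small values, the mechanism
    behind Lasso's sparsity."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimal coordinate-descent Lasso for
    (1/2n)||y - Xw||^2 + lam*||w||_1 (illustrative, not optimized)."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X**2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]       # partial residual
            w[j] = soft_threshold(X[:, j] @ r, lam * n) / col_sq[j]
    return w
```

Running both on data with a few true nonzero coefficients shows the qualitative difference the question targets: ridge keeps every coefficient nonzero, Lasso zeroes out the irrelevant ones (at the cost of biasing the survivors toward zero).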
Medium · Technical
Implement a linear SVM trainer using SGD in Python with hinge loss and optional L2 regularization. Provide a class `LinearSVM` with `fit(X, y, lr=0.01, epochs=10, batch_size=64, C=1.0)` and `predict(X)` methods. Use y in {-1, +1}. Include shuffling and mini-batch updates for convergence.
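One possible solution sketch, using the signature the question specifies. The objective scaling (1/2)‖w‖² + C · mean hinge loss is a reasonable formulation but an assumption, since the question leaves the exact objective open; the fixed RNG seed is likewise an illustrative choice for reproducibility.

```python
import numpy as np

class LinearSVM:
    """Linear SVM trained with mini-batch SGD on the hinge loss.

    Objective: (1/2)||w||^2 + C * mean(max(0, 1 - y*(Xw + b))),
    with labels y in {-1, +1}.
    """

    def fit(self, X, y, lr=0.01, epochs=10, batch_size=64, C=1.0):
        n, d = X.shape
        self.w = np.zeros(d)
        self.b = 0.0
        rng = np.random.default_rng(0)           # seed is an illustrative choice
        for _ in range(epochs):
            idx = rng.permutation(n)             # shuffle every epoch
            for start in range(0, n, batch_size):
                batch = idx[start:start + batch_size]
                Xb, yb = X[batch], y[batch]
                margin = yb * (Xb @ self.w + self.b)
                active = margin < 1              # points violating the margin
                # Subgradient of the regularized hinge loss on this batch
                grad_w = self.w - C * (yb[active, None] * Xb[active]).sum(0) / len(batch)
                grad_b = -C * yb[active].sum() / len(batch)
                self.w -= lr * grad_w
                self.b -= lr * grad_b
        return self

    def predict(self, X):
        return np.where(X @ self.w + self.b >= 0, 1, -1)
```

The hinge loss is non-differentiable at margin = 1, so this is subgradient descent; only margin-violating points contribute to the data term of the update.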
Medium · Technical
Case study: CTR prediction for a large e-commerce site with sparse categorical features like user_id and item_id each with millions of unique values. Propose feature engineering strategies (hashing, embeddings), candidate model families (logistic, factorization machines, deep models), cold-start handling, online embedding serving, and low-latency serving strategies.
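Of the strategies the case study asks for, the hashing trick is the most compact to sketch: it maps unbounded id spaces (user_id, item_id) into a fixed index range with no stored vocabulary, which also gives a built-in cold-start behavior since unseen ids still hash somewhere. The bucket count, field names, and md5 choice below are illustrative assumptions.

```python
import hashlib

def hash_bucket(field, value, n_buckets=2**20):
    """Map a (field, value) pair to a fixed bucket index. md5 keeps the
    mapping stable across processes (Python's built-in hash() is salted
    per run, which would break train/serve consistency)."""
    digest = hashlib.md5(f"{field}={value}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % n_buckets

def example_to_indices(example, n_buckets=2**20):
    """Turn one raw example (a dict of categorical fields) into active
    sparse-feature indices for a linear or factorization-machine model.
    Hashing field and value together keeps identical values in different
    fields from colliding by construction."""
    return sorted({hash_bucket(f, v, n_buckets) for f, v in example.items()})
```

The trade-off to discuss in the interview: collisions introduce irreducible noise proportional to cardinality / n_buckets, in exchange for O(1) memory and no vocabulary-sync problem between training and serving.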
Medium · Technical
Implement logistic regression with L2 regularization using batch gradient descent in Python. Function signature: `def logistic_regression_train(X_train, y_train, X_val=None, y_val=None, lr=0.01, reg=1.0, max_iter=1000, tol=1e-6):` Return weights and training/validation loss history. Use a numerically stable log-loss and implement early stopping based on validation loss.
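A sketch matching the given signature. The stable log-loss uses the identity −y·log σ(z) − (1−y)·log(1−σ(z)) = max(z, 0) − y·z + log(1 + e^(−|z|)), which avoids overflow for large |z|; the 1/n scaling of the L2 penalty and the exact early-stopping rule are implementation choices the question leaves open.

```python
import numpy as np

def _stable_log_loss(z, y):
    """Numerically stable mean log-loss for logits z and labels y in {0,1}."""
    return np.mean(np.maximum(z, 0) - y * z + np.log1p(np.exp(-np.abs(z))))

def logistic_regression_train(X_train, y_train, X_val=None, y_val=None,
                              lr=0.01, reg=1.0, max_iter=1000, tol=1e-6):
    n, d = X_train.shape
    w = np.zeros(d)
    train_hist, val_hist = [], []
    best_val = np.inf
    for _ in range(max_iter):
        z = X_train @ w
        p = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))   # clipped sigmoid
        # Gradient of mean log-loss + (reg/2n)*||w||^2 (scaling is a choice)
        grad = X_train.T @ (p - y_train) / n + reg * w / n
        w -= lr * grad
        train_hist.append(_stable_log_loss(X_train @ w, y_train)
                          + 0.5 * reg * np.dot(w, w) / n)
        if X_val is not None:
            val_loss = _stable_log_loss(X_val @ w, y_val)
            val_hist.append(val_loss)
            if best_val - val_loss < tol:    # early stop: validation stalled
                break
            best_val = val_loss
    return w, train_hist, val_hist
```

Note the penalty is deliberately excluded from the validation loss, so early stopping tracks generalization rather than the regularized objective.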