Error Handling and Defensive Programming Questions

Covers designing and implementing defensive, fault-tolerant code and system behaviors to prevent and mitigate production failures. Topics include input validation and sanitization, null and missing data handling, overflow and boundary protections, exception handling and propagation patterns, clear error reporting and structured logging for observability, graceful degradation and fallback strategies, and retry and backoff policies with idempotency for safe retries. Also addresses concurrency and synchronization concerns, resource and memory management to avoid exhaustion, security-related input checks, and how to document and escalate residual risks. Candidates should discuss pragmatic trade-offs between robustness and complexity, show concrete defensive checks and assertions, describe test strategies for error paths (including unit and integration tests), and explain how monitoring and operational responses tie into robustness.

Hard, Technical

During large-scale training jobs you occasionally get CUDA OOM errors. Describe a systematic runtime strategy to detect impending OOM, mitigate it safely, and continue training where possible. Include automatic batch-size reduction, checkpointing, graceful abort with diagnostics, and other runtime safeguards.
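
One piece of such a strategy, automatic batch-size reduction, might be sketched as follows. This is a minimal illustration, not a complete answer: `run_step_with_batch_backoff`, `step_fn`, and `on_oom` are hypothetical names, and the sketch catches Python's built-in `MemoryError` so that it runs without a GPU. With PyTorch, one would presumably pass `torch.cuda.OutOfMemoryError` as `oom_errors` and call `torch.cuda.empty_cache()` inside the `on_oom` hook.

```python
from typing import Callable, Tuple, Type


def run_step_with_batch_backoff(
    step_fn: Callable[[int], float],
    batch_size: int,
    min_batch_size: int = 1,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
    on_oom: Callable[[int], None] = lambda failed_size: None,
) -> Tuple[float, int]:
    """Run step_fn(batch_size), halving the batch size after each OOM-like error.

    Returns (step_result, batch_size_that_worked). Raises RuntimeError once
    the batch size cannot be reduced further, so the caller can checkpoint
    and abort with diagnostics.
    """
    size = batch_size
    while True:
        try:
            return step_fn(size), size
        except oom_errors as exc:
            # Hook for the caller: free caches, log memory stats, save a checkpoint.
            on_oom(size)
            if size <= min_batch_size:
                raise RuntimeError(
                    f"out of memory even at batch size {size}"
                ) from exc
            size = max(min_batch_size, size // 2)
```

A fuller answer would also cover proactive detection (for example, polling memory usage before a step), gradient accumulation to preserve the effective batch size after a reduction, and periodic checkpointing so that a final abort loses little work.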

Medium, Technical

How would you document and escalate residual risks after deploying defensive measures for an ML system (e.g., remaining single points of failure, partial data loss scenarios)? Describe the contents of a risk register entry and the process to escalate to SRE or product teams.

Medium, Technical

Write a small Python implementation (or clear pseudocode) of an exponential backoff with full jitter strategy for retries. The implementation should accept parameters: initial_delay, max_delay, attempt_number, and return a randomized delay suitable for use in client-side retry logic.
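
One possible answer is sketched below. It assumes `attempt_number` is zero-based (the first retry uses `attempt_number=0`) and that delays are in seconds; neither is stated in the question. The function name `full_jitter_delay` and the exponent cap of 62 are illustrative choices.

```python
import random


def full_jitter_delay(initial_delay: float, max_delay: float, attempt_number: int) -> float:
    """Return a random delay in [0, min(max_delay, initial_delay * 2**attempt_number)]."""
    if initial_delay < 0 or max_delay < 0:
        raise ValueError("delays must be non-negative")
    if attempt_number < 0:
        raise ValueError("attempt_number must be non-negative")
    # Cap the exponent so a very large attempt number cannot overflow a float.
    exponent = min(attempt_number, 62)
    ceiling = min(max_delay, initial_delay * (2 ** exponent))
    # Full jitter: draw uniformly from the whole interval, not just around the ceiling.
    return random.uniform(0.0, ceiling)
```

A caller would typically sleep for `full_jitter_delay(0.5, 30.0, attempt)` between attempts and stop after a bounded number of retries. Drawing from the full interval spreads retries from many clients over time, which reduces synchronized retry bursts against a recovering service.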

Easy, Technical

Design a minimal structured logging scheme for ML inference requests that supports observability and debugging of errors. Specify required fields (timestamp, level, correlation_id, model_version, input_checksum, error_code, message), an example JSON record, and explain how structured logs help in automated alerting and log-based monitoring.
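
The required fields could be assembled into a JSON record along these lines. This is a sketch under assumptions: `build_log_record` is a hypothetical helper, the checksum is taken to be a SHA-256 digest of the raw request bytes, and the timestamp is taken to be UTC in ISO 8601 form. The question itself does not fix any of these choices.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def build_log_record(
    level: str,
    correlation_id: str,
    model_version: str,
    input_bytes: bytes,
    message: str,
    error_code: Optional[str] = None,
) -> str:
    """Serialize one inference log event as a single-line JSON string."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "correlation_id": correlation_id,
        "model_version": model_version,
        # Log a digest rather than the raw input to keep payloads and PII out of logs.
        "input_checksum": "sha256:" + hashlib.sha256(input_bytes).hexdigest(),
        "error_code": error_code,  # None serializes to JSON null on success paths
        "message": message,
    }
    return json.dumps(record, sort_keys=True)
```

Because every record carries the same keys, an alerting rule can filter on fields such as `level` and `error_code` and group by `model_version` without parsing free-text messages, and `correlation_id` lets one request be traced across services.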

Medium, Technical

You have an external inference service that sometimes fails transiently. Design a retry and idempotency strategy for an ML client that must avoid duplicate downstream side-effects (e.g., billing events). Describe how to generate idempotency keys, where to store them, TTL considerations, and how to handle concurrent retries from multiple clients.
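
The core of one such strategy can be sketched in a single process as shown below. All names here (`make_idempotency_key`, `InMemoryIdempotencyStore`, `run_once`) are hypothetical, and the in-memory dictionary stands in for what would normally be a shared store with an atomic set-if-absent operation and key expiry. Holding the lock while the side effect runs is a simplification that serializes all callers.

```python
import hashlib
import json
import threading
import time
from typing import Any, Callable, Dict, Tuple


def make_idempotency_key(client_id: str, operation: str, payload: dict) -> str:
    """Derive a stable key: retries of the same logical request map to the same key."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    raw = f"{client_id}|{operation}|{canonical}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class InMemoryIdempotencyStore:
    """Single-process stand-in for a shared idempotency store with TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._results: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, result)

    def run_once(self, key: str, side_effect: Callable[[], Any]) -> Any:
        """Run side_effect at most once per key within the TTL; replay the stored result otherwise."""
        with self._lock:
            now = self._clock()
            entry = self._results.get(key)
            if entry is not None and entry[0] > now:
                return entry[1]  # duplicate retry: return the recorded result
            result = side_effect()
            self._results[key] = (now + self._ttl, result)
            return result
```

Deriving the key from the payload means two intentionally identical requests collapse into one, so a client-generated unique ID per logical operation is a common alternative. The TTL should exceed the longest window in which a retry can still arrive.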
