Error Handling and Code Quality Questions

Focuses on writing production-quality code and scripts that are defensive, maintainable, and fail gracefully. Covers anticipating and handling failures such as exceptions, missing files, network errors, and non-zero process exit codes; using language-specific constructs for error control, for example try/except blocks in Python or set -e patterns in shell scripts; validating inputs; producing clear error messages and logs; and avoiding common pitfalls that lead to silent failures. Also includes code quality best practices such as readable naming and code structure, using standard libraries instead of reinventing functionality, writing testable code and unit tests, and designing for maintainability and observability.

Medium | Technical

Demonstrate idiomatic Go error wrapping and checking: write a Go code snippet that wraps a file-not-found error (os.ErrNotExist, the condition that os.IsNotExist reports) with context using fmt.Errorf and %w, then shows how to check the underlying error with errors.Is and how to extract the concrete error type (for example *fs.PathError) with errors.As. Explain why this pattern is useful for SREs writing libraries and services.

Medium | Technical

Given a Flask application that calls downstream services, implement a middleware or error handler that translates downstream exceptions into appropriate HTTP responses. Requirements: 5xx internal errors should return a generic message, 4xx client errors should be passed through with sanitized messages, and every log entry must include the correlation ID and the downstream error code. Provide a code sketch and describe how to instrument metrics for these translated errors.
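
One possible starting point for the translation logic, as a minimal framework-agnostic sketch using only the standard library. The names DownstreamError, translate_error, SAFE_CLIENT_MESSAGES, and translated_errors are hypothetical, the Counter stands in for a real metrics client, and mapping downstream 5xx failures to 502 is an assumption rather than part of the stated requirements. In a Flask application a function like this would typically be called from a handler registered with app.errorhandler.

```python
import logging
from collections import Counter

logger = logging.getLogger("downstream")

# Stand-in for a real metrics client (hypothetical); keyed by (service, status class).
translated_errors = Counter()

# Fixed, safe messages for client errors; unlisted 4xx codes get a generic one.
SAFE_CLIENT_MESSAGES = {
    400: "Bad request",
    401: "Authentication required",
    403: "Forbidden",
    404: "Resource not found",
    409: "Conflict",
    429: "Too many requests",
}


class DownstreamError(Exception):
    """Hypothetical exception raised by a wrapper around downstream calls."""

    def __init__(self, service, status_code, error_code, detail=""):
        super().__init__(f"{service} returned {status_code} ({error_code})")
        self.service = service
        self.status_code = status_code
        self.error_code = error_code
        self.detail = detail  # raw downstream text: logged, never returned


def translate_error(exc, correlation_id):
    """Map a DownstreamError to a (body, http_status) pair."""
    # The log carries the full context; the response body does not.
    logger.error(
        "downstream call failed: %s",
        exc,
        extra={
            "correlation_id": correlation_id,
            "downstream_service": exc.service,
            "downstream_error_code": exc.error_code,
            "downstream_detail": exc.detail,
        },
    )
    if 400 <= exc.status_code < 500:
        # Pass the status through, but replace the message with a safe one.
        status = exc.status_code
        message = SAFE_CLIENT_MESSAGES.get(status, "Request could not be processed")
    else:
        # Assumption: any other downstream failure is reported as 502.
        status = 502
        message = "Upstream service error"
    translated_errors[(exc.service, f"{status // 100}xx")] += 1
    return {"error": message, "correlation_id": correlation_id}, status
```

Keeping the mapping in a plain function makes it easy to unit test without starting a web server, and the counter labels (service, status class) suggest which dimensions a real metric might carry.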

Easy | Technical

You observe ephemeral subprocesses and file descriptors leaking when jobs fail. As an SRE, describe code patterns to avoid resource leaks in scripts and services. Provide concrete examples: use of Python context managers, finally blocks, proper subprocess termination, process groups, and closing sockets. Show how to ensure cleanup runs even under KeyboardInterrupt or fatal error scenarios.
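
A minimal sketch of one such pattern, assuming a POSIX system: a context manager that starts a child in its own process group and tears the whole group down in a finally block. The names managed_process, _terminate_group, and exit_on_sigterm are hypothetical helpers introduced for illustration.

```python
import contextlib
import os
import signal
import subprocess
import sys


def _terminate_group(proc, grace_seconds):
    """Send SIGTERM to the child's process group, then SIGKILL after a grace period."""
    if proc.returncode is not None:
        return  # already reaped by the caller; its PID may no longer be ours
    with contextlib.suppress(ProcessLookupError, PermissionError):
        os.killpg(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        with contextlib.suppress(ProcessLookupError, PermissionError):
            os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()


@contextlib.contextmanager
def managed_process(argv, grace_seconds=5.0):
    """Run argv in a new session so the child and its descendants share one group."""
    # start_new_session=True makes the child a group leader (pgid == pid).
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        yield proc
    finally:
        # Runs on normal exit, on exceptions, and on KeyboardInterrupt.
        _terminate_group(proc, grace_seconds)


def exit_on_sigterm():
    """Turn SIGTERM into SystemExit so finally blocks and context managers run.

    Must be called from the main thread. SIGKILL cannot be caught, so cleanup
    after SIGKILL has to come from a supervisor outside the process.
    """
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(128 + signum))
```

A caller would write `with managed_process([...]) as proc:` and do its work inside the block. File descriptors and sockets follow the same shape: opening them with a with statement (or wrapping them in contextlib.closing) ties their lifetime to the block.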

Hard | Technical

Design a fail-fast strategy for a distributed system where misconfiguration can cause data corruption. Discuss detection mechanisms, pre-deployment checks, admission controllers, feature flags, runtime guards, and a runbook for enabling fail-fast behavior versus graceful degradation. Include trade-offs between safety and availability.
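
The runtime-guard part of an answer could be illustrated with a startup check that refuses to run on an invalid configuration. This is only a sketch: StoreConfig, its fields, SUPPORTED_SCHEMA_VERSIONS, and the specific rules are hypothetical stand-ins for whatever invariants protect the real system's data.

```python
from dataclasses import dataclass


class ConfigError(Exception):
    """Raised at startup so the process exits before it serves any writes."""


@dataclass(frozen=True)
class StoreConfig:
    # Hypothetical settings whose misconfiguration could corrupt data.
    node_count: int
    replication_factor: int
    schema_version: int


# Assumed set of on-disk schema versions this build knows how to write.
SUPPORTED_SCHEMA_VERSIONS = frozenset({3, 4})


def find_problems(cfg):
    """Return every violated invariant, not just the first one found."""
    problems = []
    if cfg.node_count < 1:
        problems.append("node_count must be at least 1")
    if cfg.replication_factor < 1 or cfg.replication_factor > cfg.node_count:
        problems.append("replication_factor must be between 1 and node_count")
    if cfg.schema_version not in SUPPORTED_SCHEMA_VERSIONS:
        problems.append(f"schema_version {cfg.schema_version} is not supported")
    return problems


def require_valid(cfg):
    """Fail fast: raise with all problems listed, or return cfg unchanged."""
    problems = find_problems(cfg)
    if problems:
        raise ConfigError("refusing to start: " + "; ".join(problems))
    return cfg
```

Reporting all violations at once shortens the fix-and-retry loop, and because find_problems has no side effects the same function could also run as a pre-deployment check in CI.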

Medium | System Design

Design an automated rollback strategy for a microservice deployment when its error rate breaches the SLO. Include detection windows, metric thresholds, integration with CI/CD, safe rollback steps, canary considerations, and how to avoid rollback flapping. Describe which components own the rollback decision and how human override is handled.
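
The detection window, threshold, flapping protection, and human override can be sketched as a small decision object. RollbackGuard and all of its parameters and default values are hypothetical; a real system would feed it error rates from its monitoring stack and let the CI/CD pipeline act on the result.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RollbackGuard:
    threshold: float = 0.05      # error-rate limit (assumed to come from the SLO)
    window: int = 5              # consecutive samples that must all breach
    cooldown_s: float = 1800.0   # minimum time between automatic rollbacks
    manual_hold: bool = False    # human override: suppress automatic rollback
    _samples: deque = field(init=False, repr=False)
    _last_rollback_at: Optional[float] = field(default=None, init=False)

    def __post_init__(self):
        # Only the most recent `window` samples are kept.
        self._samples = deque(maxlen=self.window)

    def observe(self, error_rate, now):
        """Record one sample; return True if a rollback should be triggered."""
        self._samples.append(error_rate)
        if self.manual_hold:
            return False
        if len(self._samples) < self.window:
            return False  # not enough data yet
        if any(s <= self.threshold for s in self._samples):
            return False  # a single healthy sample blocks the trigger
        in_cooldown = (
            self._last_rollback_at is not None
            and now - self._last_rollback_at < self.cooldown_s
        )
        if in_cooldown:
            return False  # avoids flapping between versions
        self._last_rollback_at = now
        self._samples.clear()
        return True
```

Requiring every sample in the window to breach trades detection speed for fewer false rollbacks; a shorter window or a majority rule would react faster at the cost of more noise.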
