Your SRE Background and Experience Questions
Articulate your hands-on experience with systems administration, monitoring tools, automation scripts, and any incident response work. Be specific about the technologies you used (e.g., Prometheus, Grafana, Kubernetes, Docker, Terraform) and give concrete examples of what you built or fixed.
Hard · Technical
A JVM-based production service shows steadily increasing heap usage under sustained load, and you suspect a memory leak. Detail your investigation steps using GC logs, heap dumps, async-profiler or Java Flight Recorder, and Prometheus JVM metrics. Include the commands you would use to capture diagnostics, a sampling strategy that limits production impact, and the immediate mitigations you would apply while you investigate.
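Raw heap usage naturally sawtooths between collections, so a steadily rising post-GC baseline across many cycles is usually the more telling leak signal. The sketch below parses GC log lines and fits a least-squares slope to the heap occupancy after each pause. It assumes JDK 9+ unified logging (`-Xlog:gc`) with an uptime decorator first and megabyte units; the regex is an assumption to check against your own logs, not a complete parser.

```python
import re
from typing import Iterable

# Assumed -Xlog:gc line shape (uptime decorator first, sizes in MB), e.g.:
# [12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->130M(1024M) 4.321ms
GC_PAUSE = re.compile(
    r"^\[(?P<uptime>[\d.]+)s\].*?GC\(\d+\).*?(?P<before>\d+)M->(?P<after>\d+)M\((?P<total>\d+)M\)"
)


def post_gc_samples(lines: Iterable[str]) -> list[tuple[float, int]]:
    """Return (uptime_seconds, heap_after_gc_mb) for every pause line that matches."""
    samples = []
    for line in lines:
        match = GC_PAUSE.search(line)
        if match:  # lines without a before->after size (e.g. concurrent phases) are skipped
            samples.append((float(match["uptime"]), int(match["after"])))
    return samples


def slope_mb_per_hour(samples: list[tuple[float, int]]) -> float:
    """Least-squares slope of post-GC heap vs. uptime, scaled to MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two GC samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    return cov / var_t * 3600.0  # MB per second -> MB per hour
```

A persistently positive slope over hours of steady load is a cue to take a heap dump and compare dominant retainers; a flat slope points away from a classic leak and toward, for example, an undersized heap for the workload.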
Medium · System Design
Design a complete monitoring and alerting solution for a Kubernetes microservices platform that serves 10,000 requests per second. Describe Prometheus architecture options (federation, sharding, remote_write), scrape patterns and relabel_configs to control cardinality, Grafana dashboards and key panels, Alertmanager routing and silences, and which core metrics and alerts you'd prioritize for SREs.
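Before writing relabel rules (including `metric_relabel_configs`, which act on scraped samples), it helps to estimate how many series a label set can produce. This sketch computes a worst-case count by assuming every combination of label values occurs; the label names and cardinalities are hypothetical. For a histogram, each combination yields one `_bucket` series per `le` value (including `+Inf`) plus `_sum` and `_count`.

```python
from math import prod


def estimated_series(label_cardinalities: dict[str, int], histogram_le_values: int = 0) -> int:
    """Upper bound on active series for one metric name.

    Assumes every combination of label values actually occurs. For a histogram,
    each combination contributes one _bucket series per `le` value (including
    +Inf) plus one _sum and one _count series.
    """
    combos = prod(label_cardinalities.values())  # empty label set -> 1 series
    if histogram_le_values:
        return combos * (histogram_le_values + 2)
    return combos


# Hypothetical request-duration histogram with 12 `le` values.
templated = {"method": 4, "status": 6, "route": 50, "pod": 20}
raw_path = {"method": 4, "status": 6, "path": 5000, "pod": 20}
print(estimated_series(templated, histogram_le_values=12))
print(estimated_series(raw_path, histogram_le_values=12))
```

The comparison shows why a templated route label is far cheaper than a raw URL path: the same histogram grows by a factor of 100 when the path label has 5,000 distinct values instead of 50.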
Hard · Technical
You must perform capacity planning for a new service expected to grow 10x over 12 months. Describe the data you would collect (traffic patterns, per-request CPU/memory/IO), how you would model demand and headroom, how to size infrastructure and estimate costs, and a validation plan that includes load testing and chaos experiments. What KPIs would you present to stakeholders?
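A 10x increase over 12 months corresponds to a compound monthly growth rate of 10^(1/12) - 1, roughly 21% per month. The sketch below turns that rate into a month-by-month instance count, sizing each month so peak load stays under a utilization ceiling and adding spare instances for redundancy. All inputs (current peak RPS, per-instance RPS measured in a load test, the 60% ceiling, one spare instance) are hypothetical placeholders, and a real model would also account for seasonality and per-resource limits.

```python
import math


def monthly_growth_rate(total_multiplier: float, months: int) -> float:
    """Compound monthly rate r such that (1 + r) ** months == total_multiplier."""
    return total_multiplier ** (1.0 / months) - 1.0


def required_instances(peak_rps: float, per_instance_rps: float,
                       target_utilization: float, spare: int = 1) -> int:
    """Instances needed to keep peak load at or below the utilization ceiling, plus spares."""
    usable_rps = per_instance_rps * target_utilization
    return math.ceil(peak_rps / usable_rps) + spare


def growth_plan(current_peak_rps: float, per_instance_rps: float, target_utilization: float,
                total_multiplier: float = 10.0, months: int = 12) -> list[tuple[int, float, int]]:
    """(month, projected peak RPS, instances) for month 0 through `months`."""
    r = monthly_growth_rate(total_multiplier, months)
    plan = []
    for month in range(months + 1):
        peak = current_peak_rps * (1 + r) ** month
        plan.append((month, peak, required_instances(peak, per_instance_rps, target_utilization)))
    return plan


# Hypothetical inputs: 1,000 RPS peak today, 200 RPS per instance from a load test,
# and a 60% utilization ceiling to leave headroom for spikes and failover.
for month, peak, instances in growth_plan(1000, 200, 0.6)[::3]:
    print(f"month {month:2d}: peak ~{peak:,.0f} RPS -> {instances} instances")
```

Multiplying the instance counts by a unit price gives a first-cut cost curve; the per-instance RPS figure is the input most worth re-validating with load tests as the service changes.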
Medium · Technical
Fewer than 1% of requests suffer from very high tail latency. Explain how you would locate and diagnose the issue: which Prometheus queries, trace sampling strategies, logs, and profiling steps would you use? Describe how you would reproduce the problem, validate fixes, and measure improvements in p95, p99, and p999 latency.
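Which percentile reveals a problem depends on how small the affected fraction is. The sketch below computes exact nearest-rank percentiles from raw samples for a hypothetical workload where 0.5% of requests stall; with that fraction, p95 and p99 stay at the normal latency and only p999 exposes the stall. Note that Prometheus `histogram_quantile` estimates quantiles by interpolating within buckets, so bucket boundaries need to bracket the tail you care about.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], q: float) -> float:
    """Smallest sample such that at least q * N samples are <= it (nearest-rank method)."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(samples)
    # round() strips float noise before ceil(), e.g. in products like 0.95 * 10000.
    rank = max(1, math.ceil(round(q * len(ordered), 9)))
    return ordered[rank - 1]


# Hypothetical workload: 50 of 10,000 requests (0.5%) hit a 2 s stall.
latencies_ms = [20.0] * 9950 + [2000.0] * 50
for label, q in (("p95", 0.95), ("p99", 0.99), ("p999", 0.999)):
    print(label, nearest_rank_percentile(latencies_ms, q))
```

The same reasoning argues for tail-based or latency-aware trace sampling: with uniform 1% head sampling, only about one in a hundred of those rare slow requests would produce a trace.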
Easy · Technical
Describe at least two approaches you have used to manage secrets in cloud deployments (for example: HashiCorp Vault, AWS KMS + Parameter Store, Kubernetes Secrets). For each approach, explain how secrets are stored, how applications retrieve them, what rotation strategy you use, and the operational trade-offs between security and complexity.
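Whichever store is used, applications that cache secrets need a rule for picking up rotated values. This is a minimal, store-agnostic sketch of a TTL cache: `fetch` is a hypothetical stand-in for the real client call (a Vault read, a Parameter Store lookup, or a mounted file read), and the stale-on-error fallback is one possible design choice, not a requirement of any of those systems.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class _Entry:
    value: str
    fetched_at: float


class SecretCache:
    """TTL cache in front of a secret store client.

    Values older than `ttl_seconds` are refetched, so a rotated secret is
    picked up within one TTL. `fetch` is injected (hypothetical) so the cache
    does not depend on any particular store's API.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, _Entry] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry.fetched_at < self._ttl:
            return entry.value  # still fresh
        try:
            value = self._fetch(name)
        except Exception:
            # Store unavailable: serve the last known value if there is one.
            if entry is not None:
                return entry.value
            raise
        self._entries[name] = _Entry(value, now)
        return value
```

Serving a stale value trades availability for the risk of using a credential that has already been revoked, so this pattern fits best when rotation keeps the previous credential valid for an overlap window longer than the TTL.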