Incident Classification and Severity Questions

This topic focuses on structured approaches to classifying incidents and assigning severity levels that drive the appropriate response, escalation, and communication. It covers defining severity criteria based on customer impact, affected services, scope of impact, and regulatory concerns; mapping each severity level to response playbooks and on-call rotations; establishing escalation paths and communication cadences; defining service-level objectives (SLOs) and response-time targets; coordinating cross-functional responders; and building runbooks and automated tooling that enforce the framework. It also includes governance topics: refining severity definitions based on post-incident reviews, training responders on the framework, and tuning thresholds to reduce false positives and keep prioritization consistent.
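A severity framework is easiest to enforce when each level maps to a concrete response plan that tooling can look up. The sketch below is a minimal Python illustration, assuming a four-level SEV1 to SEV4 scale; the level names, acknowledgement targets, update cadences, rotation names, and playbook identifiers are all hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass


# Hypothetical policy values for illustration only; real targets should come
# from the organization's SLOs and on-call agreements.
@dataclass(frozen=True)
class SeverityPolicy:
    ack_minutes: int           # max time for on-call to acknowledge the page
    update_every_minutes: int  # stakeholder update cadence (0 = no scheduled updates)
    page: tuple[str, ...]      # hypothetical rotations paged at declaration
    playbook: str              # hypothetical runbook identifier


POLICIES = {
    "SEV1": SeverityPolicy(5, 15, ("service-oncall", "incident-commander", "comms"), "playbook-sev1"),
    "SEV2": SeverityPolicy(15, 30, ("service-oncall", "incident-commander"), "playbook-sev2"),
    "SEV3": SeverityPolicy(60, 120, ("service-oncall",), "playbook-sev3"),
    "SEV4": SeverityPolicy(24 * 60, 0, (), "ticket-queue"),
}


def response_plan(severity: str) -> SeverityPolicy:
    """Look up the response plan; unknown labels fail loudly instead of defaulting."""
    try:
        return POLICIES[severity]
    except KeyError:
        raise ValueError(f"unknown severity {severity!r}") from None
```

Failing loudly on an unknown label is a deliberate choice here: silently defaulting to a low severity is exactly the kind of inconsistent prioritization the framework is meant to prevent.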

Easy · Technical

List and describe the core attributes you would use to define incident severity in an enterprise environment. For each attribute (customer impact, scope/number of affected users, durability/data-loss risk, regulatory exposure, and mitigation complexity), give at least one concrete observable or metric that could be used to measure it (e.g., percentage of requests returning 5xx errors, revenue per minute affected, or number of customers reporting failures).
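One way to make these attributes operational is to turn each into a measurable signal and let the worst attribute set the severity. The following is a minimal sketch, assuming a 1 (most severe) to 4 scale; the `ImpactSignals` fields and every threshold are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ImpactSignals:
    error_rate: float              # customer impact: fraction of requests returning 5xx
    affected_users_pct: float      # scope: share of active users affected (0-100)
    data_loss_suspected: bool      # durability / data-loss risk
    regulated_data_involved: bool  # regulatory exposure, e.g. PII or payment data
    rollback_available: bool       # mitigation complexity: is a quick fix available?


def classify(s: ImpactSignals) -> int:
    """Return 1 (most severe) to 4. Thresholds are hypothetical placeholders."""
    levels = [4]  # baseline when no attribute indicates impact
    if s.error_rate >= 0.25 or s.affected_users_pct >= 50:
        levels.append(1)
    elif s.error_rate >= 0.05 or s.affected_users_pct >= 10:
        levels.append(2)
    elif s.error_rate >= 0.01:
        levels.append(3)
    if s.data_loss_suspected or s.regulated_data_involved:
        levels.append(1)  # durability and regulatory risk override traffic metrics
    sev = min(levels)  # the worst attribute wins
    # Incidents with real impact but no quick mitigation are bumped one level.
    if not s.rollback_available and 1 < sev < 4:
        sev -= 1
    return sev
```

Taking the most severe attribute, rather than averaging, keeps a low-traffic data-loss event from being diluted by healthy error-rate metrics.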

Hard · Technical

During a high-severity incident, you must determine whether data integrity was compromised. Describe the forensic steps: which logs and backups to collect and preserve, sampling strategies for validating data integrity, methods for verifying consistency across replicas, how to preserve chain of custody for legal and compliance purposes, timelines for reporting potential data loss, and how to coordinate findings with the security and legal teams.
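For the replica-consistency and sampling steps, one common approach is to compare checksums over hash buckets of rows instead of whole tables, then drill into only the mismatched buckets and spot-check a reproducible random sample. The sketch below assumes rows can be exported as `(key, value)` pairs from each replica; the function names and the bucket count of 64 are hypothetical choices for illustration.

```python
import hashlib
import random
from collections import defaultdict


def bucket_of(key, buckets: int) -> int:
    """Stable bucket assignment (Python's built-in hash() is salted per process)."""
    return int(hashlib.sha256(repr(key).encode()).hexdigest(), 16) % buckets


def bucket_digests(rows, buckets: int = 64) -> dict[int, str]:
    """One SHA-256 digest per bucket of (key, value) rows. Rows are sorted
    inside each bucket, so the digest does not depend on scan order."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[bucket_of(key, buckets)].append(repr((key, value)))
    digests = {}
    for b, items in grouped.items():
        h = hashlib.sha256()
        for item in sorted(items):
            h.update(item.encode() + b"\n")
        digests[b] = h.hexdigest()
    return digests


def mismatched_buckets(primary_rows, replica_rows, buckets: int = 64) -> list[int]:
    """Buckets whose contents differ, including buckets present on only one side."""
    p = bucket_digests(primary_rows, buckets)
    r = bucket_digests(replica_rows, buckets)
    return sorted(b for b in p.keys() | r.keys() if p.get(b) != r.get(b))


def sample_keys(keys, k: int, seed: int) -> list:
    """Reproducible random sample for row-level spot checks. Record the seed in
    the evidence log so reviewers can redraw exactly the same sample."""
    rng = random.Random(seed)
    pool = sorted(keys)
    return rng.sample(pool, min(k, len(pool)))
```

Bucketing by key hash means a single missing or altered row invalidates only its own bucket, which narrows the forensic search. Storing the digests and the sampling seed alongside the preserved evidence also supports chain-of-custody review.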

Hard · Technical

Evaluate the trade-offs between centralizing severity classification (a single platform and decision team) and decentralizing it to individual service owners. Consider detection speed, local context awareness, consistency, governance overhead, and scalability. Recommend a hybrid approach that combines centralized guardrails with local autonomy, and describe its governance model, escalation flow, and enforcement mechanisms.
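A hybrid model can be enforced in tooling by letting service owners choose severity with their local context while central guardrails can only raise it, never lower it. This is a minimal sketch under that assumption; the `GUARDRAILS` rules, signal names, and floor values are hypothetical examples, not a standard rule set.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    severity: int                   # 1 = most severe
    overridden: bool = False
    reasons: list[str] = field(default_factory=list)


# Hypothetical org-wide guardrails: (description, predicate, severity floor).
GUARDRAILS = [
    ("customer-facing error rate >= 5%", lambda s: s.get("error_rate", 0) >= 0.05, 2),
    ("possible regulated-data exposure", lambda s: s.get("regulated_data", False), 1),
    ("more than one service affected", lambda s: s.get("services_affected", 1) > 1, 2),
]


def apply_guardrails(local_severity: int, signals: dict) -> Decision:
    """Start from the owner's choice; each triggered guardrail can only make the
    incident more severe (a lower number), never less."""
    decision = Decision(local_severity)
    for desc, predicate, floor in GUARDRAILS:
        if predicate(signals) and decision.severity > floor:
            decision.severity = floor
            decision.overridden = True
            decision.reasons.append(desc)
    return decision
```

Recording `reasons` for every override gives the central team an audit trail: frequent overrides for one service suggest its local criteria need recalibration, which feeds the governance review.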

Hard · Technical

Write pseudocode or Python for an alert deduplication and correlation engine that consumes alert events, groups alerts by service and root_cause tag within a 5-minute sliding window, and emits consolidated incidents whose aggregated severity is computed from the grouped alerts. Emphasize correctness and clarity; then explain the algorithmic complexity, eventual-consistency concerns, and how you would test this logic under high throughput.
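A minimal single-process sketch of such an engine follows. It assumes alerts arrive in timestamp order, interprets the sliding window as "an alert joins an open group if it arrives within 5 minutes of that group's most recent alert", drops exact duplicates by `alert_id`, and takes the most severe alert as the group's severity. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 300  # the 5-minute window from the question


@dataclass
class Alert:
    ts: float         # epoch seconds
    service: str
    root_cause: str
    severity: int     # 1 = most severe
    alert_id: str


@dataclass
class Incident:
    service: str
    root_cause: str
    first_ts: float
    last_ts: float
    severity: int
    alert_ids: list[str] = field(default_factory=list)


class Correlator:
    """Groups alerts by (service, root_cause). Assumes in-order arrival."""

    def __init__(self, window: float = WINDOW_SECONDS):
        self.window = window
        self.open: dict[tuple[str, str], Incident] = {}
        self.seen: set[str] = set()  # unbounded here for clarity; expire ids in production

    def ingest(self, alert: Alert) -> list[Incident]:
        """Add one alert; return any incidents whose window has closed."""
        if alert.alert_id in self.seen:  # exact duplicate delivery: drop it
            return []
        self.seen.add(alert.alert_id)
        closed = self._expire(alert.ts)
        key = (alert.service, alert.root_cause)
        inc = self.open.get(key)
        if inc is None:
            inc = Incident(alert.service, alert.root_cause, alert.ts, alert.ts, alert.severity)
            self.open[key] = inc
        inc.last_ts = alert.ts
        inc.severity = min(inc.severity, alert.severity)  # most severe alert wins
        inc.alert_ids.append(alert.alert_id)
        return closed

    def _expire(self, now: float) -> list[Incident]:
        expired = [k for k, inc in self.open.items() if now - inc.last_ts > self.window]
        return [self.open.pop(k) for k in expired]

    def flush(self) -> list[Incident]:
        """Emit every still-open incident, e.g. at shutdown or end of a test."""
        closed = list(self.open.values())
        self.open.clear()
        return closed
```

As written, each `ingest` costs O(G) for the expiry scan, where G is the number of open groups; a min-heap keyed on `last_ts` would make expiry proportional to the groups actually closing. In a distributed deployment, partitioning alerts by `(service, root_cause)` keeps each group on one worker, but late or out-of-order alerts would need a grace period before a group is emitted.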

Easy · Technical

Describe how regular 'game days' or runbook-testing exercises help validate and calibrate incident severity definitions. Provide a concrete test scenario (e.g., multi-service degradation during peak traffic), metrics for measuring success (time-to-declare, judgment accuracy), and explain how to incorporate lessons learned into severity thresholds and runbooks.
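The two success metrics named above can be computed directly from drill records. This sketch assumes each drill logs when the fault was injected, when responders declared the incident, the severity they chose, and the severity the drill designers intended; the `DrillResult` fields and the `score` function are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class DrillResult:
    scenario: str
    fault_injected_at: float  # seconds on the drill clock
    declared_at: float        # when responders declared the incident
    declared_sev: int         # severity responders chose (1 = most severe)
    expected_sev: int         # severity the drill designers intended


def score(results: list[DrillResult]) -> dict:
    """Summarize time-to-declare and severity-judgment accuracy across drills."""
    ttd = [r.declared_at - r.fault_injected_at for r in results]
    return {
        "median_time_to_declare_s": median(ttd),
        "accuracy": sum(r.declared_sev == r.expected_sev for r in results) / len(results),
        # Under-classification (less severe than intended) is usually the riskier error.
        "under_classified": sum(r.declared_sev > r.expected_sev for r in results),
        "over_classified": sum(r.declared_sev < r.expected_sev for r in results),
    }
```

Splitting misjudgments into under- and over-classification shows which way to adjust: repeated under-classification of one scenario suggests its threshold is too high or its criteria are unclear in the runbook.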
