InterviewStack.io

Infrastructure Implementation and Operations Questions

Hands-on design, deployment, and operational management of infrastructure components and services. This includes setting up and configuring load balancers, database replication and high availability, caching layers, networking and network security, service discovery and routing, container deployment and orchestration, monitoring and observability, logging and alerting, backup and disaster recovery strategies, and runtime secrets management. Candidates should be able to walk through concrete implementations, explain trade-offs, demonstrate troubleshooting and performance tuning, and show how infrastructure components integrate to meet availability, scalability, and security requirements.

Medium | System Design
Design a backup and point-in-time recovery (PITR) strategy for a cloud data warehouse (Snowflake or BigQuery) that retains 3 years of history but must support restoring any dataset to a point in the last 7 days within 2 hours. Describe snapshots, incremental backups, and the restore workflow.
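As a minimal sketch of the restore-planning step, assume a hypothetical backup catalog in which each incremental captures changes since the previous backup and is replayed on top of the newest full snapshot taken before it. The planner below picks the chain to replay for a target time inside the 7-day window. `Backup` and `plan_restore` are illustrative names, not a Snowflake or BigQuery API, and any changes between the last backup and the exact target time would still need log replay or the warehouse's native time travel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

PITR_WINDOW = timedelta(days=7)  # restores must target a point in the last 7 days

@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" snapshot or "incremental" (changes since the previous backup)

def plan_restore(backups: List[Backup], target: datetime, now: datetime) -> List[Backup]:
    """Return the full snapshot plus the incrementals to replay, oldest first."""
    if not (now - PITR_WINDOW <= target <= now):
        raise ValueError("target is outside the 7-day point-in-time window")
    # Only backups taken at or before the target can contribute to the restore.
    eligible = sorted((b for b in backups if b.taken_at <= target), key=lambda b: b.taken_at)
    fulls = [b for b in eligible if b.kind == "full"]
    if not fulls:
        raise LookupError("no full snapshot precedes the target time")
    base = fulls[-1]  # newest full snapshot keeps the replay chain short
    chain = [b for b in eligible if b.kind == "incremental" and b.taken_at > base.taken_at]
    return [base] + chain
```

A short chain is what makes a 2-hour restore target plausible: more frequent full snapshots trade storage for fewer incrementals to replay, while older history can live in cheaper, less frequent snapshots.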
Hard | Technical
Conceptually implement a Terraform module that provisions an AWS Auto Scaling Group for Spark worker nodes using a mixed instances policy (spot + on-demand) and lifecycle hooks to gracefully drain Spark executors before instance termination. Describe the module inputs, outputs, and key resource blocks you would include (IAM, ASG, launch template, lifecycle hook).
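The module itself is HCL, but the graceful-drain half of the lifecycle hook can be sketched in Python. Assuming the hook's events are routed through EventBridge to a handler, the function below reads the standard lifecycle event fields, calls a hypothetical `drain` callable (for example, one that triggers Spark executor decommissioning on that host and waits), and returns keyword arguments for boto3's `complete_lifecycle_action`.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger(__name__)

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"

def handle_lifecycle_event(event: dict, drain: Callable[[str], bool]) -> Optional[dict]:
    """Drain a terminating Spark worker, then return complete_lifecycle_action kwargs."""
    detail = event.get("detail", {})
    if detail.get("LifecycleTransition") != TERMINATING:
        return None  # ignore launch hooks and unrelated events
    instance_id = detail["EC2InstanceId"]
    # `drain` is hypothetical: decommission executors on this host and wait,
    # returning False if the drain did not finish in time.
    if not drain(instance_id):
        log.warning("drain incomplete on %s; letting termination proceed", instance_id)
    # For a terminating hook, CONTINUE lets the instance terminate and allows
    # any other lifecycle hooks on the group to run.
    return {
        "LifecycleHookName": detail["LifecycleHookName"],
        "AutoScalingGroupName": detail["AutoScalingGroupName"],
        "LifecycleActionToken": detail["LifecycleActionToken"],
        "LifecycleActionResult": "CONTINUE",
        "InstanceId": instance_id,
    }
```

Keeping the AWS call outside the function makes the logic testable; the caller would pass the result to `boto3.client("autoscaling").complete_lifecycle_action(**kwargs)`. If draining can outlast the hook's heartbeat timeout, the handler would also need to call `record_lifecycle_action_heartbeat` periodically.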
Hard | Technical
Design a secure Kubernetes cluster for processing sensitive PII: include network policies, Pod Security admission (or OPA/Gatekeeper), secrets handling, RBAC scoping, encryption at rest, audit logging, and multi-tenant isolation strategies while minimizing operational overhead for the data engineering team.
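Whether enforcement uses Pod Security admission or Gatekeeper, the core of a restricted profile is a set of checks on each pod spec. The hypothetical function below renders a subset of the Pod Security Standards "restricted" rules in Python (it omits, for example, seccomp profile and volume-type checks) to show what the admission layer would reject; in a real cluster you would rely on the built-in admission controller or a Rego policy rather than this code.

```python
from typing import Any, Dict, List

def restricted_violations(pod: Dict[str, Any]) -> List[str]:
    """Return human-readable violations of a subset of the 'restricted' rules."""
    spec = pod.get("spec", {})
    problems: List[str] = []
    # Host namespaces break isolation between tenants' workloads.
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(flag):
            problems.append(f"spec.{flag} must be false")
    pod_sc = spec.get("securityContext", {})
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "<unnamed>")
        if sc.get("privileged"):
            problems.append(f"{name}: privileged containers are not allowed")
        # Unset counts as a violation: the profile requires an explicit false.
        if sc.get("allowPrivilegeEscalation", True):
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        # A container-level setting overrides the pod-level default.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            problems.append(f"{name}: runAsNonRoot must be true")
    return problems
```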
Hard | System Design
Design an exactly-once, low-latency cross-region replication system for a Change Data Capture (CDC) stream (e.g., Debezium) replicating OLTP changes from region A to an analytics cluster in region B. Address ordering, deduplication, network partitions, failure modes, and how to validate correctness.
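One common way to get exactly-once effects on top of an at-least-once transport is an idempotent sink. The sketch below assumes each change carries a monotonically increasing source log position (here `lsn`, for example a WAL position taken from the Debezium event's `source` metadata) and that a single primary orders all writes to a given key. It skips duplicates and stale redeliveries per key and keeps the position after deletes so a replayed insert cannot resurrect a row. `ChangeEvent`, `IdempotentSink`, and `fingerprint` are hypothetical names; `fingerprint` illustrates one way to compare source and target state during validation.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass(frozen=True)
class ChangeEvent:
    key: str
    lsn: int  # assumed monotonically increasing source log position
    op: str   # Debezium-style op codes: "c", "u", "d", "r"
    after: Optional[Dict[str, Any]]

@dataclass
class IdempotentSink:
    rows: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Last applied lsn per key; retained after deletes as a tombstone.
    versions: Dict[str, int] = field(default_factory=dict)

    def apply(self, ev: ChangeEvent) -> bool:
        """Apply ev unless it is a duplicate or older than what is stored."""
        if ev.lsn <= self.versions.get(ev.key, -1):
            return False
        if ev.op == "d":
            self.rows.pop(ev.key, None)
        else:  # create, update, or snapshot read all become upserts
            self.rows[ev.key] = dict(ev.after or {})
        self.versions[ev.key] = ev.lsn
        return True

def fingerprint(rows: Dict[str, Dict[str, Any]]) -> str:
    """Order-independent digest for comparing source and replica contents."""
    h = hashlib.sha256()
    for key in sorted(rows):
        h.update(json.dumps([key, rows[key]], sort_keys=True).encode())
    return h.hexdigest()
```

In a real sink, the row write and the version update must commit in the same transaction in region B; otherwise a crash between them reopens the duplicate window.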
Hard | System Design
Design a hybrid multi-cloud data platform: primary workloads in GCP, disaster recovery in AWS, and specialized compute in Azure. Discuss networking (interconnects, VPNs), identity federation, cross-cloud data replication, deployment automation, and how to test failover end-to-end.
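An end-to-end failover test usually ends by checking measured results against recovery objectives. Assuming the drill records the timestamps below, a check might look like this; the field names and the RPO/RTO targets are hypothetical inputs, not values given in the question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass(frozen=True)
class DrillResult:
    primary_last_commit: datetime  # last write acknowledged in GCP before cutover
    dr_last_applied: datetime      # newest replicated write visible in AWS after cutover
    outage_started: datetime       # when the simulated GCP failure began
    dr_serving_at: datetime        # first successful synthetic query against AWS

def check_drill(r: DrillResult, rpo: timedelta, rto: timedelta) -> List[str]:
    """Return a list of objective violations; an empty list means the drill passed."""
    failures: List[str] = []
    data_loss = r.primary_last_commit - r.dr_last_applied
    if data_loss > rpo:
        failures.append(f"RPO violated: {data_loss} of writes not replicated (limit {rpo})")
    downtime = r.dr_serving_at - r.outage_started
    if downtime > rto:
        failures.append(f"RTO violated: DR served after {downtime} (limit {rto})")
    return failures
```

Running this after every scheduled drill turns failover testing into a pass/fail signal that can gate releases of the replication and deployment automation.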
