InterviewStack.io

Data Consistency and Distributed Transactions Questions

An in-depth focus on data consistency models and practical approaches to maintaining correctness across distributed components. Covers strong consistency models including linearizability and serializability, causal consistency, and eventual consistency, along with the implications of each for replication, latency, and user experience. Discusses the CAP theorem's implications for consistency choices; idempotency; exactly-once and at-least-once semantics; concurrency control and isolation levels; handling race conditions and conflict resolution; and concrete patterns for coordinating updates across services, such as two-phase commit, three-phase commit, and the saga pattern with compensating transactions. Also includes operational challenges like retries, timeouts, ordering, clocks and monotonic timestamps, trade-offs between throughput and consistency, and when eventual consistency is acceptable versus when strong consistency is required for correctness (for example, financial systems versus social feeds).

Medium · Technical
Design deduplication and ordering guarantees for a CDC pipeline using Debezium to stream changes from an OLTP DB to a data lake. The pipeline must not apply the same DB change twice across connector restarts and must preserve transactional ordering. Explain how to use source LSN/transaction id, idempotent sink writes, connector offsets, and any consumer-side bookkeeping required.
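One answer sketch for the dedup half of this question: track the highest source LSN the sink has durably applied, and skip any replayed change at or below it. This works because a Debezium-style connector replays from its last committed offset after a restart, so duplicates always carry an already-seen LSN. The event schema and class below are illustrative, not Debezium's actual field names.

```python
# Minimal sketch of an idempotent sink keyed on source LSN (illustrative schema).
class IdempotentSink:
    """Applies each change at most once by tracking a high-water LSN.

    In production the high-water mark must be stored atomically with the
    sink write (same transaction or same file commit), or it is unsafe.
    """
    def __init__(self):
        self.high_water_lsn = 0   # would be persisted alongside the data lake write
        self.applied = []         # stand-in for the actual data-lake append

    def apply(self, event):
        if event["lsn"] <= self.high_water_lsn:
            return False          # duplicate replayed after a connector restart
        self.applied.append((event["lsn"], event["op"], event["row"]))
        self.high_water_lsn = event["lsn"]
        return True

sink = IdempotentSink()
events = [
    {"lsn": 101, "op": "insert", "row": {"id": 1}},
    {"lsn": 102, "op": "update", "row": {"id": 1}},
    {"lsn": 102, "op": "update", "row": {"id": 1}},  # replay after restart
]
results = [sink.apply(e) for e in events]
print(results)  # [True, True, False]
```

Because the source LSN is totally ordered per database, applying in LSN order also preserves transactional ordering; interleaving changes from multiple partitions would require per-transaction grouping on top of this.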
Easy · Technical
Explain what monotonic timestamps are and why they matter for event ordering in a streaming pipeline. Propose an approach to enforce monotonic timestamps at ingestion when clients have unsynchronized clocks (e.g., server-assigned timestamps or hybrid logical clocks (HLCs)), and discuss trade-offs such as added latency, skew handling, and impact on downstream sorting.
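A sketch of the HLC approach mentioned in the question: pair the wall clock with a logical counter so the assigned timestamp never moves backwards, even if the physical clock stalls or steps back after an NTP adjustment. This is a minimal single-node version for illustration; a full HLC also merges timestamps received from peers.

```python
import time

class HLC:
    """Minimal hybrid logical clock: (physical_ms, logical) pairs that are
    strictly monotonic regardless of wall-clock behavior."""
    def __init__(self, now_ms=lambda: int(time.time() * 1000)):
        self.now_ms = now_ms
        self.last_physical = 0
        self.logical = 0

    def tick(self):
        pt = self.now_ms()
        if pt > self.last_physical:
            self.last_physical, self.logical = pt, 0  # clock advanced: reset counter
        else:
            self.logical += 1   # clock stalled or went backwards: bump counter
        return (self.last_physical, self.logical)

clock = HLC()
stamps = [clock.tick() for _ in range(5)]
assert stamps == sorted(stamps) and len(set(stamps)) == 5  # strictly increasing
```

Sorting by the (physical, logical) tuple gives downstream consumers a total order; the cost is that the physical component can drift ahead of true time under heavy skew, which is one of the trade-offs the question asks about.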
Medium · Technical
As a senior data engineer, you must lead the team decision between adopting sagas or distributed transactions (e.g., 2PC) for cross-service updates. Describe the evaluation criteria you would use (SLOs, failure modes, operational complexity, throughput, recovery time), experiments or proof-of-concepts you'd run, and how you'd present a recommendation and migration plan to engineering and product stakeholders.
Medium · Technical
Late-arriving events are causing daily aggregates to be incorrect in your streaming analytics pipeline. Describe a production-grade design to handle late arrivals: include event-time vs processing-time semantics, watermarks, allowed lateness, state retention windows, retractions or corrective updates to aggregates, and trade-offs affecting low-latency dashboards.
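The core mechanics the question asks for (watermarks, allowed lateness, corrective updates) can be sketched engine-agnostically. The class below is illustrative, not tied to Flink, Beam, or any specific framework: events within allowed lateness of a closed window trigger a corrective re-emission; events beyond it are dropped because their window state has been reclaimed.

```python
# Hedged sketch: event-time daily aggregation with watermark + allowed lateness.
class DailyAggregator:
    DAY = 86_400  # seconds

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.watermark = 0
        self.windows = {}       # window_start -> running sum (retained state)
        self.corrections = []   # (window_start, new_total) re-emitted downstream
        self.dropped = 0

    def on_event(self, event_time, value):
        win = event_time - event_time % self.DAY
        if win + self.DAY + self.allowed_lateness <= self.watermark:
            self.dropped += 1   # beyond allowed lateness: state already reclaimed
            return
        late = win + self.DAY <= self.watermark  # window already fired once
        self.windows[win] = self.windows.get(win, 0) + value
        if late:
            self.corrections.append((win, self.windows[win]))  # corrective update

    def advance_watermark(self, wm):
        self.watermark = max(self.watermark, wm)

agg = DailyAggregator(allowed_lateness=3600)
agg.on_event(100, 5)                                # day-0 window, on time
agg.advance_watermark(DailyAggregator.DAY + 1000)   # day-0 window closes and fires
agg.on_event(200, 3)                                # late, within lateness: corrected
agg.advance_watermark(2 * DailyAggregator.DAY)
agg.on_event(300, 7)                                # too late: dropped
print(agg.corrections, agg.dropped)                 # [(0, 8)] 1
```

The dashboard trade-off falls out directly: a larger allowed lateness means more state retained and more corrective churn on low-latency views, while a smaller one means more dropped-and-wrong daily totals.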
Medium · Technical
A dedup store holds per-event idempotency keys to enforce exactly-once delivery for a streaming sink. Propose a safe garbage collection strategy that reclaims keys without risking incorrectly re-accepting late-arriving or retried events. Include TTL calculation relative to watermarks, watermark alignment across nodes, and mechanisms to coordinate GC in a distributed deployment.
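One common shape of the answer, sketched under stated assumptions: a key is safe to reclaim only once no legitimate duplicate can still arrive, so the GC horizon is the global watermark (the minimum across all nodes, i.e. aligned on the slowest one) minus both allowed lateness and the maximum retry horizon. The function and parameter names are hypothetical.

```python
# Hedged sketch: watermark-driven GC of idempotency keys (illustrative names).
def gc_dedup_keys(keys, node_watermarks, allowed_lateness, retry_horizon):
    """keys: idempotency_key -> event_time. Returns the surviving keys.

    A key may only be reclaimed when its event time is older than the
    global watermark minus every window in which a duplicate could appear:
    late arrivals (allowed_lateness) and producer retries (retry_horizon).
    """
    if not node_watermarks:
        return dict(keys)                      # no progress info: reclaim nothing
    global_watermark = min(node_watermarks)    # align on the slowest node
    safe_before = global_watermark - allowed_lateness - retry_horizon
    return {k: t for k, t in keys.items() if t >= safe_before}

keys = {"a": 100, "b": 550, "c": 900}
kept = gc_dedup_keys(
    keys,
    node_watermarks=[1000, 800],   # slowest node pins the horizon at 800
    allowed_lateness=200,
    retry_horizon=100,
)
print(sorted(kept))  # ['b', 'c'] — only 'a' (event_time 100 < 500) is reclaimed
```

In a distributed deployment, each node would report its local watermark to a coordinator (or a shared metadata topic), and GC would run against the minimum; reclaiming against any single node's watermark risks re-accepting a duplicate that a slower node has yet to deliver.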
