Real-Time and Batch Ingestion Questions

Focuses on choosing between batch ingestion and real-time streaming for moving data from sources to storage and downstream systems. Topics include latency and throughput requirements, cost and operational complexity, consistency and delivery semantics such as at-least-once and exactly-once, idempotency and deduplication strategies, schema evolution, connector and source considerations, backpressure and buffering, checkpointing and state management, and tooling choices for streaming and batch. Candidates should be able to design hybrid architectures that combine streaming for low-latency needs with batch pipelines for large backfills or heavy aggregations, and to explain operational trade-offs in monitoring, scaling, failure recovery, and debugging.

Hard | Technical

Describe how to build and test a connector that streams rows from a proprietary database into Kafka. Cover checkpointing, handling schema changes, initial snapshot vs incremental capture, and how you would simulate failures to validate connector resilience.
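
One way to ground the checkpointing and failure-simulation parts of an answer is a small in-memory model. The sketch below is only an illustration under stated assumptions: `run_connector`, `CrashAfterPublish`, the list standing in for a Kafka topic, and the dict standing in for durable checkpoint storage are all hypothetical, and no real Kafka client is used. It shows that committing the checkpoint after publishing gives at-least-once behavior, so a crash between the two steps produces duplicates on restart rather than gaps.

```python
class CrashAfterPublish(Exception):
    """Hypothetical injected failure: the process dies before the checkpoint is saved."""


def run_connector(source_rows, topic, checkpoint, batch_size=2, crash_on_batch=None):
    """Incrementally copy rows whose offset is greater than the checkpoint.

    source_rows: list of (offset, row) tuples sorted by offset (stand-in for the database).
    topic: list that stands in for a Kafka topic.
    checkpoint: dict with an "offset" key (stand-in for durable checkpoint storage).
    """
    batch_no = 0
    while True:
        pending = [(o, r) for o, r in source_rows if o > checkpoint["offset"]][:batch_size]
        if not pending:
            return
        topic.extend(pending)                  # step 1: publish the batch
        if crash_on_batch == batch_no:
            raise CrashAfterPublish()          # injected crash between publish and commit
        checkpoint["offset"] = pending[-1][0]  # step 2: commit the checkpoint
        batch_no += 1


rows = [(i, f"row-{i}") for i in range(1, 6)]
topic, checkpoint = [], {"offset": 0}

try:
    run_connector(rows, topic, checkpoint, crash_on_batch=1)
except CrashAfterPublish:
    pass  # simulate a restart that resumes from the last saved checkpoint

run_connector(rows, topic, checkpoint)
published_offsets = [offset for offset, _ in topic]
print(published_offsets)
```

Offsets 3 and 4 appear twice in the simulated topic because the crash happened after they were published but before the checkpoint advanced. A resilience test can assert that no offset is missing and that downstream deduplication absorbs the repeats.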

Hard | Technical

Explain watermarking and event-time processing. Suppose events can arrive out of order by several hours; propose strategies to balance result freshness against correctness and bounded state size. Describe the use of late-arrival side outputs and retractions, and how to surface corrections to downstream consumers.
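
The watermark mechanics in this question can be sketched without any streaming framework. The class below is a simplified illustration, not any engine's API: `TumblingWindowCounter`, `max_out_of_orderness`, and the integer event times are assumptions made for the example. The watermark trails the largest event time seen by a fixed bound, a window is emitted once the watermark passes its end, and events for already-closed windows go to a late side output instead of being silently dropped.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per tumbling event-time window (hypothetical helper)."""

    def __init__(self, window_size, max_out_of_orderness):
        self.window_size = window_size
        self.max_out_of_orderness = max_out_of_orderness
        self.max_event_time = None
        self.open_windows = defaultdict(int)  # window_start -> count (bounded state)
        self.emitted = []                     # finalized (window_start, count) results
        self.late = []                        # side output for events that missed their window

    @property
    def watermark(self):
        # Assertion that no event older than this time is still expected.
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.max_out_of_orderness

    def process(self, event_time):
        window_start = event_time - event_time % self.window_size
        if window_start + self.window_size <= self.watermark:
            self.late.append(event_time)      # window already closed: route to side output
            return
        self.open_windows[window_start] += 1
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        # Emit and free every window whose end is at or before the watermark.
        for start in sorted(self.open_windows):
            if start + self.window_size <= self.watermark:
                self.emitted.append((start, self.open_windows.pop(start)))


counter = TumblingWindowCounter(window_size=10, max_out_of_orderness=5)
for t in [1, 7, 12, 3, 16, 4, 27]:
    counter.process(t)
print(counter.emitted, counter.late)
```

A larger `max_out_of_orderness` admits more stragglers at the cost of later results and more open windows. Events in the side output can be used to publish a corrected count for the affected window, which is one way to surface retractions downstream.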

Medium | Technical

Compare streaming frameworks (Apache Flink, Spark Structured Streaming, Kafka Streams) for ingestion workloads. Discuss differences in state management, latency, exactly-once semantics, operational complexity, and use cases best suited for each framework.

Easy | Technical

Define 'at least once', 'at most once', and 'exactly once' delivery semantics in the context of stream ingestion. For each guarantee, give a concrete example of a failure mode that can lead to duplicate or missing data, and one practical mitigation technique.
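
A common mitigation for at-least-once delivery is an idempotent sink that deduplicates on a stable event ID. The following sketch assumes hypothetical names throughout (`IdempotentSink`, `deliver_at_least_once`, `fail_ack_on`) and simulates the failure mode in memory: the write succeeds, the acknowledgement is lost, and the producer redelivers.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Hypothetical sink that ignores records whose ID was already applied."""
    seen_ids: set = field(default_factory=set)
    rows: list = field(default_factory=list)

    def write(self, event_id, payload):
        if event_id in self.seen_ids:
            return False                  # duplicate delivery: no effect
        self.seen_ids.add(event_id)
        self.rows.append(payload)
        return True


def deliver_at_least_once(events, sink, fail_ack_on=frozenset()):
    """Retry each event until acknowledged; returns the number of delivery attempts.

    For IDs in fail_ack_on, the first ack is lost after the write has succeeded.
    """
    attempts = 0
    for event_id, payload in events:
        first_try = True
        acked = False
        while not acked:
            attempts += 1
            sink.write(event_id, payload)
            acked = not (first_try and event_id in fail_ack_on)
            first_try = False
    return attempts


events = [("e1", {"v": 1}), ("e2", {"v": 2}), ("e3", {"v": 3})]
sink = IdempotentSink()
attempts = deliver_at_least_once(events, sink, fail_ack_on={"e2"})
print(attempts, len(sink.rows))
```

The event `e2` is delivered twice but stored once, so the combination of at-least-once delivery and an idempotent write yields an effectively-once result at the sink. In a real system the set of seen IDs would need durable, bounded storage, for example a keyed upsert in the target table.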

Medium | System Design

Design a hybrid ingestion architecture that uses streaming for low-latency needs (alerts, recent dashboards) and batch for heavy aggregations and historical reprocessing. Define data contracts, where raw and curated data live, how you maintain consistency between streaming and batch aggregates, and how to handle late-arriving data.
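
One possible way to keep streaming and batch aggregates consistent is to treat batch output as authoritative up to a cutoff and serve streaming results only for periods the batch job has not yet covered. The sketch below assumes hourly buckets and a hypothetical `merged_counts` function and `batch_cutoff_hour` parameter; it illustrates the serving-time merge only, not the pipelines themselves.

```python
def merged_counts(batch_counts, stream_counts, batch_cutoff_hour):
    """Combine aggregates keyed by hour bucket.

    Hours before batch_cutoff_hour come from the batch job, which has seen late data.
    Hours at or after the cutoff come from the streaming job (fresh but provisional).
    """
    merged = {h: c for h, c in batch_counts.items() if h < batch_cutoff_hour}
    for hour, count in stream_counts.items():
        if hour >= batch_cutoff_hour:
            merged[hour] = count
    return merged


batch_counts = {0: 100, 1: 120}          # recomputed from raw data, includes late arrivals
stream_counts = {1: 118, 2: 40, 3: 7}    # low-latency counts; hour 1 missed two late events
view = merged_counts(batch_counts, stream_counts, batch_cutoff_hour=2)
print(view)
```

When the next batch run completes, advancing the cutoff replaces the provisional streaming value for that hour with the corrected batch value, which is how late-arriving data eventually reaches consumers of the merged view.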
