Browse Tag: Data Pipeline

Exactly-Once Processing Using Apache Flink and Pravega Connector

This blog post provides an overview of how Apache Flink and Pravega Connector works under the hood to provide end-to-end exactly-once semantics for streaming data pipelines.

Overview

Pravega [4] is a storage system that exposes Stream as storage primitive for continuous and unbounded data. A Pravega stream is a durable, elastic, append-only, unbounded sequence of bytes providing strong consistency model guaranteeing data durability (once the writes are acknowledged to the client), message ordering (events within the same Routing Key will be delivered to the readers in the same order as it was written) and exactly-once support (duplicate event writes are not allowed).

Pravega was designed to support a new generation of streaming applications which process large amounts of data arriving continuously to derive deep insights. Pravega relies on the stream processing frameworks to process and transform the data, and it provides enough storage primitives that are necessary for a stream processing framework to operate on the data and reason about it.

Apache Flink is a distributed stream processor with intuitive and expressive APIs to implement stateful stream processing applications. By combining the features of Apache Flink and Pravega, it is possible to build a pipeline comprising of multiple Flink applications, that can be chained together to give end-to-end exactly-once guarantees across the chain of applications. Continue Reading