Change Data Capture with Pravega + Debezium

By Derek Moore on Posted on August 16, 2022 in Best Practices Cloud Technologies Stream Processing

Change Data Capture (CDC) is becoming a popular technique for interconnecting disparate systems, for replicating state across traditional boundaries, for decomposing existing monoliths into microservices, and for the recordation of audit trails. CDC is the idea of emitting a changelog of all INSERT‘s, UPDATE‘s, DELETE‘s, and schema changes performed on a database. Debezium.io is an […]

By Yumin Zhou on Posted on November 1, 2021 in Cloud Analytics Real-time Analytics Stream Processing

Introduction Pravega is a storage system based on the stream abstraction, providing the ability to process tail data (low-latency streaming) and historical data (catchup and batch reads). Relatedly, Apache Flink is a widely-used real-time computing engine that provides unified batch and stream processing. Flink provides high-throughput, low-latency streaming data processing, as well as support for complex event […]

Data Flow from Sensors to the Edge and the Cloud using Pravega

By Claudio Fahey on Posted on March 23, 2021 in Stream Processing Use Cases

Introduction Today there are billions of sensors around the world, producing a massive amount of data. Some sensor data will be used only at the edge, and some will be sent to the cloud or data centers for aggregation, analytics, and AI efforts. These sensors may measure or produce images, video, lidar, audio, acceleration, GPS, […]

By Tom Kaitchuck on Posted on November 8, 2019 in News/Updates Releases Stream Processing Watermarking

Pravega Watermarking Support Tom Kaitchuck and Flavio Junqueira Motivation Stream processing broadly refers to the ability to ingest data from unbounded sources and processing such data as it is ingested. The data can be user-generated, like in social networks or other online application, or machine-generated, like in server telemetry or sensor samples from IoT and […]

By Vijay srinivasaraghavan on Posted on June 10, 2019 in Best Practices Real-time Analytics Stream Processing

This blog post provides an overview of how Apache Flink and Pravega Connector works under the hood to provide end-to-end exactly-once semantics for streaming data pipelines. Overview Pravega [4] is a storage system that exposes Stream as storage primitive for continuous and unbounded data. A Pravega stream is a durable, elastic, append-only, unbounded sequence of […]

By Andrei Paduroiu on Posted on March 7, 2019 in Storage Stream Processing

The Pravega Segment Store Service is a subsystem that lies at the heart of the entire Pravega deployment. It is the main access point for managing Stream Segments, providing the ability to modify and read their contents. The Pravega Client communicates with the Pravega Stream Controller to identify which Segments need to be used (for […]

By Flavio Junqueira on Posted on October 17, 2018 in Storage Stream Processing

Several of the difficulties with tailing a data stream boil down to the dynamics of the source and of the stream processor. For example, if the source increases its production rate in an unplanned manner, then the ingestion system must be able to accommodate such a change. The same happens in the case a processor […]

Change Data Capture with Pravega + Debezium

Pravega Flink Connector 101

Data Flow from Sensors to the Edge and the Cloud using Pravega

Pravega Watermarking Support

Exactly-Once Processing Using Apache Flink and Pravega Connector

Segment Store Internals

Pravega Internals