Browse Author: Flavio Junqueira

Pravega Internals

Several of the difficulties with tailing a data stream boil down to the dynamics of the source and of the stream processor. For example, if the source increases its production rate in an unplanned manner, then the ingestion system must be able to accommodate such a change. The same happens when a downstream processor experiences issues and struggles to keep up with the incoming rate. To accommodate all such variations, it is critical that a system for storing stream data, like Pravega, be sufficiently flexible.

The flexibility of Pravega comes from breaking stream data into segments: append-only sequences of bytes that are organized both sequentially and in parallel to form streams. Segments enable important features, like parallel reads and writes, auto-scaling, and transactions; they are designed to be inexpensive to create and maintain. We can create new segments for a given stream when it needs more parallelism, when it needs to scale, or when it needs to begin a transaction.
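To make the segment mechanics concrete, here is a minimal Python sketch of the idea: each open segment owns a range of the routing-key hash space, writes are routed by key hash, and scaling up seals a segment and replaces it with two successors that split its range. This is a toy model of the semantics described above, not Pravega's implementation; all class and method names are illustrative.

```python
import hashlib

class Segment:
    """Append-only sequence of events, owning a routing-key hash range [lo, hi)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.events = []
        self.sealed = False

    def append(self, event):
        assert not self.sealed, "sealed segments are immutable"
        self.events.append(event)

class Stream:
    """A stream: the set of currently open segments plus their sealed predecessors."""
    def __init__(self):
        self.open = [Segment(0.0, 1.0)]
        self.sealed = []

    def _hash(self, key):
        # Map a routing key to a point in [0, 1).
        return int(hashlib.sha256(key.encode()).hexdigest(), 16) / 2**256

    def write(self, key, event):
        h = self._hash(key)
        for seg in self.open:
            if seg.lo <= h < seg.hi:
                seg.append(event)
                return seg

    def scale_up(self, seg):
        """Seal a segment and replace it with two successors splitting its key range."""
        seg.sealed = True
        self.open.remove(seg)
        self.sealed.append(seg)
        mid = (seg.lo + seg.hi) / 2
        self.open += [Segment(seg.lo, mid), Segment(mid, seg.hi)]
```

Because a sealed segment is never appended to again, readers can consume it to the end and then move on to its successors, which is what makes scaling cheap: no data is copied, only new segments are created.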

The control plane in Pravega is responsible for all the operations that affect the lifecycle of a stream, e.g., create, delete, and scale. The data plane stores and serves the data of segments. The following figure depicts the high-level Pravega architecture with its core components.

Streams in and out of Pravega

Introduction

Reading and writing is the most basic functionality that Pravega offers. Applications ingest data by writing to one or more Pravega streams and consume data by reading from one or more streams. To implement applications correctly with Pravega, however, it is crucial that the developer be aware of some additional functionality that complements the core write and read calls. For example, writes can be transactional, and readers are organized into groups.

In this post, we cover some basic concepts and functionality that a developer must be aware of when developing an application with Pravega, focusing on reads and writes. We encourage the reader to additionally check the Pravega documentation site under the section “Developing Pravega Applications” for some code snippets and more detail.
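The two complements mentioned above can be sketched in a few lines of Python. The sketch below models the semantics only: a reader group divides a stream's segments among its readers so each event is read by exactly one reader in the group, and a transactional writer buffers events that become visible only on commit. These classes and method names are illustrative, not the Pravega client API; see the official documentation for the real calls.

```python
from collections import defaultdict

class ReaderGroup:
    """Toy reader group: segments are divided among the group's readers,
    so each event is delivered to exactly one reader in the group."""
    def __init__(self, segment_ids):
        self.segment_ids = list(segment_ids)
        self.readers = []

    def add_reader(self, name):
        self.readers.append(name)

    def assignment(self):
        # Round-robin segments over readers; the real client rebalances
        # dynamically as readers join, leave, or fail.
        assign = defaultdict(list)
        for i, seg in enumerate(self.segment_ids):
            assign[self.readers[i % len(self.readers)]].append(seg)
        return dict(assign)

class Transaction:
    """Toy transactional writer: events are buffered and become visible
    in the stream only if the transaction commits; abort discards them."""
    def __init__(self, stream):
        self._stream = stream   # the list of visible events
        self._buffer = []

    def write_event(self, event):
        self._buffer.append(event)

    def commit(self):
        self._stream.extend(self._buffer)
        self._buffer = []

    def abort(self):
        self._buffer = []
```

Running several reader groups against the same stream gives independent consumers (each group sees every event), while readers inside one group share the load.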

I have a stream…

Stream On with Pravega

On stage: Pravega

Stream processing is in the spotlight in the space of data analytics, and the reason is rather evident: there is high value in producing insights over data shortly after it is generated, rather than waiting for it to accumulate and processing it in batches. Low latency from ingestion to result is one of the key advantages that stream processing technology offers to its various application domains. In some scenarios, low latency is not only desirable but strictly necessary to guarantee correct behavior; it makes little sense to wait hours to process a fire alarm or a gas leak event in the Internet of Things.

A typical application to process stream data has a few core components. It has a source that produces events, messages, or data samples, which are typically small in size, from hundreds of bytes to a few kilobytes. The source does not have to be a single element; it can be multiple sensors, mobile terminals, users, or servers. This data is ultimately fed into a stream processor that crunches it and produces intermediate or final output. The final output can have various forms, often being persisted in some data store for consumption (e.g., queries against a data set).
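The source-processor-sink pipeline described above can be sketched end to end in a few lines of Python. This is a self-contained toy, not Pravega or any stream-processing framework: the source simulates small temperature samples, the processor maintains a running mean per sensor, and the sink stands in for a data store that later queries would hit.

```python
import random

def source(n, seed=0):
    """Source: emits small events (here, simulated temperature samples)."""
    rng = random.Random(seed)
    for i in range(n):
        yield {"sensor": f"s{i % 3}", "temp": 20 + rng.random() * 5}

def processor(events):
    """Stream processor: yields a running mean per sensor as events arrive."""
    sums, counts = {}, {}
    for e in events:
        k = e["sensor"]
        sums[k] = sums.get(k, 0.0) + e["temp"]
        counts[k] = counts.get(k, 0) + 1
        yield {k: sums[k] / counts[k]}   # intermediate output

def sink(results):
    """Sink: persist the final output for consumption (here, a dict)."""
    store = {}
    for r in results:
        store.update(r)
    return store

store = sink(processor(source(9)))
```

Because every stage is a generator, each event flows through the whole pipeline as soon as it is produced, which is exactly the low-latency property that distinguishes stream processing from batch processing.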