
Yet Another Cache but for the Streaming World

Traditional cache solutions treat each entry as an immutable blob of data, which poses problems for the append-heavy ingestion workloads that are common in Pravega. Each Event appended to a Stream would either require its own cache entry or need an expensive read-modify-write operation to be included in the Cache. To enable high-performance ingestion of events, big or small, while also providing near-real-time tail reads and high-throughput historical reads, Pravega needs a specialized cache that can natively support the types of workloads that are prevalent in Streaming Storage Systems.

The Streaming Cache, introduced in Pravega with release 0.7, has been designed from the ground up with streaming data in mind and optimizes for appends while organizing the data in a layout that makes eviction and disk spilling easy.
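To make the contrast with blob-style caches concrete, here is a minimal sketch of an append-friendly layout: each entry owns a chain of fixed-size blocks, so an append only fills the last block (or links a new one) rather than rewriting the whole value, and evicting an entry simply drops its block chain. This is an illustration of the idea, not Pravega's actual Streaming Cache; the class and method names (AppendOnlyCache, append, read, evict) and the block size are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: an append-friendly cache where each entry is a chain
// of fixed-size blocks, so appends never rewrite previously cached data.
public class AppendOnlyCache {
    private static final int BLOCK_SIZE = 4096;

    private static final class Entry {
        final List<byte[]> blocks = new ArrayList<>();
        int lastBlockLength = 0; // bytes used in the last block
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Append data to an entry, filling the last block before allocating a new one.
    public synchronized void append(String key, byte[] data) {
        Entry e = entries.computeIfAbsent(key, k -> new Entry());
        int offset = 0;
        while (offset < data.length) {
            if (e.blocks.isEmpty() || e.lastBlockLength == BLOCK_SIZE) {
                e.blocks.add(new byte[BLOCK_SIZE]);
                e.lastBlockLength = 0;
            }
            byte[] last = e.blocks.get(e.blocks.size() - 1);
            int toCopy = Math.min(BLOCK_SIZE - e.lastBlockLength, data.length - offset);
            System.arraycopy(data, offset, last, e.lastBlockLength, toCopy);
            e.lastBlockLength += toCopy;
            offset += toCopy;
        }
    }

    // Read the whole entry back by concatenating its blocks.
    public synchronized byte[] read(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return new byte[0];
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < e.blocks.size(); i++) {
            int length = (i == e.blocks.size() - 1) ? e.lastBlockLength : BLOCK_SIZE;
            out.write(e.blocks.get(i), 0, length);
        }
        return out.toByteArray();
    }

    // Whole-entry eviction is trivial: drop the block chain.
    public synchronized void evict(String key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        AppendOnlyCache cache = new AppendOnlyCache();
        cache.append("segment-1", "event-1|".getBytes());
        cache.append("segment-1", "event-2|".getBytes()); // no read-modify-write needed
        System.out.println(new String(cache.read("segment-1")));
    }
}
```

Because appended data always lands in the tail block, this kind of layout keeps small appends cheap while leaving whole blocks as natural units for eviction or spilling to disk.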

Not all caches are created equal. It is essential to choose a cache that fits the requirements of the system where it will be used, and streaming solutions are no exception to that rule. In this blog post, we describe an innovative way to look at caching that works well with streaming use cases.

Pravega Watermarking Support

Tom Kaitchuck and Flavio Junqueira

Motivation 

Stream processing broadly refers to the ability to ingest data from unbounded sources and to process such data as it is ingested. The data can be user-generated, as in social networks and other online applications, or machine-generated, as in server telemetry or sensor samples from IoT and Edge applications [1].

Stream processing applications typically process data following the order in which the data is produced. Strictly following a total order, however, is often not practical for a couple of important reasons:

  1. The source is not a single element as it might comprise multiple users, servers, or gateways; 
  2. Inherent choices of the application design might cause items to be ingested and processed out of order. 

Consequently, order in Pravega and similar systems refers to the order in which the data is ingested, and it is determined per key: a key connects related elements of the data stream.
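As a small sketch of what per-key ordering looks like in practice, the snippet below uses the Pravega Java client to write events with routing keys: events sharing a routing key are read back in the order they were appended, while events under different keys may interleave. The scope ("examples"), stream name ("sensor-readings"), and controller URI are placeholders, and the example assumes the scope and stream have already been created.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.JavaSerializer;

import java.net.URI;

// Sketch: events written with the same routing key preserve their append order;
// no ordering is guaranteed across different keys.
public class PerKeyOrderingExample {
    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")) // placeholder endpoint
                .build();

        try (EventStreamClientFactory clientFactory =
                     EventStreamClientFactory.withScope("examples", config);
             EventStreamWriter<String> writer = clientFactory.createEventWriter(
                     "sensor-readings", new JavaSerializer<>(), EventWriterConfig.builder().build())) {

            // All events for "sensor-42" share a routing key, so their relative order is preserved.
            writer.writeEvent("sensor-42", "t=1 temp=20.1");
            writer.writeEvent("sensor-42", "t=2 temp=20.4");

            // Events for a different key may interleave arbitrarily with the ones above.
            writer.writeEvent("sensor-7", "t=1 temp=19.8");

            writer.flush();
        }
    }
}
```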

The ability to process data following the order of generation, even if only loosely, is one of the most interesting aspects of stream processing, as it enables an application to establish temporal correlations across events. For example, an application can answer questions such as how many distinct users signed in during the last hour, or how many distinct sensors have reported an anomaly in the past 10 minutes. To answer such queries, the application must be able to produce results for every reporting period: every hour in the first example and every 10 minutes in the second. These reporting periods are often referred to as time windows [2].
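To make the time-window idea concrete (independently of any particular stream processor), the following sketch assigns timestamped readings to 10-minute tumbling windows based on their event time and counts the distinct sensors per window. The Reading record, the sample timestamps, and the method names are invented for this illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch: bucket each event into a 10-minute tumbling window by its
// event time, then count the distinct sensors that reported within each window.
public class TumblingWindowCount {
    private static final long WINDOW_MILLIS = 10 * 60 * 1000L;

    record Reading(String sensorId, long eventTimeMillis) {}

    public static Map<Long, Integer> distinctSensorsPerWindow(List<Reading> readings) {
        Map<Long, Set<String>> sensorsByWindow = new TreeMap<>();
        for (Reading r : readings) {
            long windowStart = (r.eventTimeMillis() / WINDOW_MILLIS) * WINDOW_MILLIS;
            sensorsByWindow.computeIfAbsent(windowStart, w -> new HashSet<>()).add(r.sensorId());
        }
        Map<Long, Integer> counts = new TreeMap<>();
        sensorsByWindow.forEach((window, sensors) -> counts.put(window, sensors.size()));
        return counts;
    }

    public static void main(String[] args) {
        List<Reading> readings = List.of(
                new Reading("sensor-1", 0L),
                new Reading("sensor-2", 5 * 60 * 1000L),
                new Reading("sensor-1", 12 * 60 * 1000L));
        // Window [0, 10min) has 2 distinct sensors; window [10min, 20min) has 1.
        System.out.println(distinctSensorsPerWindow(readings));
    }
}
```

Producing such per-window results reliably is exactly where watermarking comes in, since the application needs a way to decide when a window's input can be considered complete.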