Browse Author: Raúl Gracia

Raúl Gracia is a Principal Engineer at DellEMC and part of the Pravega development team. He holds a M.Sc. in Computer Engineering and Security (2011) and a Ph.D. in Computer Engineering (2015) from Universitat Rovira i Virgili (Tarragona, Spain). During his PhD, Raúl has been intern at IBM Research (Haifa, Israel) and Tel-Aviv University. Raúl is a researcher interested in distributed systems, cloud storage, data analytics and software engineering, with more than 20 papers published (e.g., ACM IMC, USENIX FAST, IEEE ICDE and IEEE TPDS).

When speeding makes sense — Fast, consistent, durable and scalable streaming data with Pravega

Raul Gracia and Flavio Junqueira

Introduction

Streaming systems continuously ingest and process data from a variety of data sources. They build on append-only data structures to enable efficient write and read access, targeting low-latency end-to-end. As more of the data sources in applications are machines, the expected volume of continuously generated data has been growing and is expected to grow further [1][2]. Such growth puts pressure on streaming systems to handle machine-generated workloads not only with low latency, but also with high throughput to accommodate high volumes of data.

Pravega (“good speed” in Sanskrit) is an open-source storage system for streams that we have built from the ground up to ingest data from continuous data sources and meet the stringent requirements of such streaming workloads. It provides the ability to store an unbounded amount of data per stream using tiered storage while being elastic, durable and consistent. Both the write and read paths of Pravega have been designed to provide low latency along with high throughput for event streams in addition to features such as long-term retention and stream scaling. This post is a performance evaluation of Pravega focusing on the ability of reading and writing.

To contrast with different design choices, we additionally show results from other systems: Apache Kafka and Apache Pulsar. Initially qualified as messaging systems, both Pulsar and Kafka make a conscious effort to become more like a storage system; they have recently added features like tiered storage. These systems have made fundamentally different design choices, however, leading to different behavior and performance characteristics that we explore in this post.

Continue Reading

Deploying Pravega on Kubernetes

Pravega is a storage system for data streams that has an innovative design and an attractive set of features to cope with today’s Stream processing requirements (e.g., event ordering, scalability, performance, etc.). The project has plenty of documentation and great blog posts that explain in detail every technical aspect of Pravega. But, if you are now more interested in having your first Pravega cluster up and running in the cloud and leave the technical readings for later, then you are at the right place.

We show you how to deploy your “first Pravega cluster in Kubernetes”. We provide a step-by-step guide to deploy Pravega in both Google Kubernetes Engine (GKE) and Amazon Elastic Kubernetes Service (EKS). Our goal is to keep things as simple as possible, and, at the same time, provide you with valuable insights on the services that form a Pravega cluster and the operators we developed to deploy them. For this reason, while we have a “one-click” Pravega installation ready to use, we commend you to complete this blog post. If you do so, you will have a Pravega cluster ready for serving sample applications and, why not, even your first application working against Pravega. Let’s get started!

Continue Reading