WarpStream - A Kafka Compatible Data Streaming Platform Acquired by Confluent

| 10 min read
Author: masahiro-kondo masahiro-kondoの画像
Information

To reach a broader audience, this article has been translated from Japanese.
You can find the original version here.

Introduction

#

WarpStream is an Apache Kafka compatible data streaming platform.

It is composed solely of object storage like S3 and a single binary agent, featuring no networking costs between availability zones (AZ), no need for disk management, and high scalability.

WarpStream - An Apache Kafka Compatible Data Streaming Platform

This September, it was announced that WarpStream was acquired by Confluent, which provides cloud services for Kafka.

Confluent acquires WarpStream | Confluent

The co-founder of WarpStream wrote about the acquisition process in their blog.

WarpStream is Dead, Long Live WarpStream

The official documentation for WarpStream is available here.

Introduction | WarpStream

It seems it's not production-ready yet, but I would like to explore it a bit and look into its architecture.

Information

Redpanda, which I introduced in a previous article, is also a Kafka compatible streaming platform. It features simple deployment with a single binary, high speed, and fault tolerance.

Kafka Compatible High-Efficiency Data Streaming Platform Redpanda

Preparation for Testing: Installing WarpStream Agent / CLI

#

To run demos or the Playground for WarpStream, you need to install the WarpStream Agent / CLI.

Binaries are provided for Linux and macOS on amd64/arm64. You can also try it with Docker. Since it's provided as a single binary, you just need to download the one suitable for your platform and add it to your path.

Install the WarpStream Agent / CLI | WarpStream

Below is an example of installing the binary for an Apple Silicon Mac.

curl -LO https://warpstream-public-us-east-1.s3.amazonaws.com/warpstream_agent_releases/warpstream_agent_darwin_arm64_latest.tar.gz
tar xfz warpstream_agent_darwin_arm64_latest.tar.gz
sudo mv warpstream_agent_darwin_arm64 /usr/local/bin/warpstream

Trying Out the Playground

#

Let's try running the Playground using the CLI.

Run the Agents Locally | WarpStream

warpstream playground

Sign up for a temporary account for the Playground, and the agent will start locally as follows.

WARNING, RUNNING IN PLAYGROUND MODE. All data and state is ephemeral. Server will shutdown automatically after: 4h0m0s

Signing up for temporary account...
Done signing up for temporary account
Starting local agent...

started agent, HTTP server on port: 8080 and kafka server on port: 9092

open the developer console at: https:/console.warpstream.com/login?warpstream_redirect_to=virtual_clusters%2Fvci_4ef27467_0885_4e6c_991c_a95ebba854a4%2Foverview&warpstream_session_key=sks_908e60e40aa2ed27ca59a46289e005144377785ca2b8ea111dad65459d72825e


Keep in mind that playground accounts are heavily ratelimited

In this state, test the connection from another terminal using the kcmd subcommand.

warpstream kcmd -type diagnose-connection -bootstrap-host localhost -bootstrap-port 9092
running diagnose-connection sub-command with bootstrap-host: localhost and bootstrap-port: 9092


Broker Details
---------------
  LOcALHOST:9092 (NodeID: 597523006) [playground]
    ACCESSIBLE ✅


GroupCoordinator: LOcALHOST:9092 (NodeID: 597523006)

The connection to the broker has been confirmed.

You can see the state of clusters and topics by accessing the developer console URL displayed at startup.

Console

The agent is running locally, but the WarpStream cluster's Control Plane is deployed on AWS.

Clusters

Creating Topics and Communication

#

Let's try creating topics and communicating while the Playground is running.

"Hello World" for Apache Kafka | WarpStream

Create a topic by specifying create-topic with the warpstream subcommand kcmd.

warpstream kcmd --type create-topic --topic hello-warpstream
running create-topic sub-command with bootstrap-host: localhost and bootstrap-port: 9092
created topic "hello-warpstream" successfully, topic ID: MgAAAAAAAAAAAAAAAAAAAA==

The topic has been created. It is also displayed in the console.

Topic

Let's try sending a message to the topic. Specify produce with kcmd.

warpstream kcmd --type produce --topic hello-warpstream --records "world,,world"
running produce sub-command with bootstrap-host: localhost and bootstrap-port: 9092

result: partition:0 offset:0 value:"world" 
result: partition:0 offset:1 value:"world" 

It seems the message has been sent. For receiving, specify fetch with kcmd.

warpstream kcmd --type fetch --topic hello-warpstream --offset 0
running fetch sub-command with bootstrap-host: localhost and bootstrap-port: 9092

consuming topic:"hello-warpstream" partition:0 offset:0
result: partition:0 offset:0 key:"hello" value:"world"
result: partition:0 offset:1 key:"hello" value:"world"

The message was received. It seems that the default key for messages in kcmd produce is "hello".

Let's also try receiving with a Kafka consumer.

kafka-console-consumer --bootstrap-server localhost:9092 --topic hello-warpstream --property print.key=true --property key.separator="-" --from-beginning

In kafka-console-consumer, the message key is hidden by default, so I specified it with property to connect with -.

hello-world
hello-world

It was also received here.

Running the Demo Environment

#

Let's stop the Playground and try running the demo. In the demo environment, topics are automatically created, and WASM versions of Consumer / Producer are automatically started to send and receive messages periodically, allowing you to see traffic on the Console screen.

Run the Demo | WarpStream

Start the demo.

warpstream demo

Like the Playground, the demo environment starts while logged into a temporary account. It will automatically shut down after one hour.

Demo will automatically shutdown after: 1h0m0s

Signing up for temporary account...
Done signing up for temporary account
Created temporary data directory: /var/folders/w9/4fztxrrj6lq3tstsk4qt6xlh0000gn/T/warpstream_demo2619599535
Starting local agent...

started agent, HTTP server on port: 8080 and kafka server on port: 9092
Creating local kafka client...
Done creating local kafka client
Creating demo topic
Done creating demo stream
Starting demo producers and consumers

opening developer console in browser: https:/console.warpstream.com/login?warpstream_redirect_to=virtual_clusters%2Fvci_edbd2502_fabd_47b6_b671_3d8673b5430f%2Foverview&warpstream_session_key=sks_71fe21c5081d1da10680c09029d826a2d138a4ac7ab6f832a556ddf8ecd7cfed

opening data directory in browser: /var/folders/w9/4fztxrrj6lq3tstsk4qt6xlh0000gn/T/warpstream_demo2619599535

run this command in a separate terminal to see the agents form a cluster together:

WARPSTREAM_AVAILABILITY_ZONE=DEMO WARPSTREAM_LOG_LEVEL=warn warpstream agent \
       :

The demo topics, producers, and consumers start, and periodic sending and receiving of demo messages begin.

The console screen automatically opens in the browser. The storage folder for the demo environment also opens. Since demo messages are automatically sent and received, the Overview displays the throughput status.

Console

folder

Only one agent for the demo is currently running. The number of CPUs matches the number of CPU cores (12) on the host machine.

Single Agent

Let's add an agent according to the startup message.

WARPSTREAM_AVAILABILITY_ZONE=DEMO WARPSTREAM_LOG_LEVEL=warn warpstream agent \
				--defaultVirtualClusterID=vci_edbd2502_fabd_47b6_b671_3d8673b5430f \
				--agentKey=aks_axxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx \
				--bucketURL=file:///var/folders/w9/4fztxrrj6lq3tstsk4qt6xlh0000gn/T/warpstream_demo2619599535 \
				--metadataURL=https://metadata.playground.us-east-1.warpstream.com \
				--httpPort=8081 \
				--kafkaPort=9093 \
				--enableClusterWideEnvironment \
				--clusterWideEnvironmentPort=9081 \
				--gracefulShutdownDuration=0s

Dual Agents

A second agent has been added. Both agents show usage, indicating that processing is distributed within the cluster.

WarpStream Architecture

#

So far, we've run the Playground and Demo, but how does WarpStream operate? Let's refer to the architecture section of the official documentation.

Architecture | WarpStream

Nodes that make up an Apache Kafka cluster are paired with storage and managed as stateful workloads. In contrast, WarpStream requires no local disk and consists only of stateless agents communicating with object storage, with all cluster state management offloaded to the Control Plane of WarpStream Cloud. The architecture diagram is quoted from the official documentation.

Architecture

User data in WarpStream is sent and received only within the user's VPC, with only metadata for cluster management sent to WarpStream Cloud.

WarpStream separates storage and computing, which are integrated in Kafka nodes, separates data and metadata, and separates the Control Plane as a cloud service, making user agents stateless. This makes auto-scaling easier.

Conclusion

#

There was an article stating that Kafka has reached a turning point, facing various architectural challenges as its use cases increase[1].

Kafka Has Reached a Turning Point

WarpStream is said to reduce the execution cost of Kafka to 1/10. The Pricing page states it is 80% cheaper than Kafka.

Pricing - WarpStream - Stream More, Manage Less

It is possible to configure not only with S3 but also with S3 compatible storage like MinIO or R2.

We can see a future where Kafka cluster management is unnecessary, and a streaming environment that is cheap and easy to scale is available.


  1. The author learned about WarpStream from this article. ↩︎

豆蔵では共に高め合う仲間を募集しています!

recruit

具体的な採用情報はこちらからご覧いただけます。