Processing Data

Filter, transform, route, analyze and more

Once you have data flowing through Streamlio Cloud, you will often want to process that data as it flows through, for example to perform filtering, aggregations, transformations, dynamic routing, or analytics on it. To do that, you can use Pulsar Functions, the lightweight stream processing framework provided by Apache Pulsar.

Pulsar Functions provide an easy way to process data flowing through Pulsar without cumbersome SDKs, unfamiliar paradigms, or complex processing plans. With Pulsar Functions, you can write code in common languages such as Java, Python or Go and deploy that function to Pulsar. Pulsar then takes care of running the function, handling failures, ensuring processing guarantees and more, without involvement by the developer. You can learn more about Pulsar Functions in the Pulsar documentation and in the Streamlio blog.

Streamlio Cloud Preview allows you to try out Pulsar Functions. It includes an embedded Python editor for creating and deploying Python functions directly from the web interface, and it also supports deployment of Pulsar Functions written in any supported language including Python, Java and Go via the Pulsar CLI.

In the documentation below, we'll focus on deploying a Python function using the Streamlio Cloud Preview web interface. For information on deploying Apache Pulsar Functions using the Pulsar CLI, please see the Pulsar documentation.
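As a point of reference, a Python function for Pulsar can be as small as a single `process` function. The sketch below assumes that messages arrive as UTF-8 text and that returning `None` results in nothing being published for that message. The normalization logic itself is only an illustration, not something prescribed by Streamlio Cloud.

```python
def process(input):
    """Filter and transform one message.

    Assumption: each message is UTF-8 text (str or bytes).
    Blank messages are dropped; all others are trimmed and lower-cased.
    """
    # Be tolerant of raw bytes in case no string decoding was applied upstream.
    if isinstance(input, bytes):
        input = input.decode("utf-8")

    text = input.strip()
    if not text:
        # Returning None is assumed to mean "publish nothing for this message".
        return None

    # The returned value is what gets published to the output topic.
    return text.lower()
```

Code of this shape could be pasted into the inline Python editor or packaged as a file for deployment with the Pulsar CLI.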

Creating a Function

To get started, click on the "Functions" link in the top navigation bar. Then click on "Create" or "Create new function" to get to the Function Builder that is provided in the web interface.

The Streamlio Cloud Preview Function Builder

On the next screen you'll need to provide detailed information about the function, including:

  • Function Name: the identifier for this function; can only contain lower-case letters, numbers, hyphens or periods

  • Input Topic(s): the Pulsar topic that will be the input to the function; Pulsar supports multiple topics as function inputs, so to add more, click "Add Topic" after providing each input topic name

  • Output Topic: the default topic to which the function's output will be published

  • Runtime: which of the provided runtime environments (e.g. which Python or Java version) is required for executing the function

  • Function code: how you wish to provide your function code; for Python functions, there is the option of using the inline editor to provide code directly

  • Parallelism: the number of parallel function workers to deploy for processing this function (note that the workers share the load: each individual message in a topic is processed by one of the parallel workers)

  • CPU quota: the amount of CPU allocated to the container in which the function worker will run

  • Memory: the amount of memory allocated to the container in which the function worker will run
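The naming rule for Function Name can be checked before you fill in the form. The following is a minimal sketch based only on the rule stated in the list above (lower-case letters, numbers, hyphens or periods). The helper name `is_valid_function_name` is hypothetical, and the service may enforce further limits, such as a maximum length, that are not captured here.

```python
import re

# Pattern derived from the stated rule only; any additional
# service-side constraints are unknown and therefore not modeled.
_NAME_PATTERN = re.compile(r"[a-z0-9.-]+")


def is_valid_function_name(name: str) -> bool:
    """Return True if name uses only lower-case letters, digits, hyphens or periods."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `word-count.v2` passes this check, while `WordCount` and `word_count` do not.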

Function Builder page

Once you've completed this information, click on "Create function" and your function will be created and deployed.
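Because the Output Topic is only the default destination, a function can also route individual messages elsewhere. The sketch below uses the `Function` base class and the `context.publish` call from the Pulsar Functions Python SDK. The class name, the `ERROR` prefix convention, and the error topic name are all hypothetical choices made for illustration.

```python
try:
    # Available inside the Pulsar Functions Python runtime.
    from pulsar import Function
except ImportError:
    # Fallback so the sketch can be exercised locally without Pulsar installed.
    Function = object


class RouteByPrefix(Function):
    """Send messages starting with "ERROR" to a separate topic."""

    # Hypothetical topic name; replace with a topic that exists in your namespace.
    ERROR_TOPIC = "persistent://public/default/errors"

    def process(self, input, context):
        if input.startswith("ERROR"):
            # Publish explicitly to a topic other than the default output topic.
            context.publish(self.ERROR_TOPIC, input)
            # Returning None is assumed to skip the default output topic.
            return None
        # Anything returned goes to the default Output Topic.
        return input
```

This assumes messages arrive as strings. Keeping the routing rule inside the function means the topology can change without redeploying producers or consumers.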

Monitoring a Function

You can see all of the functions that you've created in the "Functions" view.

View of deployed Pulsar Functions

To see more details about the work being done by a running function, click on the function name. The function status dashboard shows information about the function including the following:

  • Rate in and rate out: the rate at which the function is consuming messages from the input topic(s) and publishing messages to the output topic

  • Processed: the number of messages per second that the function is processing

  • Backlog: the number of available messages in the input topic that have not yet been consumed by the function

  • Errors: the error rate for this function

  • Latency: the time from the invocation of the function to completion of the function for processing of individual messages

  • Input topic: information about the amount of data the function has processed from the input topic(s)

  • Output topic: information on the amount of data the function has published to the output topic

  • Instances: the number of workers deployed for executing this function
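The Latency and Instances figures above can inform the Parallelism setting. As a rough back-of-the-envelope sketch, assume each worker handles one message at a time and that the average latency stays stable under load. The number of busy workers is then approximately the target message rate multiplied by the average latency. The helper name `estimate_workers` and the default headroom value are hypothetical, and real capacity also depends on the CPU and memory settings.

```python
import math


def estimate_workers(target_msgs_per_sec: float,
                     avg_latency_sec: float,
                     headroom: float = 0.2) -> int:
    """Estimate how many parallel workers are needed for a target rate.

    Assumes one message in flight per worker, so a single worker can
    sustain roughly 1 / avg_latency_sec messages per second.
    """
    if target_msgs_per_sec < 0 or avg_latency_sec < 0 or headroom < 0:
        raise ValueError("all arguments must be non-negative")

    # Average number of workers that are busy at any moment.
    busy_workers = target_msgs_per_sec * avg_latency_sec

    # Add spare capacity for bursts and always deploy at least one worker.
    return max(1, math.ceil(busy_workers * (1 + headroom)))
```

If the Backlog metric keeps growing after deployment, that is a sign the estimate was too low and Parallelism should be raised.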

Copyright 2019 Streamlio, Inc. Apache, Apache BookKeeper, Apache Pulsar and associated open source project names are trademarks of the Apache Software Foundation.