Data Pipelines

To define the flow of data, you design a pipeline in DataFabric Lab. A pipeline consists of stages that represent the origin and destination of the pipeline, and any additional processing that you want to perform. After you design the pipeline, you click Start and DataFabric goes to work.

DataFabric processes data when it arrives at the origin and waits quietly when not needed. You can view real-time statistics about your data, inspect data as it passes through the pipeline, or take a close look at a snapshot of data.

Data in Motion

Data passes through the pipeline in batches. This is how it works:

The origin creates a batch as it reads data from the origin system or as data arrives from the origin system, noting the offset. The offset is the location where the origin stops reading.

The origin sends the batch when the batch is full or when the batch wait time limit elapses. The batch moves through the pipeline from processor to processor until it reaches pipeline destinations.
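
The two steps above can be sketched in a few lines of Python. This is only an illustration, not DataFabric code: the names read_batches, run_pipeline, max_batch_size, and max_wait_seconds are hypothetical, and the offset is assumed to be a simple record count. For simplicity, the wait time limit is checked only when a record arrives; a real origin would presumably also time out while waiting for data.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple


def read_batches(
    records: Iterable,
    max_batch_size: int,
    max_wait_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[List, int]]:
    """Yield (batch, offset) pairs, where offset is where reading stopped."""
    batch: List = []
    offset = 0  # assumption: the offset is the count of records read so far
    deadline = clock() + max_wait_seconds
    for record in records:
        batch.append(record)
        offset += 1
        # Send the batch when it is full or the wait time limit has elapsed.
        if len(batch) >= max_batch_size or clock() >= deadline:
            yield batch, offset
            batch = []
            deadline = clock() + max_wait_seconds
    if batch:
        # Send whatever is left when the origin runs out of data.
        yield batch, offset


def run_pipeline(
    records: Iterable,
    processors: List[Callable[[List], List]],
    destination: Callable[[List], None],
    max_batch_size: int = 3,
    max_wait_seconds: float = 1.0,
) -> int:
    """Move each batch from processor to processor, then to the destination."""
    last_offset = 0
    for batch, offset in read_batches(records, max_batch_size, max_wait_seconds):
        for process in processors:
            batch = process(batch)
        destination(batch)
        last_offset = offset
    return last_offset


if __name__ == "__main__":
    written: List = []
    final_offset = run_pipeline(
        range(7), [lambda b: [x * 2 for x in b]], written.extend,
        max_batch_size=3, max_wait_seconds=60,
    )
    print(final_offset, written)
```

Passing the clock as a parameter is a design choice made here so that the wait time behavior can be exercised without real delays.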

Delivery Guarantee

When you configure a pipeline, you define how you want data to be treated: do you want to prevent the loss of data or the duplication of data?

The Delivery Guarantee pipeline property offers the following choices:

At least once
Ensures that the pipeline processes all data.

At most once
Ensures that data is not processed more than once.
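
One common way to get these two behaviors is to change when the offset is saved relative to writing the batch. The sketch below simulates that, as an assumption about the mechanism rather than a description of how DataFabric implements it. The names run, simulate, Crash, and crash_at_offset are hypothetical, as are the string values used for the two choices.

```python
from typing import Dict, List, Optional


class Crash(Exception):
    """Simulated failure while a batch is being written."""


def run(
    records: List[str],
    state: Dict[str, int],
    delivered: List[str],
    guarantee: str,
    batch_size: int = 2,
    crash_at_offset: Optional[int] = None,
) -> None:
    """Process records from the saved offset; may raise Crash mid-batch."""
    if guarantee not in ("AT_LEAST_ONCE", "AT_MOST_ONCE"):
        raise ValueError(f"unknown guarantee: {guarantee}")
    while state["offset"] < len(records):
        start = state["offset"]
        end = min(start + batch_size, len(records))
        batch = records[start:end]
        if guarantee == "AT_MOST_ONCE":
            # Assumption: save the offset before writing, so a failed
            # batch is never retried (no duplicates, possible loss).
            state["offset"] = end
        for i, record in enumerate(batch):
            if crash_at_offset is not None and start + i == crash_at_offset:
                raise Crash()
            delivered.append(record)
        if guarantee == "AT_LEAST_ONCE":
            # Assumption: save the offset only after the whole batch is
            # written, so a failed batch is retried (no loss, possible
            # duplicates).
            state["offset"] = end


def simulate(guarantee: str) -> List[str]:
    """Run once with a crash on the last record, then restart."""
    records = ["a", "b", "c", "d"]
    state = {"offset": 0}
    delivered: List[str] = []
    try:
        run(records, state, delivered, guarantee, crash_at_offset=3)
    except Crash:
        pass  # the pipeline stops; the saved offset survives the restart
    run(records, state, delivered, guarantee)
    return delivered


if __name__ == "__main__":
    print(simulate("AT_LEAST_ONCE"))  # ['a', 'b', 'c', 'c', 'd']
    print(simulate("AT_MOST_ONCE"))   # ['a', 'b', 'c']
```

In this simulation, at least once delivers every record but repeats "c", while at most once never repeats a record but drops "d".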

Single and Multithreaded Pipelines

The information above describes a standard single-threaded pipeline: the origin creates a batch and passes it through the pipeline, creating a new batch only after processing the previous batch.

Some origins can generate multiple threads to enable parallel processing in multithreaded pipelines. In a multithreaded pipeline, you configure the origin to create the number of threads or amount of concurrency that you want to use, and Data Collector creates a number of pipeline runners based on the pipeline Max Runners property to perform pipeline processing. Each thread connects to the origin system, creates a batch of data, and passes the batch to an available pipeline runner.
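
The hand-off between origin threads and pipeline runners can be pictured as a pool of runners shared by the threads. The following is a minimal sketch under stated assumptions, not the product's implementation: PipelineRunner, run_multithreaded, and max_runners are hypothetical names, max_runners is assumed to be the size of the runner pool, each in-memory partition stands in for one origin thread's connection to the origin system, and the processing step (uppercasing) is a placeholder.

```python
import queue
import threading
from typing import List


class PipelineRunner:
    """Hypothetical runner that processes one batch at a time."""

    def __init__(self, runner_id: int) -> None:
        self.runner_id = runner_id
        self.batches_processed = 0

    def process(self, batch: List[str]) -> List[str]:
        self.batches_processed += 1
        return [record.upper() for record in batch]  # placeholder processing


def run_multithreaded(
    partitions: List[List[str]], max_runners: int, batch_size: int = 2
) -> List[str]:
    """Start one origin thread per partition, sharing a pool of runners."""
    pool: "queue.Queue[PipelineRunner]" = queue.Queue()
    for runner_id in range(max_runners):
        pool.put(PipelineRunner(runner_id))

    output: List[str] = []
    output_lock = threading.Lock()

    def origin_thread(partition: List[str]) -> None:
        for start in range(0, len(partition), batch_size):
            batch = partition[start:start + batch_size]
            runner = pool.get()  # blocks until a runner is available
            try:
                result = runner.process(batch)
            finally:
                pool.put(runner)  # return the runner to the pool
            with output_lock:
                output.extend(result)

    threads = [
        threading.Thread(target=origin_thread, args=(partition,))
        for partition in partitions
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return output


if __name__ == "__main__":
    data = [["a", "b", "c"], ["d", "e"], ["f"]]
    print(sorted(run_multithreaded(data, max_runners=2)))
```

Because threads finish in no fixed order, the order of records in the output is not guaranteed across runs; only the set of processed records is.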
