Data Processing

Overview

Data coming from the datasource is piped through various processing nodes until it reaches the data store, its final destination.

In the previous section, we learned how to configure datasources to connect to and pull data from your database or other sources. The data then streams through a series of processing nodes. Each processing node performs a specific task on the data stream. Together, they transform the original data into meaningful information to be consumed in the report view.

Sample Code

<?php
use \koolreport\processes\Filter;
use \koolreport\processes\Group;
class MyReport extends \koolreport\KoolReport
{
    ...
    public function setup()
    {
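        // Pull rows from the "sales" datasource
        $this->src('sales')
        ->query("SELECT customerName, productName, region, amount FROM tblPurchases")
        // Keep only purchases from the Asia region
        ->pipe(new Filter(array(
            array("region","=","asia")
        )))
        // Sum the amount for each customer
        ->pipe(new Group(array(
            "by"=>"customerName",
            "sum"=>"amount"
        )))
        // Store the result in the "sales" data store
        ->pipe($this->dataStore('sales'));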
    }
}

In the above example, we want to summarize the total purchased amount per customer in the Asia region. The data stream from the SQL query is filtered by region and then grouped by customerName. In the end, the result of the processing is stored in the dataStore named "sales".
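To make the effect of the two steps concrete, here is a minimal plain-PHP sketch of the same transformation applied to an in-memory array. It does not use KoolReport at all: the sample rows are hypothetical, and `array_filter` plus a loop only approximate what the Filter and Group processes are asked to do in the example.

```php
<?php
// Hypothetical sample rows standing in for the result of the SQL query.
$rows = [
    ["customerName" => "Alice", "productName" => "Pen",  "region" => "asia",   "amount" => 10],
    ["customerName" => "Bob",   "productName" => "Ink",  "region" => "europe", "amount" => 7],
    ["customerName" => "Alice", "productName" => "Book", "region" => "asia",   "amount" => 5],
    ["customerName" => "Chen",  "productName" => "Pen",  "region" => "asia",   "amount" => 3],
];

// Step 1: keep only rows whose region is "asia" (the job given to Filter).
$asiaRows = array_filter($rows, function ($row) {
    return $row["region"] === "asia";
});

// Step 2: sum "amount" per customerName (the job given to Group).
$totals = [];
foreach ($asiaRows as $row) {
    $name = $row["customerName"];
    $totals[$name] = ($totals[$name] ?? 0) + $row["amount"];
}

print_r($totals);
```

With these sample rows, Alice ends up with a total of 15 and Chen with 3, while Bob is dropped by the region filter. This sketch keeps only the totals and works on a complete array rather than a stream, so it shows the intended result and not how the library processes rows.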

Piping

pipe()

In the example above, we saw the pipe() method in action. The pipe() method chains a list of processes for the data to flow through. It is the most commonly used method in KoolReport.

pipeIf()

->pipeIf($condition, $trueFunc, $falseFunc = null);

The pipeIf() method takes a condition as its first argument. If the condition is true, the anonymous function $trueFunc is called. If the condition is false, $falseFunc is called, provided one was given.

This method exists because a report sometimes needs different processing depending on a condition.

Example:

    ...
    ->pipeIf(
        $filterAsia,
        function($node){
            //Filter data by asia region
            return $node->pipe(new Filter(array(
                array("region","=","asia")
            )));
        },
        function($node)
        {
            // Do nothing
            return $node;
        }
    )
    ->pipe(new Group(array(
        "by"=>"customerName",
        "sum"=>"amount"
    )))
    ->pipe($this->dataStore('sales'));

In the above example, if $filterAsia is true, the Filter process is added to the stream. The $falseFunc here does nothing, so it could be omitted; it could also add other processes if needed.
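The branching behaviour described for pipeIf() can be modelled with a small toy class. This is a sketch of the assumed semantics only, not KoolReport's code: `ToyNode`, `buildChain`, and the step names are hypothetical, `pipe()` here merely records a label, and falling through unchanged when `$falseFunc` is omitted is an assumption based on the description above.

```php
<?php
// Toy stand-in for a processing node; NOT KoolReport's Node class.
class ToyNode
{
    public $steps = [];

    // Record a step name and return $this so calls can be chained
    // (a simplification made for this sketch).
    public function pipe($stepName)
    {
        $this->steps[] = $stepName;
        return $this;
    }

    // Assumed semantics: call $trueFunc when the condition holds,
    // otherwise call $falseFunc if one was given.
    public function pipeIf($condition, $trueFunc, $falseFunc = null)
    {
        if ($condition) {
            return $trueFunc($this);
        }
        if ($falseFunc !== null) {
            return $falseFunc($this);
        }
        return $this; // assumption: no $falseFunc means "continue unchanged"
    }
}

// Build a chain with an optional filter step, omitting $falseFunc.
function buildChain($filterAsia)
{
    $node = new ToyNode();
    return $node
        ->pipeIf($filterAsia, function ($n) {
            return $n->pipe("filter-asia");
        })
        ->pipe("group-by-customer")
        ->steps;
}

$withFilter = buildChain(true);
$withoutFilter = buildChain(false);
```

In this model, `$withFilter` contains both steps, while `$withoutFilter` contains only the grouping step. Note that each callback returns the node it ends on, which is what lets the chain continue after pipeIf().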

pipeTree()

->pipeTree($branchFunc1, $branchFunc2, ...)

Another common use case is sending data from one node down several routes to different data stores. Previously, this was done as follows:

$this->src("data")
->pipe(new ColumnMeta(..))
->saveTo($root);

$root->pipe(new Filter(array()))
->pipe($this->dataStore("store1"));

$root
->pipe(new Filter(...))
->pipe($this->dataStore("store2"));
...

The new way produces the same result in a clearer manner:

$this->src("data")
->pipe(new ColumnMeta(..))
->pipeTree(
    function ($node)
    {
        $node->pipe(new Filter(...))
        ->pipe($this->dataStore("store1"));
    },
    function ($node)
    {
        $node->pipe(new Filter(...))
        ->pipe($this->dataStore("store2"));
    }   
);
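The fan-out idea behind pipeTree() can be illustrated with a self-contained toy model. Everything in it is hypothetical (`ToyStream`, its `filter()` and `store()` methods, and the sample rows), it handles whole arrays instead of streaming rows, and it only assumes that each branch function receives the same node.

```php
<?php
// Toy stand-in for a node; NOT KoolReport's implementation.
class ToyStream
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Return a new stream holding only the rows accepted by $predicate.
    public function filter(callable $predicate)
    {
        return new ToyStream(array_values(array_filter($this->rows, $predicate)));
    }

    // Copy the rows into the caller's array (a stand-in for a data store).
    public function store(array &$target)
    {
        $target = $this->rows;
        return $this;
    }

    // Assumed semantics: hand the same node to every branch function.
    // Returning $this is a choice made for this sketch.
    public function pipeTree(callable ...$branches)
    {
        foreach ($branches as $branch) {
            $branch($this);
        }
        return $this;
    }
}

$store1 = [];
$store2 = [];

(new ToyStream([
    ["region" => "asia",   "amount" => 10],
    ["region" => "europe", "amount" => 7],
    ["region" => "asia",   "amount" => 5],
]))->pipeTree(
    function ($node) use (&$store1) {
        // Branch 1: Asia rows go to the first store.
        $node->filter(function ($r) { return $r["region"] === "asia"; })
             ->store($store1);
    },
    function ($node) use (&$store2) {
        // Branch 2: Europe rows go to the second store.
        $node->filter(function ($r) { return $r["region"] === "europe"; })
             ->store($store2);
    }
);
```

After this runs, `$store1` holds the two Asia rows and `$store2` holds the single Europe row. Both branches start from the same upstream node, so neither filter affects the other.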