KoolReport's Forum

Official Support Area, Q&As, Discussions, Suggestions and Bug reports.
Forum's Guidelines

Select data from csv datastore #689

Closed Eugene opened this topic on on Feb 19, 2019 - 24 comments

Eugene commented on Feb 19, 2019

Hi, I have csv file like this

code;       datetime;         flag
20001;	2017-04-25 15:17:55;	0
10001;	2017-04-25 15:29:30;	0	
20006;	2017-04-25 15:29:53;	0
...	

How can I get the data corresponding to the following select:

SELECT code, MAX(datetime)
FROM att_test
WHERE code=1000
  AND DATE (datetime) BETWEEN "2019-02-01" AND "2019-02-28"
  AND TIME (datetime) > "7:00:00"
  AND flag=0
GROUP BY  DATE(datetime);
KoolReport commented on Feb 19, 2019

Some things like this:

$this-src("csvdata")
->pipe(new CopyColumn(array(
    "date"=>"datetime",
    "time"=>"datetime",
)))
->pipe(new DateTimeFormat(array(
    "date"=>"Y-m-d",
    "time"=>"H:i:s"
)))
->pipe(new Filter(array(
    array("code","=",1000),
    array("date",">","2019-02-01"),
    array("date","<","2019-02-28"),
    array("time",">","07:00:00"),
    array("flag","=",0)
)))
->pipe(new Group(array(
    "by"=>"date",
    "max"=>"datetime"
)))
...
Eugene commented on Feb 19, 2019

Oh.. looks simple... I don't know why but I tried to use where()...

Thanks a lot

PS Please make more examples in the docs section - something like the recipe book it should be very helpful

Eugene commented on Feb 19, 2019

Dear Koolreport,

it works great with AND but how to realize OR?

KoolReport commented on Feb 19, 2019

You do this:

->pipe(new Filter(array(
    array("value","<",10),
    'or',
    array("value",">",30),
)))

After the operator or, all the condition after it will use or, if you want and again, you need to use 'and'

Eugene commented on Feb 19, 2019

:-) thanks...

I wonder if there will ever be a question to which you will not have an answer. :-))

Eugene commented on Feb 19, 2019

But what if the logic is a bit more complicated

could i do something like this (a AND b) OR (c AND d)

KoolReport commented on Feb 19, 2019

The Filter is built for convenience and solve like 80% of use cases, for the rest of complicated conditions and even super complicated, you should use the Custom:

->pipe(new Custom(function($row){
    if( (($row["a"] && $row["b"]) || ($row["c"] && $row["d"]) )
    {
        return $row;
    }
}))

So basically we return the rows that match condition, those rows which do not match condition will not be returned and will be removed from the data stream.

Eugene commented on Feb 19, 2019

Thank you!

Eugene commented on Feb 20, 2019

Hi, it looks like "max"=>"datetime" does not work for my strings... any solutions?

KoolReport commented on Feb 20, 2019

Is your date in this format "2007-05-12 07:12:08", it should work if your time is in this way as we can compare the date with "<" operator. So currently what do you get?

Eugene commented on Feb 21, 2019

This is my data

10019	2019-02-20 06:56:46	0	1
10019	2019-02-20 06:56:47	0	1
10019	2019-02-20 15:02:31	0	1
10019	2019-02-20 15:02:50	1	1
10019	2019-02-20 15:02:47	0	1
10019	2019-02-20 15:02:39	1	1
10019	2019-02-20 15:02:45	0	1
10019	2019-02-20 15:02:40	1	1
10019	2019-02-20 15:02:51	1	1

This is what i do

->pipe(new Group(array(
            "by" => array("attcode", "date","in_out"),
            "max" => "datetime",

This is what i get

datetime           it_out
2019-02-20 06:56:46	0
2019-02-20 15:02:50      1

It looks like they are the first rows in each group

I decided that "max" => "datetime" does not work because without it I get the same

KoolReport commented on Feb 21, 2019

It is because you group by "it_out" as well. So basically those above result contains result in different group.

Eugene commented on Feb 21, 2019

Could you explain why in_out grouping has this effect? The group with in_out=0 has 5 rows. How to get max value?

Eugene commented on Feb 21, 2019

I tried with

"by" => array("date"),
            "max" => "datetime",

the result is the same - the first row

2019-02-20 06:56:46	0

So i think must be another reason and not the in_out groupping

KoolReport commented on Feb 21, 2019

Let say you have sale data with columns "country" "city" "amount"

if you goup by "country" and "city" then the combination of "country" and "city" will define a group.

So any rows with same country, same city will be in a group. If you calculate the "max" for this group, it will find the max amount within those rows has the same country and city.

You may try to try to upload data to mysql, and try the select query with similar grouping and aggregate, you will find the similar.

In above case, when you group by "date" only and calculate the max "datetime", it will return the max datetime within a day. The "it_out" will not be grouped or calculated so it will have the value of first row (in the group)

Eugene commented on Feb 21, 2019
  1. if i group by date only i still get the first row
  2. I don't understand why if you group by "country" and "city" and calculate the "max" for this group, it will find the max amount within those rows has the same country and city. But if i group by date and in_out in my case i get the first row...

NB it_out in my example is in_out - it is a typo...

KoolReport commented on Feb 21, 2019

In your last example, you group by "date" only, not "in_out" so I explain that "in_out" will get value of the first row within a group ( defined by the "date")

KoolReport commented on Feb 21, 2019

Also, although the result could look the same but it is different ( when you group by "in_out" vs not grouping with it). It is coincident and base on nature of your data.

Eugene commented on Feb 21, 2019

Dear Koolreport, Sorry, but it is a bit difficult for me to understand you. What I need is to get max value. I followed your recommendations. Let's forget about groupping by in_out column. The result is wrong even i group by date only.

Ok. You asked me to do the same in mysql. The result is below:

Data

Query:

SELECT code, MAX(datetime),in_out
FROM att_test
WHERE
  DATE (datetime) BETWEEN "2019-02-20" AND "2019-02-20"
GROUP BY  date;

The result is correct:

Another Query:

SELECT code, MAX(datetime),in_out
FROM att_test
WHERE
  DATE (datetime) BETWEEN "2019-02-20" AND "2019-02-20"
GROUP BY  code, date, in_out;

The result is also correct.

So we return to my first question how to get the same if the data source is CSV file?

KoolReport commented on Feb 21, 2019

Please send us your csv to our email.

Eugene commented on Feb 21, 2019

done

KoolReport commented on Feb 21, 2019

I will answer you in next few hours since I got customer training section here.

KoolReport commented on Feb 23, 2019

Hi Engene,

Our developer has done a test and I can assure that our Group is working very well. Here is the test to compare CSV and DB. Your data is like above.

Report.php

<?php

class Report extends \koolreport\KoolReport
{
    use \koolreport\clients\Bootstrap;
    protected function settings()
    {
        return array(
            "dataSources"=>array(
                "db"=>array(
                    "connectionString"=>"mysql:host=localhost;dbname=eugene",
                    "username"=>"root",
                    "password"=>"",
                    "charset"=>"utf8"
                ),
                "csv"=>array(
                    "class"=>'\koolreport\datasources\CSVDataSource',
                    "filePath"=>dirname(__FILE__)."/data.csv"
                )
            )
        );
    }
}

Report.view.php

<?php
use \koolreport\widgets\koolphp\Table;

use \koolreport\processes\CalculatedColumn;
use \koolreport\processes\Group;
use \koolreport\processes\Filter;
use \koolreport\processes\CopyColumn;
use \koolreport\processes\DateTimeFormat;

?>
<html>
<head>
    <title>Test Grouping</title>
</head>
<body>
    <div class="container">
        <h1 class="text-center">CSV Group By Date</h1>
        <?php
        Table::create(array(
            "dataSource"=>(
                $this->src("csv")
                ->pipe(new CopyColumn(array(
                    "date"=>"datetime"
                )))
                ->pipe(new DateTimeFormat(array(
                    "date"=>"Y-m-d"
                )))
                ->pipe(new Filter(array(
                    array("date","<=","2019-02-20"),
                    array("date",">=","2019-02-20")
                )))
                ->pipe(new Group(array(
                    "by"=>"date",
                    "max"=>"datetime"
                )))
            ),
            "columns"=>array("code","datetime","in_out")
        ));
        ?>
        <h1 class="text-center">DB Group By Date</h1>
        <?php
        Table::create(array(
            "dataSource"=>(
                $this->src("db")->query("
                    SELECT code, MAX(datetime),in_out
                    FROM att_test
                    WHERE
                    DATE(datetime) BETWEEN '2019-02-20' AND '2019-02-20'
                    GROUP BY DATE(datetime);
                ")
            )
        ));
        ?>
        <br/>
        <br/>
        <br/>
        <br/>
        <br/>
        <h1 class="text-center">CSV Group By Code, Date, In_out</h1>
        <?php
        Table::create(array(
            "dataSource"=>(
                $this->src("csv")
                ->pipe(new CopyColumn(array(
                    "date"=>"datetime"
                )))
                ->pipe(new DateTimeFormat(array(
                    "date"=>"Y-m-d"
                )))
                ->pipe(new Filter(array(
                    array("date","<=","2019-02-20"),
                    array("date",">=","2019-02-20")
                )))
                ->pipe(new Group(array(
                    "by"=>array("code","date","in_out"),
                    "max"=>"datetime"
                )))
            ),
            "columns"=>array("code","datetime","in_out")
        ));
        ?>
        <h1 class="text-center">DB Group By Code, Date, In_out</h1>
        <?php
        Table::create(array(
            "dataSource"=>(
                $this->src("db")->query("
                    SELECT code, MAX(datetime),in_out
                    FROM att_test
                    WHERE
                    DATE(datetime) BETWEEN '2019-02-20' AND '2019-02-20'
                    GROUP BY code, DATE(datetime), in_out;
                ")
            )
        ));
        ?>    
    </div>
</body>
</html>

The result coming from both case are the same.

When group by date only, we have this output:

From CSV

codedatetimein_out
100192019-02-20 15:02:510

From DB

codeMAX(datetime)in_out
10,0192019-02-20 15:02:510

__When group by code, date, in_out we have this output:__

From CSV

codedatetimein_out
100192019-02-20 15:02:470
100192019-02-20 15:02:511

From DB

codeMAX(datetime)in_out
10,0192019-02-20 15:02:470
10,0192019-02-20 15:02:511

You can test yourself with above code. If something could go wrong, I think it could be the definition of BETWEEN in SQL. When we translate to Filter process, it should <= and >=. It may go wrong if we use < and > operator.

Eugene commented on Feb 23, 2019

Thank you! It was my fault I used 'datetime' but created 'dateTime' some lines before... Stupid mistake

Build Your Excellent Data Report

Let KoolReport help you to make great reports. It's free & open-source released under MIT license.

Download KoolReport View demo
solved

None