Ability to handle large data sets

James Gianoutsos commented on Nov 5, 2019

We will need to know about the ability to handle large data sets and the way it is handled when we expand an item. For example, if I have a PivotMatrix table and the data source points to a PostgreSQL database, how do we fetch limited records through a server side pagination? Moreover, I would need to make a request to fetch records from the table(s) in the PostgreSQL database when we expand an item, I don't want to have everything pre-loaded. Is this possible and what are the steps to make it work like that?

David Winterburn commented on Nov 6, 2019

Hi James,

Please try the property partialProcessing with Pivot process and see if it helps reducing loading time:

->pipe(new Pivot(array(
    "dimensions" => ...,
    "aggregates" => ...,
    "partialProcessing" => true
)))

Let us know how many total records are there in your query? Thanks!

James Gianoutsos commented on Nov 6, 2019

Hi David,

There will be indefinite records in our query. Will partialProcessing fetch limited records through a server side pagination using limit? How an API call is made when an user clicks on a page number? Also, would it help to make a API call to fetch records from the table(s) in the PostgreSQL database when we expand an item? If you can show me examples illustrating how partialProcessing would help in getting the above, it would help.

KoolReport commented on Nov 7, 2019

The Pivot table is different from normal table in which we can fetch some number of rows for normal table to do the pagination, however pivot table needs to do calculation on the whole table so all rows in table needs to be involved in the process either in web server or in database server.

The Pivot package currently do calculation in php so all rows will be fetched, however if table does not expand, the calculation will be faster so it is called partialProcess.

Another option ( which will be implemented in Pivot later) is to shift the calculation to database, using sql query to fetch calculated data. This way is promising to work with larger amount of data. In fact, we have done this before in one of our product called KoolPivotTable. So if your data is really large and want to try this database calculation, you can try KoolPivotTable.

James Gianoutsos commented on Nov 7, 2019

I looked at the KoolPivotTable. It works in ajax, but when I reviewed the Example.php, it appears that it is fetching all products at once from the database, the ajax only renders a chunk of the array/result set when you filter or click on a page. Here's the data-source in your example; $ds ->select("customerName, productName, productLine, dollar_sales")->from("customer_product_dollarsales"); $pivot->DataSource = $ds;. As you would see, there's no limit in the query. Can you post an example which doesn't require fetching all the products at one time? Please show us in the example how the Ajax call will be able to take parameters and carries the limit/offset.

KoolReport commented on Nov 7, 2019

@James: It does not. Although we allows user to just enter the broad query like above. We will manipulate that query by adding filtering, grouping, ordering ... depend on action of user on the widget like collapsing or expanding each segment of data.

David Winterburn commented on Nov 8, 2019

Hi James,

Would you mind letting us know how large your database table is?

Don't worry about the select query of KoolPivotTable though. We use it not to fetch all data from database but to build complex queries which only return a very small result set from the database. KoolPivotTable is built to work with large databases with million of rows so I think it could work for your case. Please try it and we will help you to get it work fine.

James Gianoutsos commented on Nov 12, 2019

I looked at the KoolPivotTable and I now understand how KoolPivotTable will add condition, grouping, limit, offset to make new query to get a sub-set data only. The first thing is, I'm using PostgreSQL. I downloaded the code and noticed that you are using mysqli. Do we need to replace mysqli with pg_pconnect / pg_query etc. everywhere? I'm asking this question because I don't see a setting to choose my database driver.

David Winterburn commented on Nov 13, 2019

Hi,

To use KoolPivotTable with PostgreSQL you have 2 methods:

Create a postgresql connection and use PostgreSQLPivotDataSource class as datasource.
Create a pdo connection and use PdoPivotDataSource as datasource.

Both PostgreSQLPivotDataSource and PdoPivotDataSource are in KoolPivotTable. It also has Oracle, SQL Server, Firebird datasource classes in case you need them. Let us know if you have any question. Thanks!

James Gianoutsos commented on Nov 13, 2019

Thanks David, I will try using the PostgreSQLPivotDataSource / PdoPivotDataSource data-source. For the time being, I was trying out the KoolPivotTable with MySQLiPivotDataSource. I tried to pull data from the users table, there are 372,662 records.

Here's my code:

$ds ->select("users.id,users.first_name, users.last_name, users.phone_number") ->from("users");

$field = new PivotField("id"); $field->Text = "Customer"; $pivot->AddRowField($field);

Here's the output. Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 20480 bytes) in D:\xampp\htdocs\KoolPHPSuite\KoolControls\KoolPivotTable\koolpivottable.php on line 1

KoolReport commented on Nov 13, 2019

Hi James,

Pivot is a data summarization tool so you should select those data that can be summarized. So your field that you use is "id" is not a field which can be summarized, because each user has his own ids. So when pivot group your data by id, you got exactly 372,662 rows returned to php. That why you got over memory overflow issue.

Phone number is another field which should not be summarized because each users has their own phone number. You should add those fields like users.country, users.city for example.

If you have experienced the pivot table in Microsoft Excel, our KoolPivotTable works the same, you add those grouping fields to rows and columns and add the measurement field to data field. Then pivot will group your data by those fields in rows and columns and calculate the field in data field.

Hope my answer helps. Let me know if you need more clarification.

James Gianoutsos commented on Nov 13, 2019

Yep, that clarifies. My concern was different. I tried to point out that when we have 372,662 rows, we are getting a memory overflow issue. Which means that, we cannot have a summarized data which returns so many rows.

KoolReport commented on Nov 13, 2019

Yes, you are correct. So many rows returned after even grouping will cause that issue. Either you should choose those grouped fields wisely or you need to allocate more memory. But I believe in term of business, if a field is so fragmented, you should not grouped by that field because (putting myself into a user's foot), seeing a category list with hundred thousands may be too much and over user's interest.

KoolReport's Forum

Ability to handle large data sets #1145

Build Your Excellent Data Report