KoolReport's Forum

Official Support Area, Q&As, Discussions, Suggestions and Bug reports.

How to integrate a PHP array for-loop for a long SQL query over 50 databases? #743

Open Markus Mohr opened this topic on Mar 14, 2019 - 9 comments

Markus Mohr commented on Mar 14, 2019

KoolReport 3.25.4.

I have a very complex SQL query over many fields (about 1,000 lines of code) with several calculations, using CONCAT_WS, COALESCE and several JOINs, all combined via UNION with the same query for a total of 50 databases. If I wrote that out sequentially, it would amount to 50,000 lines of code.

I can, however, create a nice for loop of the form ...

$result = [];
for ($db = 1; $db <= 50; $db++) {
    // databases are named database01 ... database50
    $sql_query = 'SELECT * FROM database' . str_pad($db, 2, '0', STR_PAD_LEFT) . '.table WHERE blah';
    $rQuery = mysqli_query($oDatabase, $sql_query);
    $result[] = array("count" => mysqli_num_rows($rQuery), "result" => $rQuery);
}
return $result;

... to repeat this process until all database queries have been carried out, so that the 1,000 lines of SQL query code appear only once.

How can I integrate this code into the KoolReport framework?
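For illustration, here is a minimal sketch of how I imagine collapsing the repetition: the long query body lives in the code once, with a placeholder for the schema name, and PHP builds one big UNION ALL statement from it (the database naming scheme and the {DB} placeholder are only assumptions for the sketch):

```php
<?php
// Sketch only: the 1,000-line query body appears once, with {DB}
// standing in for the schema name (database01 ... database50).
function buildUnionQuery(int $count, string $body): string
{
    $parts = [];
    for ($db = 1; $db <= $count; $db++) {
        $schema = 'database' . str_pad((string)$db, 2, '0', STR_PAD_LEFT);
        // Substitute the schema name into the query body
        $parts[] = str_replace('{DB}', $schema, $body);
    }
    return implode("\nUNION ALL\n", $parts);
}

// Placeholder standing in for the real 1,000-line query body
$body = 'SELECT * FROM {DB}.table WHERE blah';
$sql  = buildUnionQuery(50, $body);
```

The resulting string could then be executed as a single query instead of 50 separate ones.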

Andrew Borell commented on Mar 15, 2019

Depends on whether your databases are in the same instance or different instances, in my humble opinion. I think merging a multidimensional array from 50 different databases sounds like a nightmare. If it's 50 databases in a single instance then I wouldn't use KoolReport at all to join all your result sets. Instead I would leverage MySQL to do that heavy lifting, then use KoolReport for the presentation layer.

DROP PROCEDURE IF EXISTS `report_db`.`getDbData`;
CREATE PROCEDURE `report_db`.`getDbData`(
	# no parameters in this example 
	)
READS SQL DATA
DETERMINISTIC
BEGIN
DECLARE db_name VARCHAR(255) default '';
DECLARE no_more_rows BOOLEAN DEFAULT 0;
DECLARE num_rows INT DEFAULT 0;
DECLARE x_cur CURSOR FOR 
	SELECT DISTINCT SCHEMA_NAME AS `database`
        FROM information_schema.SCHEMATA
	WHERE  SCHEMA_NAME NOT IN ('information_schema', 'performance_schema', 'mysql','sakila')
	ORDER BY SCHEMA_NAME;
DECLARE CONTINUE HANDLER FOR NOT FOUND
    SET no_more_rows = TRUE;
DROP TEMPORARY TABLE IF EXISTS temp_table_xyz;
CREATE TEMPORARY TABLE temp_table_xyz (
  your varchar(255) not null,
  fields double not null
);
OPEN x_cur;
  db_name_loop: LOOP
    FETCH x_cur INTO db_name;
    IF no_more_rows THEN
        CLOSE x_cur;
        LEAVE db_name_loop;
    END IF;
    SET @s = CONCAT('INSERT INTO temp_table_xyz (your, fields) SELECT your, fields FROM `', db_name, '`.`table_name`');
    PREPARE stmt FROM @s;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END LOOP db_name_loop;
  SELECT * FROM temp_table_xyz;
END;
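To actually consume the procedure from KoolReport, the setup would look roughly like this (a sketch only; the data source name, connection details, and datastore name are placeholders, assuming a standard MySQL PDO data source):

```php
<?php
// Sketch: feeding the stored procedure's result set into KoolReport.
// "mysql" is an assumed data source name; adjust to your settings.
use \koolreport\KoolReport;

class MultiDbReport extends KoolReport
{
    protected function settings()
    {
        return [
            "dataSources" => [
                "mysql" => [
                    "connectionString" => "mysql:host=localhost;dbname=report_db",
                    "username" => "your_user",
                    "password" => "your_password",
                    "charset"  => "utf8",
                ],
            ],
        ];
    }

    protected function setup()
    {
        // The procedure's final SELECT becomes the result set
        $this->src('mysql')
             ->query('CALL getDbData()')
             ->pipe($this->dataStore('all_dbs'));
    }
}
```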

Markus Mohr commented on Mar 15, 2019

Thank you, Andrew,

all MySQL databases are set up in the same manner with the same structure within one single high-performance database server. The data volume is relatively small and text-based, so it does not contain large binary objects or the like.

I see that you are creating a stored procedure as a first step to gather the data, which is then passed into the KoolReport framework. I will examine whether this is possible for my SQL query.

Andrew Borell commented on Mar 15, 2019

Your other option would be to create an array of your 50 database names, iterate in the setup to spin off 50 queries and create 50 datastores, and finally run a loop in PHP to merge the arrays into a consolidated datastore. I have coded this situation too, but only because my databases were in different RDBMSes on different servers across the world. I later found it was faster to create a linked server in MSSQL to the remote server running MySQL and use OPENROWSET to gather the remote data than it was to consolidate the datasets in PHP. So I guess in both cases I ended up using SQL to resolve the issue, but it is possible to do it in PHP.
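A rough sketch of that consolidation step in plain PHP (the per-database fetch is only indicated in comments; mergeResults() is the part that flattens the 50 result arrays and tags each row with its source database; all names are placeholders):

```php
<?php
// Sketch of the "query each database, then merge in PHP" option.
// Connection handling is elided; mergeResults() is the pure part.
function mergeResults(array $perDatabase): array
{
    $merged = [];
    foreach ($perDatabase as $dbName => $rows) {
        foreach ($rows as $row) {
            // Keep track of which database each row came from
            $row['source_db'] = $dbName;
            $merged[] = $row;
        }
    }
    return $merged;
}

// With real connections this would look roughly like:
// foreach ($dbNames as $db) {
//     $res = mysqli_query($conn, "SELECT your, fields FROM `$db`.`table_name`");
//     $perDatabase[$db] = mysqli_fetch_all($res, MYSQLI_ASSOC);
// }
$sample = [
    'database01' => [['your' => 'a', 'fields' => 1]],
    'database02' => [['your' => 'b', 'fields' => 2]],
];
$merged = mergeResults($sample);
```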

Markus Mohr commented on Mar 15, 2019

Hello, Andrew,

I have only one main MySQL database server at my hosting provider where the data are stored in the 50 databases; there is no option for getting another database server, and especially not an MSSQL one, since this is a Linux-exclusive hosting.

But maybe you could show me an outline of how you accomplished this task using the "linked server in MSSQL" procedure you described. I would very much appreciate seeing the code so that I can understand and learn from it.

Andrew Borell commented on Mar 15, 2019

I should take a step back and advise you not to use a linked server if possible, and instead use OPENROWSET in MSSQL for accessing a remote MySQL server, because a linked server will gather all the results and filter them locally in tempdb, whereas with OPENROWSET you can filter server-side using the remote server's RDBMS syntax. Also, in my case it was only two servers with one database on each, so I did not require a cursor. That aside, here is what I think you are asking me:

  1. On the MSSQL server, install the Visual C++ 2010 runtimes if they are not already installed.
  2. Install the MySQL ODBC connector drivers (version 5.1, 64-bit in this example) that match your SQL Server architecture.
  3. Enable ad-hoc distributed queries in the MSSQL instance:
sp_configure 'show advanced options', 1;  
RECONFIGURE;
GO 
sp_configure 'Ad Hoc Distributed Queries', 1;  
RECONFIGURE;  
GO  

Then basically the same thing in your procedure.


USE [whatever_database_name_here]
GO

/****** Object:  StoredProcedure [dbo].[getRemoteData]    Script Date: 03/15/2019 16:54:05 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:		<Author,,Name>
-- Create date: <Create Date,,>
-- Description:	<Description,,>
-- =============================================
CREATE PROCEDURE [dbo].[getRemoteData] 
	-- default bounds for epoch time ( mysql timestamp datatype )
	@pdate1 date = '1970-01-01',
	@pdate2 date = '2038-01-19'
AS
BEGIN
	SET ARITHABORT ON;
	SET ANSI_WARNINGS ON;
	SET NOCOUNT ON;

	DECLARE @dbname VARCHAR(255) = '';
	DECLARE @qry VARCHAR(8000) = '';
	DECLARE @cur_x CURSOR;

	IF OBJECT_ID('tempdb..#tmp0') IS NOT NULL DROP TABLE #tmp0	
	CREATE table #tmp0 (
		dbname varchar(255) not null, 
		your varchar(255) not null, 
		fields varchar(255) 
		); 
			
	SET @cur_x = CURSOR FAST_FORWARD FORWARD_ONLY FOR 
		SELECT q.dbname 
			FROM OPENROWSET('MSDASQL'
				,'Driver={MySQL ODBC 5.1 Driver};Server=172.16.0.22;port=3306;Database=any_database_here;User=your_user;Password=your_password;Option=3;'
				,'SELECT DISTINCT SCHEMA_NAME AS `dbname`
				FROM information_schema.SCHEMATA
				WHERE  SCHEMA_NAME NOT IN (''information_schema'', ''performance_schema'', ''mysql'',''sakila'')
				ORDER BY SCHEMA_NAME') q ;
		OPEN @cur_x 
		FETCH NEXT FROM @cur_x into @dbname  
		WHILE @@FETCH_STATUS = 0  
		BEGIN  
			BEGIN
			-- DATE parameters must be converted to strings and quoted inside the remote query
			SET @qry = 'INSERT INTO #tmp0
				SELECT ''' + @dbname + ''', q.* FROM OPENROWSET(''MSDASQL''
				,''Driver={MySQL ODBC 5.1 Driver};Server=172.16.0.22;port=3306;Database=' + @dbname + ';User=your_user;Password=your_password;Option=3;''
				,''SELECT t0.your, t1.fields FROM this_table t0
				JOIN that_table t1 ON t0.fk_id = t1.id
				WHERE CAST(''''' + CONVERT(varchar(10), @pdate1, 120) + ''''' AS DATE) >= CAST(t0.from_dt AS DATE) AND CAST(''''' + CONVERT(varchar(10), @pdate2, 120) + ''''' AS DATE) <= CAST(t0.to_dt AS DATE)'') q';
			EXEC(@qry);	
			END;
		  FETCH NEXT FROM @cur_x into @dbname  
	   END;  
	CLOSE @cur_x;  
	DEALLOCATE @cur_x;  
select * from #tmp0;
END


Then you would call the MSSQL stored proc in your KoolReport setup:

exec getRemoteData @pdate1 = '2019-03-14', @pdate2 = '2019-03-15'

Hopefully you don't need to adjust any timeouts and your SQL is fast, because cursors unfortunately are not, but I coded this one to run as quickly as possible with a forward-only cursor.

thx -d

Markus Mohr commented on Mar 15, 2019

Thank you very much, Andrew, I will chew that through until tomorrow and then get back to you.

I appreciate your help very much.

Andrew Borell commented on Mar 15, 2019

I made a few corrections since posting it, but I did test the code very quickly prior to ensure everything works conceptually. Best of luck!

Markus Mohr commented on Mar 16, 2019

Hello, Andrew,

I have tried to follow your construct, but I cannot make use of an MSSQL server within my client's environment. I could set up one of my own, of course, but then I would have to leave the internally secured network of the hosting provider, which is not a good idea since I am working with patients' data. The data flow therefore has to be strictly limited and restricted to the current Linux-based setup. Of course, with some special tooling and a lot of extra work I could also implement an MSSQL server, but that would be at my own expense.

I will try to follow your suggestion with the array of database names and the iteration of PHP-based queries.

Once I have figured out the code, I will post it here.

Andrew Borell commented on Mar 16, 2019

It should be pretty easy if you have a single set of credentials for all 50 databases. You can probably pull it off with an array of database names, a loop with a "variable variable" to create datastores based on the database names array, and the PHP array_merge function. In KoolReport there is also an option for ->data that you should look at. If the datasets are large you might not be happy with the performance and could run into PHP memory issues.
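A sketch of what that could look like in a KoolReport class (everything here — the data source name, table and field names, and the consolidated() helper — is a placeholder, not tested against a real setup):

```php
<?php
// Sketch: one datastore per database, merged with array_merge afterwards.
use \koolreport\KoolReport;

class ConsolidatedReport extends KoolReport
{
    protected $dbNames;

    protected function settings()
    {
        // Define your "mysql" data source here as usual
        return ["dataSources" => []];
    }

    protected function setup()
    {
        // database01 ... database50
        $this->dbNames = array_map(function ($i) {
            return 'database' . str_pad((string)$i, 2, '0', STR_PAD_LEFT);
        }, range(1, 50));

        foreach ($this->dbNames as $db) {
            $this->src('mysql')
                 ->query("SELECT your, fields FROM `$db`.`table_name`")
                 ->pipe($this->dataStore($db));
        }
    }

    // After run(), consolidate the 50 datastores into one array
    public function consolidated(): array
    {
        $all = [];
        foreach ($this->dbNames as $db) {
            $all = array_merge($all, $this->dataStore($db)->data());
        }
        return $all;
    }
}
```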
