Accessing Multiple DB2 Relational Databases In A Single Query
August 8, 2012 Michael Sansoterra
In my prior tip, I covered a new feature provided in the i7.1 technology refresh 4 update that allows a INSERT/SELECT statement to insert data into a local DB2 for i table from a query against a remote DB2 database. In this tip, I will cover a method that can be used to query multiple DB2 relational databases in a single statement.
Author’s Note: Incidentally, the remote DB2 database can be another member of the DB2 family besides DB2 for i. Check out my article on DB2 for Windows for more information.
As I mentioned in my last tip, with DB2 for i7.1, remote database access can be accomplished without using a CONNECT statement. Also, each SQL query against a remote database can access only one remote relational database at a time that can create a problem if you need to simultaneously access multiple remote databases with a single query.
The way to “weasel” around this restriction is to understand that user-defined table functions (UDTF) are an island unto themselves. A UDTF can be coded to access a single remote database. However, multiple UDTFs (each potentially accessing a different remote database) can be combined in a single query. In the end, several UDTFs (each accessing a single remote database) can be combined in a single SELECT statement that effectively accesses multiple remote databases at once.
Before we begin, this technique uses DRDA to connect the local and remote IBM i machines (or partitions.)
For a simple demo, say we have three IBM i partitions: a local system, SYSTEMA, and SYSTEMB. Each partition has one or more tables that we want to combine in a single query as shown in Table 1 below:
Table 1: Distribution of tables in sample heterogeneous query.
In case you couldn’t tell, I took sample tables from the AdventureWorks 2005 sample database and spread them out onto three partitions for this example.
To bring the Product data from SystemA to our local partition, we can create a UDTF on the local partition that references the Product table on the remote SystemA using the three-part naming convention:
CREATE OR REPLACE FUNCTION DEV.RMTPRODUCT() RETURNS TABLE (PRODUCTID INT, NAME NVARCHAR(50), PRODUCTNUMBER NVARCHAR(25), SAFETYSTOCKLEVEL INT) LANGUAGE SQL DISALLOW PARALLEL CARDINALITY 150 SET OPTION COMMIT=*NONE RETURN SELECT PRODUCTID,NAME, PRODUCTNUMBER,SAFETYSTOCKLEVEL FROM SYSTEMA.ADVWORKS.PRODUCT
If we run the table function, it returns the entire product table:
SELECT * FROM TABLE(DEV.RMTPRODUCT()) PRODUCT
Likewise, to retrieve the customer information from remote SystemB, we create a local UDTF that queries SystemB:
CREATE OR REPLACE FUNCTION DEV.RMTCUSTOMER() RETURNS TABLE (CUSTOMERID INT, ACCOUNTNUMBER NVARCHAR(32), CUSTOMERTYPE NCHAR(1), NAME NVARCHAR(60)) LANGUAGE SQL DISALLOW PARALLEL DETERMINISTIC CARDINALITY 5000 SET OPTION COMMIT=*NONE RETURN SELECT C.CUSTOMERID,C.ACCOUNTNUMBER,C.CUSTOMERTYPE, COALESCE(S.NAME,I.FIRSTNAME||' '||I.LASTNAME) AS NAME FROM SYSTEMB.ADVWORKS.CUSTOMER C LEFT JOIN SYSTEMB.ADVWORKS.STORE S ON S.CUSTOMERID=C.CUSTOMERID AND C.CUSTOMERTYPE='S' LEFT JOIN SYSTEMB.ADVWORKS.CONTACT I ON I.CONTACTID=C.CUSTOMERID AND C.CUSTOMERTYPE='I'
Finally, the local tables can now be combined with the remote data in a single statement as follows:
Combine three relational databases in a single SELECT CREATE TABLE QTEMP.CUSTOMER_ORDERS AS ( SELECT SOH.SALESORDERID,SOH.CUSTOMERID,SOH.ORDERDATE, C.NAME CUSTOMERNAME,SOD.PRODUCTID,P.NAME PRODUCTNAME, SOD.ORDERQTY,SOD.UNITPRICE FROM ADVWORKS.SALESORDERHEADER SOH -- Local JOIN TABLE(RMTCUSTOMER()) C ON C.CUSTOMERID=SOH.CUSTOMERID -- Remote SystemB JOIN ADVWORKS.SALESORDERDETAIL SOD ON SOD.SALESORDERID= SOH.SALESORDERID - Local JOIN TABLE(RMTPRODUCT()) P ON P.PRODUCTID=SOD.PRODUCTID -- Remote SystemA WHERE SOH.CustomerId=12314 ) WITH DATA
And there you have a DB2 for i query that accesses multiple DB2 relational databases.
A few things to consider: