• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Extracting Sample Data From A DB2 for i Table

    November 3, 2015 Michael Sansoterra

    The need to retrieve a sample or subset of data from a database is a relatively common (and somewhat annoying) task. For example, DB2 for i developers often want to extract data and create a scaled down copy of a library (a.k.a. schema) to develop with. Additionally, sometimes code testing can be done using random data, such as creating a battery of orders to process with varying customers, order terms, etc. This tip won’t attempt to tackle the full breadth of either of these scenarios. However, the aim is to show how some simple queries can be used to create periodic or random extracts from DB2 tables.

    The code shown below was written and tested using DB2 for i 7.2 PTF Group Level 8 but should work on all versions of IBM i since V5R4.

    Scenario 1: Extract periodic data

    If you want a small subset of data for development, one way to achieve it is to select data with some sort of regular/periodic interval. The following query illustrates how to select every fourth row from the SalesOrderHeader table:

    WITH SalesData AS (
    SELECT h.*,ROW_NUMBER() OVER(ORDER BY SalesOrderId) AS ROW_ID
      FROM AdvWorks.SalesOrderHeader h
    )
    SELECT *
      FROM SalesData
     WHERE MOD(ROW_ID,4)=0
    

    The ROW_NUMBER function is used to assign a sequential number to the result set, and the MOD (modulus) function is used to filter every fourth row (every row with an ID evenly divisible by four). For the record, once you have the subset of the primary data (such as the order header table) the task of bringing the remaining related data (order lines, shipment info, etc.) from into the sample library is up to you; I have no magic bullet to offer here.

    If you happen to be one of the lucky few with a well-defined set of foreign key constraints on parent tables, another option is to copy the entire library, and then delete all rows from a parent table except every fourth row. Assuming a foreign key relationship exists with ON DELETE CASCADE specified, removing data from the parent table will also delete data from the related child tables so that the data reduction is accomplished in one simple step:

    /* SalesOrderDetail rows will be removed as well due to the FK */
    DELETE
      FROM AdvWorks.SalesOrderHeader h
     WHERE SalesOrderId NOT IN (
        SELECT SalesOrderId
          FROM (
             SELECT h.*,ROW_NUMBER() OVER(ORDER BY SalesOrderId) AS ROW_ID
               FROM AdvWorks.SalesOrderHeader h) Subset
          WHERE MOD(ROW_ID,4)=0
    )
    

    Almost any variation on a periodic extract can be done. For example, if you want to extract sample data containing the first two orders from every month, the following query will do the trick:

    WITH SalesData AS (
    SELECT h.*,
           ROW_NUMBER()
           OVER(PARTITION BY YEAR(OrderDate),MONTH(OrderDate) 
           ORDER BY OrderDate) AS Month_Order_Id
      FROM AdvWorks.SalesOrderHeader h
    )
    SELECT *
      FROM SalesData
     WHERE Month_Order_Id>=2
    

    In this case, the rows are numbered sequentially using ROW_NUMBER. The PARTITION BY causes the numbering to restart with each new month (MONTH_ORDER_ID column). This numbering scheme can be used to limit the results to only the first two rows of each month. The abridged results will look something like this:

    SalesOrderId

    OrderDate

    SubTotal

    Freight

    Month_Order_id

    49821

    4/1/2003

    94133.84

    2353.35

    1

    49822

    4/1/2003

    38594.20

    964.85

    2

    50189

    5/1/2003

    36341.76

    908.54

    1

    50190

    5/1/2003

    895.03

    22.38

    2

    50658

    6/1/2003

    7328.79

    183.22

    1

    50659

    6/1/2003

    1549.43

    38.74

    2

    51081

    7/1/2003

    44066.04

    1101.65

    1

    51082

    7/1/2003

    1009.73

    25.24

    2

    51690

    8/1/2003

    3765.73

    94.14

    1

    51691

    8/1/2003

    39027.77

    975.69

    2

    Scenario 2: Extract a random row

    Let’s say you’re writing a testing script to test code for a timesheet entry application. The testing requires the creation of several timesheet entries using a random male employee. How do you extract a random employee? It’s as easy as:

      SELECT c.* 
        FROM CorpData.Employee c
       WHERE SEX='M'
    ORDER BY 1
       FETCH FIRST 1 ROW ONLY
    

    The RAND() function will a random value between zero and one. In the case of DB2 for i, the RAND() function is non-deterministic meaning that the value it returns changes upon every invocation. In the above query, DB2 for i assigns a random number to every employee row, sorts the result set by the random number, and then returns the first one in the list. The result will most likely be a different employee every time the query is run.

    Note that the RAND() function may be deterministic on other database platforms such as SQL Server, meaning that the function returns the same value for every statement invocation. This is not the case in DB2 for i.

    Exercise caution when using this technique. The resulting query plan will perform a table scan and assign a random value to each row (I/O and CPU intensive), which may not be a pretty sight depending on the volume of data:

    For small tables in an occasional testing scenario, this type of code is likely insignificant. However, if run against a million-plus row table in frequently used production code, it would most likely tax your hardware.

    Other options for picking a random row from a table involve converting the RAND() function to an integer (or decimal) value and then using the integer to select a value from a sequential identity column or relative record number (using the RRN function). This technique requires that the random number generator know the limits of the identity column (or RRNs) and also assumes that there are not any gaps in the numbering.

    This example shows how to generate a random integer value between 1 and 100,000:

    SET IDENTITY_VAR=INT(RAND()*100000)+1;
    

    Applied to a SELECT statement, this example picks a random product with an ID between 700 and 999 (and assumes all IDs are present):

    SELECT *
      FROM ADVWORKS.PRODUCT
     WHERE PRODUCTID=INT(RAND()*299)+700
    

    Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

    RELATED STORY

    New in V5R4: OLAP Ranking Specifications

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    ProData

    Simple.  Reliable.  Powerful.
    IBM i software to help you navigate your universe!

    Use these tools to get where you are going:

    DBU to easily, yet securely, access the data on your IBM i or remote system.

    RDB Connect for powerful record level access to remote data using RPG, CL or COBOL.

    SQL/Pro and Audit for reliable SQL data processing, tracking and reporting.

    And more!

    Join our DBUniverse of loyal, happy customers and experience your destination…

    Do IT with DBU!

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    NGS:  Nov. 18 FREE Webinar: Migration Alternatives for Query/400 Users
    HelpSystems:  How do you use IBM i? Your peers want to know! Take the survey >
    Rocket Software:  Mobile app development and deployment solution for IBM i. Download FREE trial!

    First Open Source Conference For IBM i Shops Planned IBM i Executives: Where Are They Now?

    Leave a Reply Cancel reply

Volume 15, Number 23 -- November 3, 2015
THIS ISSUE SPONSORED BY:

WorksRight Software
Connectria
United Computer Group, Inc.

Table of Contents

  • Tuning SQL Environments On i
  • Why And How To Update The HMC
  • Extracting Sample Data From A DB2 for i Table

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • IBM Extends Dynamic Capacity Pricing Scheme To Its Cloud
  • Here’s What You Should Do About The IBM i Skills Shortage
  • Matillion Founder Recounts Midrange Roots
  • Four Hundred Monitor, February 24
  • IBM i PTF Guide, Volume 23, Number 8
  • iTech Solutions Keeps You In The Know With VERIFi
  • Tech Data’s Take On Certified Pre-Owned IT Gear
  • Guru: Prompting Stored Procedures
  • As I See It: Sunshine Or Oxygen?
  • Looking For Some Insight On IBM i Security

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2021 IT Jungle

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.