• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Extracting Sample Data From A DB2 for i Table

    November 3, 2015 Michael Sansoterra

    The need to retrieve a sample or subset of data from a database is a relatively common (and somewhat annoying) task. For example, DB2 for i developers often want to extract data and create a scaled down copy of a library (a.k.a. schema) to develop with. Additionally, sometimes code testing can be done using random data, such as creating a battery of orders to process with varying customers, order terms, etc. This tip won’t attempt to tackle the full breadth of either of these scenarios. However, the aim is to show how some simple queries can be used to create periodic or random extracts from DB2 tables.

    The code shown below was written and tested using DB2 for i 7.2 PTF Group Level 8 but should work on all versions of IBM i since V5R4.

    Scenario 1: Extract periodic data

    If you want a small subset of data for development, one way to achieve it is to select data with some sort of regular/periodic interval. The following query illustrates how to select every fourth row from the SalesOrderHeader table:

    WITH SalesData AS (
    SELECT h.*,ROW_NUMBER() OVER(ORDER BY SalesOrderId) AS ROW_ID
      FROM AdvWorks.SalesOrderHeader h
    )
    SELECT *
      FROM SalesData
     WHERE MOD(ROW_ID,4)=0
    

    The ROW_NUMBER function is used to assign a sequential number to the result set, and the MOD (modulus) function is used to filter every fourth row (every row with an ID evenly divisible by four). For the record, once you have the subset of the primary data (such as the order header table) the task of bringing the remaining related data (order lines, shipment info, etc.) from into the sample library is up to you; I have no magic bullet to offer here.

    If you happen to be one of the lucky few with a well-defined set of foreign key constraints on parent tables, another option is to copy the entire library, and then delete all rows from a parent table except every fourth row. Assuming a foreign key relationship exists with ON DELETE CASCADE specified, removing data from the parent table will also delete data from the related child tables so that the data reduction is accomplished in one simple step:

    /* SalesOrderDetail rows will be removed as well due to the FK */
    DELETE
      FROM AdvWorks.SalesOrderHeader h
     WHERE SalesOrderId NOT IN (
        SELECT SalesOrderId
          FROM (
             SELECT h.*,ROW_NUMBER() OVER(ORDER BY SalesOrderId) AS ROW_ID
               FROM AdvWorks.SalesOrderHeader h) Subset
          WHERE MOD(ROW_ID,4)=0
    )
    

    Almost any variation on a periodic extract can be done. For example, if you want to extract sample data containing the first two orders from every month, the following query will do the trick:

    WITH SalesData AS (
    SELECT h.*,
           ROW_NUMBER()
           OVER(PARTITION BY YEAR(OrderDate),MONTH(OrderDate) 
           ORDER BY OrderDate) AS Month_Order_Id
      FROM AdvWorks.SalesOrderHeader h
    )
    SELECT *
      FROM SalesData
     WHERE Month_Order_Id>=2
    

    In this case, the rows are numbered sequentially using ROW_NUMBER. The PARTITION BY causes the numbering to restart with each new month (MONTH_ORDER_ID column). This numbering scheme can be used to limit the results to only the first two rows of each month. The abridged results will look something like this:

    SalesOrderId

    OrderDate

    SubTotal

    Freight

    Month_Order_id

    49821

    4/1/2003

    94133.84

    2353.35

    1

    49822

    4/1/2003

    38594.20

    964.85

    2

    50189

    5/1/2003

    36341.76

    908.54

    1

    50190

    5/1/2003

    895.03

    22.38

    2

    50658

    6/1/2003

    7328.79

    183.22

    1

    50659

    6/1/2003

    1549.43

    38.74

    2

    51081

    7/1/2003

    44066.04

    1101.65

    1

    51082

    7/1/2003

    1009.73

    25.24

    2

    51690

    8/1/2003

    3765.73

    94.14

    1

    51691

    8/1/2003

    39027.77

    975.69

    2

    Scenario 2: Extract a random row

    Let’s say you’re writing a testing script to test code for a timesheet entry application. The testing requires the creation of several timesheet entries using a random male employee. How do you extract a random employee? It’s as easy as:

      SELECT c.* 
        FROM CorpData.Employee c
       WHERE SEX='M'
    ORDER BY 1
       FETCH FIRST 1 ROW ONLY
    

    The RAND() function will a random value between zero and one. In the case of DB2 for i, the RAND() function is non-deterministic meaning that the value it returns changes upon every invocation. In the above query, DB2 for i assigns a random number to every employee row, sorts the result set by the random number, and then returns the first one in the list. The result will most likely be a different employee every time the query is run.

    Note that the RAND() function may be deterministic on other database platforms such as SQL Server, meaning that the function returns the same value for every statement invocation. This is not the case in DB2 for i.

    Exercise caution when using this technique. The resulting query plan will perform a table scan and assign a random value to each row (I/O and CPU intensive), which may not be a pretty sight depending on the volume of data:

    For small tables in an occasional testing scenario, this type of code is likely insignificant. However, if run against a million-plus row table in frequently used production code, it would most likely tax your hardware.

    Other options for picking a random row from a table involve converting the RAND() function to an integer (or decimal) value and then using the integer to select a value from a sequential identity column or relative record number (using the RRN function). This technique requires that the random number generator know the limits of the identity column (or RRNs) and also assumes that there are not any gaps in the numbering.

    This example shows how to generate a random integer value between 1 and 100,000:

    SET IDENTITY_VAR=INT(RAND()*100000)+1;
    

    Applied to a SELECT statement, this example picks a random product with an ID between 700 and 999 (and assumes all IDs are present):

    SELECT *
      FROM ADVWORKS.PRODUCT
     WHERE PRODUCTID=INT(RAND()*299)+700
    

    Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

    RELATED STORY

    New in V5R4: OLAP Ranking Specifications

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    Midrange Dynamics North America

    Git up to speed with MDChange!

    Git can be lightning-fast when dealing with just a few hundred items in a repository. But when dealing with tens of thousands of items, transaction wait times can take minutes.

    MDChange offers an elegant solution that enables you to work efficiently any size Git repository while making your Git experience seamless and highly responsive.

    Learn more.

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    NGS:  Nov. 18 FREE Webinar: Migration Alternatives for Query/400 Users
    HelpSystems:  How do you use IBM i? Your peers want to know! Take the survey >
    Rocket Software:  Mobile app development and deployment solution for IBM i. Download FREE trial!

    First Open Source Conference For IBM i Shops Planned IBM i Executives: Where Are They Now?

    Leave a Reply Cancel reply

Volume 15, Number 23 -- November 3, 2015
THIS ISSUE SPONSORED BY:

WorksRight Software
Connectria
United Computer Group, Inc.

Table of Contents

  • Tuning SQL Environments On i
  • Why And How To Update The HMC
  • Extracting Sample Data From A DB2 for i Table

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Public Preview For Watson Code Assistant for i Available Soon
  • COMMON Youth Movement Continues at POWERUp 2025
  • IBM Preserves Memory Investments Across Power10 And Power11
  • Eradani Uses AI For New EDI And API Service
  • Picking Apart IBM’s $150 Billion In US Manufacturing And R&D
  • FAX/400 And CICS For i Are Dead. What Will IBM Kill Next?
  • Fresche Overhauls X-Analysis With Web UI, AI Smarts
  • Is It Time To Add The Rust Programming Language To IBM i?
  • Is IBM Going To Raise Prices On Power10 Expert Care?
  • IBM i PTF Guide, Volume 27, Number 20

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle