• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Extracting Sample Data From A DB2 for i Table

    November 3, 2015 Michael Sansoterra

    The need to retrieve a sample or subset of data from a database is a relatively common (and somewhat annoying) task. For example, DB2 for i developers often want to extract data and create a scaled down copy of a library (a.k.a. schema) to develop with. Additionally, sometimes code testing can be done using random data, such as creating a battery of orders to process with varying customers, order terms, etc. This tip won’t attempt to tackle the full breadth of either of these scenarios. However, the aim is to show how some simple queries can be used to create periodic or random extracts from DB2 tables.

    The code shown below was written and tested using DB2 for i 7.2 PTF Group Level 8 but should work on all versions of IBM i since V5R4.

    Scenario 1: Extract periodic data

    If you want a small subset of data for development, one way to achieve it is to select data with some sort of regular/periodic interval. The following query illustrates how to select every fourth row from the SalesOrderHeader table:

    WITH SalesData AS (
    SELECT h.*,ROW_NUMBER() OVER(ORDER BY SalesOrderId) AS ROW_ID
      FROM AdvWorks.SalesOrderHeader h
    )
    SELECT *
      FROM SalesData
     WHERE MOD(ROW_ID,4)=0
    

    The ROW_NUMBER function is used to assign a sequential number to the result set, and the MOD (modulus) function is used to filter every fourth row (every row with an ID evenly divisible by four). For the record, once you have the subset of the primary data (such as the order header table) the task of bringing the remaining related data (order lines, shipment info, etc.) from into the sample library is up to you; I have no magic bullet to offer here.

    If you happen to be one of the lucky few with a well-defined set of foreign key constraints on parent tables, another option is to copy the entire library, and then delete all rows from a parent table except every fourth row. Assuming a foreign key relationship exists with ON DELETE CASCADE specified, removing data from the parent table will also delete data from the related child tables so that the data reduction is accomplished in one simple step:

    /* SalesOrderDetail rows will be removed as well due to the FK */
    DELETE
      FROM AdvWorks.SalesOrderHeader h
     WHERE SalesOrderId NOT IN (
        SELECT SalesOrderId
          FROM (
             SELECT h.*,ROW_NUMBER() OVER(ORDER BY SalesOrderId) AS ROW_ID
               FROM AdvWorks.SalesOrderHeader h) Subset
          WHERE MOD(ROW_ID,4)=0
    )
    

    Almost any variation on a periodic extract can be done. For example, if you want to extract sample data containing the first two orders from every month, the following query will do the trick:

    WITH SalesData AS (
    SELECT h.*,
           ROW_NUMBER()
           OVER(PARTITION BY YEAR(OrderDate),MONTH(OrderDate) 
           ORDER BY OrderDate) AS Month_Order_Id
      FROM AdvWorks.SalesOrderHeader h
    )
    SELECT *
      FROM SalesData
     WHERE Month_Order_Id>=2
    

    In this case, the rows are numbered sequentially using ROW_NUMBER. The PARTITION BY causes the numbering to restart with each new month (MONTH_ORDER_ID column). This numbering scheme can be used to limit the results to only the first two rows of each month. The abridged results will look something like this:

    SalesOrderId

    OrderDate

    SubTotal

    Freight

    Month_Order_id

    49821

    4/1/2003

    94133.84

    2353.35

    1

    49822

    4/1/2003

    38594.20

    964.85

    2

    50189

    5/1/2003

    36341.76

    908.54

    1

    50190

    5/1/2003

    895.03

    22.38

    2

    50658

    6/1/2003

    7328.79

    183.22

    1

    50659

    6/1/2003

    1549.43

    38.74

    2

    51081

    7/1/2003

    44066.04

    1101.65

    1

    51082

    7/1/2003

    1009.73

    25.24

    2

    51690

    8/1/2003

    3765.73

    94.14

    1

    51691

    8/1/2003

    39027.77

    975.69

    2

    Scenario 2: Extract a random row

    Let’s say you’re writing a testing script to test code for a timesheet entry application. The testing requires the creation of several timesheet entries using a random male employee. How do you extract a random employee? It’s as easy as:

      SELECT c.* 
        FROM CorpData.Employee c
       WHERE SEX='M'
    ORDER BY 1
       FETCH FIRST 1 ROW ONLY
    

    The RAND() function will a random value between zero and one. In the case of DB2 for i, the RAND() function is non-deterministic meaning that the value it returns changes upon every invocation. In the above query, DB2 for i assigns a random number to every employee row, sorts the result set by the random number, and then returns the first one in the list. The result will most likely be a different employee every time the query is run.

    Note that the RAND() function may be deterministic on other database platforms such as SQL Server, meaning that the function returns the same value for every statement invocation. This is not the case in DB2 for i.

    Exercise caution when using this technique. The resulting query plan will perform a table scan and assign a random value to each row (I/O and CPU intensive), which may not be a pretty sight depending on the volume of data:

    For small tables in an occasional testing scenario, this type of code is likely insignificant. However, if run against a million-plus row table in frequently used production code, it would most likely tax your hardware.

    Other options for picking a random row from a table involve converting the RAND() function to an integer (or decimal) value and then using the integer to select a value from a sequential identity column or relative record number (using the RRN function). This technique requires that the random number generator know the limits of the identity column (or RRNs) and also assumes that there are not any gaps in the numbering.

    This example shows how to generate a random integer value between 1 and 100,000:

    SET IDENTITY_VAR=INT(RAND()*100000)+1;
    

    Applied to a SELECT statement, this example picks a random product with an ID between 700 and 999 (and assumes all IDs are present):

    SELECT *
      FROM ADVWORKS.PRODUCT
     WHERE PRODUCTID=INT(RAND()*299)+700
    

    Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

    RELATED STORY

    New in V5R4: OLAP Ranking Specifications

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags:

    Sponsored by
    WorksRight Software

    Do you need area code information?
    Do you need ZIP Code information?
    Do you need ZIP+4 information?
    Do you need city name information?
    Do you need county information?
    Do you need a nearest dealer locator system?

    We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

    The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

    PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

    Just call us and we’ll arrange for 30 days FREE use of either ZIP/CITY or PER/ZIP4.

    WorksRight Software, Inc.
    Phone: 601-856-8337
    Fax: 601-856-9432
    Email: software@worksright.com
    Website: www.worksright.com

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Sponsored Links

    NGS:  Nov. 18 FREE Webinar: Migration Alternatives for Query/400 Users
    HelpSystems:  How do you use IBM i? Your peers want to know! Take the survey >
    Rocket Software:  Mobile app development and deployment solution for IBM i. Download FREE trial!

    First Open Source Conference For IBM i Shops Planned IBM i Executives: Where Are They Now?

    Leave a Reply Cancel reply

Volume 15, Number 23 -- November 3, 2015
THIS ISSUE SPONSORED BY:

WorksRight Software
Connectria
United Computer Group, Inc.

Table of Contents

  • Tuning SQL Environments On i
  • Why And How To Update The HMC
  • Extracting Sample Data From A DB2 for i Table

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Liam Allan Shares What’s Coming Next With Code For IBM i
  • From Stable To Scalable: Visual LANSA 16 Powers IBM i Growth – Launching July 8
  • VS Code Will Be The Heart Of The Modern IBM i Platform
  • The AS/400: A 37-Year-Old Dog That Loves To Learn New Tricks
  • IBM i PTF Guide, Volume 27, Number 25
  • Meet The Next Gen Of IBMers Helping To Build IBM i
  • Looks Like IBM Is Building A Linux-Like PASE For IBM i After All
  • Will Independent IBM i Clouds Survive PowerVS?
  • Now, IBM Is Jacking Up Hardware Maintenance Prices
  • IBM i PTF Guide, Volume 27, Number 24

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle