• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: Use SQL to Generate Random Data

    June 17, 2019 Ted Holt

    Suppose I needed to generate a large database table with random data in order to adequately test the performance of an SQL query. Suppose that, for security reasons, I was not allowed to copy the production version of the table. Suppose that I needed a way to generate a lot — and I do mean a lot! — of random data. Suppose this scenario is not mere supposition.

    Before an SQL query goes into production, it should be tested against a production-like dataset. Running a query against a test dataset of 25 rows (records) can produce unpleasant surprises when it’s installed into production and forced to face a production dataset of thousands or even millions of rows. Obviously we can’t key that much data, so how do we generate it? We use SQL. Let me give you an example.

    Suppose you wish to test a query against a single table. (That’s not realistic, but it will keep the example small and I can illustrate the techniques.) You already have a copy of your table, and you just need to load data into it. However, since this is an illustration, here’s the statement to create my illustrative table.

    create table mylib.Cust24
      ( ID       dec(5),
        Name     char(20),
        Type     dec(1),
        Balance  dec(5,2),
        DueDate  date,
      primary key (ID));
    

    You need data for these columns (fields). You might just use a sequential number for ID, the primary key. If so, a recursive common table expression generates the sequence.

    with list (IDNbr) as 
      (values(1) 
        union all
       select IDNbr + 1
         from list
        where IDNbr < 25)
    select IDNbr from list
    

    This generates 25 rows with sequential ID numbers. You’d generate many more, of course.

    Now put random data into the other columns. The Type column should have an integer from 1 to 3.

    dec(rand()*3+1, 1,0)
    

    The rand() function generates a random number in the range zero to one. Multiplying that number by three changes it to a random number in the range zero to three. Adding one to that puts it into the range one to four. (Chances of the number being four are slim to none. I’ve never seen a random number generator return a zero or a one, only numbers in between them.) The dec() function removes the decimal places, leaving a whole number.

    The Balance column is a five-digit packed-decimal number. If you want a positive number, you can use an expression much like the one for the Type column.

    dec(rand()*999, 5, 2)
    

    If you want some negative numbers as well, adjust the expression to give you some.

    dec(rand()*1000, 5, 2)-500
    

    A random date isn’t hard to get. Use the rand() function to generate a number of days to adjust a reference date, such as the current date.

    current date + int(rand()*100) days
    

    This expression adds up to 100 days to the current date.

    I’ve saved the tricky part for last. You need to generate a random Name value. The rand() function only returns numeric values. How do you translate those numeric values into character strings? It’s slightly messy, but not difficult. Here’s one possibility.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    select IDNbr,
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName       ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
                  int(mod(RandName*0.1   ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.01  ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.001 ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.0001,53)+1), 1)
    

    Let’s break that down a piece at a time.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    

    RandName is a five-digit packed-decimal random number. You can use RandName to generate subscripts into an array of characters.

    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
           int(mod(RandName,53)+1), 1)
    

    The mod() function divides RandName by 53, the number of characters in the literal, returning a number from zero to 52. Adding one gives me a valid subscript value to the string. Substr() returns one character from the literal. You have one random character, which is either a letter or space.

    You can generate more characters by dividing RandName by other powers of ten.

    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
               int(mod(RandName*0.1 ,53)+1), 1)
    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
               int(mod(RandName*0.01,53)+1), 1)
    

    Here’s the entire query.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    select IDNbr, 
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName       ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
                  int(mod(RandName*0.1   ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.01  ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.001 ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.0001,53)+1), 1) as Name,
       dec(rand()*3+1, 1,0) as Type,
       dec(rand()*1000, 5, 2) as Balance,
       current date + int(rand()*100) days as DueDate
      from list;
    

    Does it work? You be the judge.

    select * from cust24;
    

    I’ve included links to two articles from the IBM Knowledge Center for your benefit. (See the first two links under Related Stories.) They’re not for Db2 for i, but you may find them helpful. Be aware that the queries may not work on your system. One example generates character data by putting the mod() function in the second parameter of the translate() function, and Db2 for i does not allow that (yet).

    One shortcoming of using a large table of random data is that you may not be assured that all cases get tested. But you would have already done that test with a small dataset that contains all the data needed to fully exercise your code. The purpose of testing against random data is performance, not accuracy.

    RELATED STORIES

    How to populate a table with randomly generated test data

    Generate test data using SQL

    RAND

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, DB2 for i, FHG, Four Hundred Guru, IBM i, SQL

    Sponsored by
    DRV Technologies, Inc.

    Get More from Your IBM i

    Many users today struggle to get at the data they need on the IBM i. When users get reports, they look like they were formatted some time last century.

    Some organizations are still printing pre-printed forms and checks on impact printers.

    How often do operators log on to their system to look for messages they hope they don’t find?

    All of these scenarios can affect users’ perception of the IBM platform negatively, but there are simple solutions.

    DRV Technologies Inc. develops innovative solutions that help customers get more from their IBM i systems.

    Solutions include:

    • SpoolFlex spool conversion & distribution
    • FormFlex electronic forms
    • SecureChex MICR laser check printing
    • MessageFlex system monitoring

    FlexTools streamline resources, improve efficiency and enable pro-active system management.

    Better software, better service, DRV Tech.

    Learn how you can get more from your IBM i at www.drvtech.com

    Call 866 378-3366 for a Free Demonstration

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Four Hundred Monitor, June 17 Meet IBM’s New Security Architect for IBM i

    Leave a Reply Cancel reply

TFH Volume: 29 Issue: 37

This Issue Sponsored By

  • ProData Computer Services
  • New Generation Software
  • CNX
  • MITEC
  • iTech Solutions

Table of Contents

  • Pricing Revealed For IBM i Slices On IBM Cloud
  • Meet IBM’s New Security Architect for IBM i
  • Guru: Use SQL to Generate Random Data
  • Four Hundred Monitor, June 17
  • IBM i PTF Guide, Volume 21, Number 24

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Security Still Top Concern, IBM i Marketplace Study Says
  • Bob Langieri Shares IBM i Career Trends Outlook for 2023
  • Kisco Brings Native SMS Messaging to IBM i
  • Four Hundred Monitor, February 1
  • 2023 IBM i Predictions, Part 4
  • Power Systems Did Indeed Grow Revenues Last Year
  • The IBM Power Trap: Three Mistakes That Leave You Stuck
  • Big Blue Decrees Its 2023 IBM Champions
  • As I See It: The Good, the Bad, And The Mistaken
  • IBM i PTF Guide, Volume 25, Number 5

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2022 IT Jungle

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.