• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: Use SQL to Generate Random Data

    June 17, 2019 Ted Holt

    Suppose I needed to generate a large database table with random data in order to adequately test the performance of an SQL query. Suppose that, for security reasons, I was not allowed to copy the production version of the table. Suppose that I needed a way to generate a lot — and I do mean a lot! — of random data. Suppose this scenario is not mere supposition.

    Before an SQL query goes into production, it should be tested against a production-like dataset. Running a query against a test dataset of 25 rows (records) can produce unpleasant surprises when it’s installed into production and forced to face a production dataset of thousands or even millions of rows. Obviously we can’t key that much data, so how do we generate it? We use SQL. Let me give you an example.

    Suppose you wish to test a query against a single table. (That’s not realistic, but it will keep the example small and I can illustrate the techniques.) You already have a copy of your table, and you just need to load data into it. However, since this is an illustration, here’s the statement to create my illustrative table.

    create table mylib.Cust24
      ( ID       dec(5),
        Name     char(20),
        Type     dec(1),
        Balance  dec(5,2),
        DueDate  date,
      primary key (ID));
    

    You need data for these columns (fields). You might just use a sequential number for ID, the primary key. If so, a recursive common table expression generates the sequence.

    with list (IDNbr) as 
      (values(1) 
        union all
       select IDNbr + 1
         from list
        where IDNbr < 25)
    select IDNbr from list
    

    This generates 25 rows with sequential ID numbers. You’d generate many more, of course.

    Now put random data into the other columns. The Type column should have an integer from 1 to 3.

    dec(rand()*3+1, 1,0)
    

    The rand() function generates a random number in the range zero to one. Multiplying that number by three changes it to a random number in the range zero to three. Adding one to that puts it into the range one to four. (Chances of the number being four are slim to none. I’ve never seen a random number generator return a zero or a one, only numbers in between them.) The dec() function removes the decimal places, leaving a whole number.

    The Balance column is a five-digit packed-decimal number. If you want a positive number, you can use an expression much like the one for the Type column.

    dec(rand()*999, 5, 2)
    

    If you want some negative numbers as well, adjust the expression to give you some.

    dec(rand()*1000, 5, 2)-500
    

    A random date isn’t hard to get. Use the rand() function to generate a number of days to adjust a reference date, such as the current date.

    current date + int(rand()*100) days
    

    This expression adds up to 100 days to the current date.

    I’ve saved the tricky part for last. You need to generate a random Name value. The rand() function only returns numeric values. How do you translate those numeric values into character strings? It’s slightly messy, but not difficult. Here’s one possibility.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    select IDNbr,
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName       ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
                  int(mod(RandName*0.1   ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.01  ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.001 ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.0001,53)+1), 1)
    

    Let’s break that down a piece at a time.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    

    RandName is a five-digit packed-decimal random number. You can use RandName to generate subscripts into an array of characters.

    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
           int(mod(RandName,53)+1), 1)
    

    The mod() function divides RandName by 53, the number of characters in the literal, returning a number from zero to 52. Adding one gives me a valid subscript value to the string. Substr() returns one character from the literal. You have one random character, which is either a letter or space.

    You can generate more characters by dividing RandName by other powers of ten.

    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
               int(mod(RandName*0.1 ,53)+1), 1)
    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
               int(mod(RandName*0.01,53)+1), 1)
    

    Here’s the entire query.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    select IDNbr, 
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName       ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
                  int(mod(RandName*0.1   ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.01  ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.001 ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.0001,53)+1), 1) as Name,
       dec(rand()*3+1, 1,0) as Type,
       dec(rand()*1000, 5, 2) as Balance,
       current date + int(rand()*100) days as DueDate
      from list;
    

    Does it work? You be the judge.

    select * from cust24;
    

    I’ve included links to two articles from the IBM Knowledge Center for your benefit. (See the first two links under Related Stories.) They’re not for Db2 for i, but you may find them helpful. Be aware that the queries may not work on your system. One example generates character data by putting the mod() function in the second parameter of the translate() function, and Db2 for i does not allow that (yet).

    One shortcoming of using a large table of random data is that you may not be assured that all cases get tested. But you would have already done that test with a small dataset that contains all the data needed to fully exercise your code. The purpose of testing against random data is performance, not accuracy.

    RELATED STORIES

    How to populate a table with randomly generated test data

    Generate test data using SQL

    RAND

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, DB2 for i, FHG, Four Hundred Guru, IBM i, SQL

    Sponsored by
    TL Ashford

    TL Ashford writes software to generate Barcode Labels and Forms for the IBM i.

    Our software products are the most feature-rich, cost-effective solutions available!

    TLAForms converts traditional IBM i spool files into high quality, professional forms that are output as PDF documents. A form is associated with a spool file and the form is designed. TLAForms then ‘watches’ the IBM i output queue for a new spool file to be generated. When a new spool file is generated, TLAForms reads the data from the spool file, places the data on the form, and outputs a PDF document. The PDF documents can be archived to the IFS, automatically emailed, and printed.

    Features:

    • Select Data directly from an IBM i Spool File
    • Burst IBM i Spool Files based on page or Spool File data
    • Add spool file data to form as Text using a wide variety of fonts and colors (the MICR font for printing checks is included in the software)
    • Add spool file data to form as bar code – including 2-dimensional bar codes PDF-417 and QRCode
    • Configure SQL statements to retrieve and use IBM i File data on forms
    • Utilize Actions to show or hide objects on the form based on data within the spool file
    • Import Color Graphics
    • Use Color Overlays
    • Create Tables
    • Forms can be archived to the IFS
    • Forms can be emailed automatically
    • Forms can be printed to any IBM i, Network or Windows printer
    • Forms are automatically generated when a new spool file is generated in the IBM i output queue
    • NO PROGRAMMING required
    • On-Line Video Training Library

    Learn more about TLAForms at www.tlashford.com/TLA2/pages/tlaforms/overview.html

    Barcode400 is native IBM i software to design and print labels directly from your IBM i in minutes! Compliance and RFID labeling is easy using Barcode400’s tools and templates.

    Features:

    • Software resides on the IBM i
    • IBM i security and Backup
    • Labels are centrally located on the IBM i
    • Label formats are available to all users the instant they are saved – even in remote facilities
    • GUI designer (Unlimited Users)
    • Generate Labels as PDF documents!
    • Print to 100’s of thermal transfer printers
    • Print to HP and compatible printers
    • Print labels interactively – No Programming Necessary!

    OR Integrate into existing application programs to automatically print labels – Barcode400 has been integrated with nearly every ERP and WMS software available on the IBM i, including thousands of in-house written applications.

    • On-Line Video Training Library
    • Free Compliance Labels
    • Generate Checks using the MICR font
    • RFID Support (optional)
    • History / Reprint Utility
    • Integration Assistant
    • Low Cost (no tiered pricing)

    Learn more about Barcode400 at www.tlashford.com/TLA2/pages/bc400labels/overview.html

     

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Four Hundred Monitor, June 17 Meet IBM’s New Security Architect for IBM i

    Leave a Reply Cancel reply

TFH Volume: 29 Issue: 37

This Issue Sponsored By

  • ProData Computer Services
  • New Generation Software
  • CNX
  • MITEC
  • iTech Solutions

Table of Contents

  • Pricing Revealed For IBM i Slices On IBM Cloud
  • Meet IBM’s New Security Architect for IBM i
  • Guru: Use SQL to Generate Random Data
  • Four Hundred Monitor, June 17
  • IBM i PTF Guide, Volume 21, Number 24

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Doing The Texas Two Step From Power9 To Power10
  • PHP’s Legacy Problem
  • Guru: For IBM i Newcomers, An Access Client Solutions Primer
  • IBM i 7.1 Extended Out To 2024 And Up To The IBM Cloud
  • Some Practical Advice On That HMC-Power9 Impedance Mismatch
  • IBM Extends Dynamic Capacity Pricing Scheme To Its Cloud
  • Here’s What You Should Do About The IBM i Skills Shortage
  • Matillion Founder Recounts Midrange Roots
  • Four Hundred Monitor, February 24
  • IBM i PTF Guide, Volume 23, Number 8

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2021 IT Jungle

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.