• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: Use SQL to Generate Random Data

    June 17, 2019 Ted Holt

    Suppose I needed to generate a large database table with random data in order to adequately test the performance of an SQL query. Suppose that, for security reasons, I was not allowed to copy the production version of the table. Suppose that I needed a way to generate a lot — and I do mean a lot! — of random data. Suppose this scenario is not mere supposition.

    Before an SQL query goes into production, it should be tested against a production-like dataset. Running a query against a test dataset of 25 rows (records) can produce unpleasant surprises when it’s installed into production and forced to face a production dataset of thousands or even millions of rows. Obviously we can’t key that much data, so how do we generate it? We use SQL. Let me give you an example.

    Suppose you wish to test a query against a single table. (That’s not realistic, but it will keep the example small and I can illustrate the techniques.) You already have a copy of your table, and you just need to load data into it. However, since this is an illustration, here’s the statement to create my illustrative table.

    create table mylib.Cust24
      ( ID       dec(5),
        Name     char(20),
        Type     dec(1),
        Balance  dec(5,2),
        DueDate  date,
      primary key (ID));
    

    You need data for these columns (fields). You might just use a sequential number for ID, the primary key. If so, a recursive common table expression generates the sequence.

    with list (IDNbr) as 
      (values(1) 
        union all
       select IDNbr + 1
         from list
        where IDNbr < 25)
    select IDNbr from list
    

    This generates 25 rows with sequential ID numbers. You’d generate many more, of course.

    Now put random data into the other columns. The Type column should have an integer from 1 to 3.

    dec(rand()*3+1, 1,0)
    

    The rand() function generates a random number in the range zero to one. Multiplying that number by three changes it to a random number in the range zero to three. Adding one to that puts it into the range one to four. (Chances of the number being four are slim to none. I’ve never seen a random number generator return a zero or a one, only numbers in between them.) The dec() function removes the decimal places, leaving a whole number.

    The Balance column is a five-digit packed-decimal number. If you want a positive number, you can use an expression much like the one for the Type column.

    dec(rand()*999, 5, 2)
    

    If you want some negative numbers as well, adjust the expression to give you some.

    dec(rand()*1000, 5, 2)-500
    

    A random date isn’t hard to get. Use the rand() function to generate a number of days to adjust a reference date, such as the current date.

    current date + int(rand()*100) days
    

    This expression adds up to 100 days to the current date.

    I’ve saved the tricky part for last. You need to generate a random Name value. The rand() function only returns numeric values. How do you translate those numeric values into character strings? It’s slightly messy, but not difficult. Here’s one possibility.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    select IDNbr,
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName       ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
                  int(mod(RandName*0.1   ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.01  ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.001 ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.0001,53)+1), 1)
    

    Let’s break that down a piece at a time.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    

    RandName is a five-digit packed-decimal random number. You can use RandName to generate subscripts into an array of characters.

    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
           int(mod(RandName,53)+1), 1)
    

    The mod() function divides RandName by 53, the number of characters in the literal, returning a number from zero to 52. Adding one gives me a valid subscript value to the string. Substr() returns one character from the literal. You have one random character, which is either a letter or space.

    You can generate more characters by dividing RandName by other powers of ten.

    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
               int(mod(RandName*0.1 ,53)+1), 1)
    substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
               int(mod(RandName*0.01,53)+1), 1)
    

    Here’s the entire query.

    with list (IDNbr, RandName) as 
      (values(1,         dec(rand()*100000,5,0)) 
        union all
       select IDNbr + 1, dec(rand()*100000,5,0)
         from list
        where IDNbr < 25)
    select IDNbr, 
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName       ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz', 
                  int(mod(RandName*0.1   ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.01  ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.001 ,53)+1), 1) concat
       substr('ABCDEFGHIJKLMNOPQRSTUVWYXZ abcdefghijklmnopqrstuvwxyz',
                  int(mod(RandName*0.0001,53)+1), 1) as Name,
       dec(rand()*3+1, 1,0) as Type,
       dec(rand()*1000, 5, 2) as Balance,
       current date + int(rand()*100) days as DueDate
      from list;
    

    Does it work? You be the judge.

    select * from cust24;
    

    I’ve included links to two articles from the IBM Knowledge Center for your benefit. (See the first two links under Related Stories.) They’re not for Db2 for i, but you may find them helpful. Be aware that the queries may not work on your system. One example generates character data by putting the mod() function in the second parameter of the translate() function, and Db2 for i does not allow that (yet).

    One shortcoming of using a large table of random data is that you may not be assured that all cases get tested. But you would have already done that test with a small dataset that contains all the data needed to fully exercise your code. The purpose of testing against random data is performance, not accuracy.

    RELATED STORIES

    How to populate a table with randomly generated test data

    Generate test data using SQL

    RAND

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, DB2 for i, FHG, Four Hundred Guru, IBM i, SQL

    Sponsored by
    Rocket Software

    Meet digital age demands while maximizing your IT investment.

    Future-proof your mission-critical applications with Rocket® Solutions for IBM® i that keep your business ahead of the curve.

    Learn More

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Four Hundred Monitor, June 17 Meet IBM’s New Security Architect for IBM i

    One thought on “Guru: Use SQL to Generate Random Data”

    • R Lewis says:
      March 22, 2024 at 2:03 pm

      I have a list of about 8000 common names. I need a function to replace the names in a table with mock data. This solution cannot work in a function. NewID() and RAND() are not supported in functions.

      Reply

    Leave a Reply Cancel reply

TFH Volume: 29 Issue: 37

This Issue Sponsored By

  • ProData Computer Services
  • New Generation Software
  • CNX
  • MITEC
  • iTech Solutions

Table of Contents

  • Pricing Revealed For IBM i Slices On IBM Cloud
  • Meet IBM’s New Security Architect for IBM i
  • Guru: Use SQL to Generate Random Data
  • Four Hundred Monitor, June 17
  • IBM i PTF Guide, Volume 21, Number 24

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • POWERUp 2025 –Your Source For IBM i 7.6 Information
  • Maxava Consulting Services Does More Than HA/DR Project Management – A Lot More
  • Guru: Creating An SQL Stored Procedure That Returns A Result Set
  • As I See It: At Any Cost
  • IBM i PTF Guide, Volume 27, Number 19
  • IBM Unveils Manzan, A New Open Source Event Monitor For IBM i
  • Say Goodbye To Downtime: Update Your Database Without Taking Your Business Offline
  • i-Rays Brings Observability To IBM i Performance Problems
  • Another Non-TR “Technology Refresh” Happens With IBM i TR6
  • IBM i PTF Guide, Volume 27, Number 18

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle