• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: Use SQL To Find Duplicate Source Code

    March 12, 2018 Ted Holt

    According to Brian Tracy, “good habits are hard to develop but easy to live with; bad habits are easy to develop but hard to live with. The habits you have and the habits that have you will determine almost everything you achieve or fail to achieve.” This is as true in programming as in anything else we may do.

    Unfortunately, even those of us who strive for good work habits often have to follow the work of people who did not. One bad habit I come across occasionally is known in software engineering as WET solutions. WET stands for “write everything twice” or “we enjoy typing” or “waste everyone’s time.” The antidote is the DRY principle: “don’t repeat yourself.”

    Not long ago I had to modify a 13,000-line RPG program, the sort of thing that is beyond the capacity of my little brain to comprehend. I could tell there was repetition in the code, and how did I find it? I used SQL.

    It may seem strange to use SQL for source code, but source code is data. It’s output from a programmer and input to a compiler. Since it’s stored in source physical files, using SQL to query it — and even to modify it — is a cinch.

    A source physical file has three fields, which the Display File Field Description (DSPFFD) command will show you. They are: SRCSEQ (sequence number), SRCDAT (change date), and SRCDTA (source data). You will probably ignore the source date.

    To query a source member, create an alias. If you query the source physical file itself, you will access the first member, which is not the first member alphabetically, but the one that was added first. It will likely not be the member you want.

    create or replace alias qtemp.tempalias
     for somelib.somefile(somembr)
    

    In this example, I cleverly named the alias TEMPALIAS and put it in the QTEMP library. When you reference TEMPALIAS in an SQL statement, the database manager will access member SOMEMBR in source physical file SOMEFILE in library SOMELIB.

    Now let’s look for duplicate code.

    with source as 
       (select s.srcseq, s.srcdta
          from qtemp.tempalias as s
         where substr(s.srcdta,7,1) <> '*'
           and substr(s.srcdta,8  ) <> ' '
           and substr(s.srcdta,6,1) =  ' ')
     select a.srcseq, b.srcseq, a.srcdta, b.srcdta
       from source as a
       join source as b
         on trim(a.srcdta) = trim(b.srcdta)
        and a.srcseq < b.srcseq
    

    I began with a common table expression, SOURCE, to select the records that I want to include in the query. The important part of this expression is the WHERE clause, because that’s where you specify which lines of source code you want to include in the query. I remove blank lines and lines with an asterisk in column 7 and only blanks following it. In this example, I also included a line to select only rows with a blank in column 6, which in the 13,000-line program meant free-form calculations only. There WHERE clause varies widely depending on the type of source code you are analyzing — fixed-form RPG, free-form RPG, DDS, CL, etc. — and the preferences of the person or persons who wrote the code.

    In the main query, I joined the source member to itself, looking for lines that matched but with different sequence numbers. By selecting records where the sequence number in the primary file was less than the sequence number in the secondary file, I reduced the size of the result set and yet found the duplicate code I was looking for.

    In this example, I used the TRIM function in the join in case the person who copied the code from one spot in the program to another shifted the code. In other situations, you may not want to trim the blanks. Your query doesn’t have to be perfect — you’re not running your business on it. It only has to find duplicate code.

    Of course, you can also use SQL to look for code that was duplicated between two source members. You need only create two aliases.

    You use SQL to help other people do their work. Why not use SQL to help you do yours?

    RELATED STORY

    Don’t Repeat Yourself

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, CL, DDS, FHG, Four Hundred Guru, Guru, IBM i, RPG, SQL

    Sponsored by
    WorksRight Software

    Do you need area code information?
    Do you need ZIP Code information?
    Do you need ZIP+4 information?
    Do you need city name information?
    Do you need county information?
    Do you need a nearest dealer locator system?

    We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

    The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

    PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

    Just call us and we’ll arrange for 30 days FREE use of either ZIP/CITY or PER/ZIP4.

    WorksRight Software, Inc.
    Phone: 601-856-8337
    Fax: 601-856-9432
    Email: software@worksright.com
    Website: www.worksright.com

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    As I See It: Homo Digitalis The Performance Impact Of Spectre And Meltdown

    Leave a Reply Cancel reply

TFH Volume: 28 Issue: 19

This Issue Sponsored By

  • New Generation Software
  • ASNA
  • Profound Logic Software
  • ARCAD Software
  • WorksRight Software

Table of Contents

  • HelpSystems Has A New Number One Investor
  • The Performance Impact Of Spectre And Meltdown
  • Guru: Use SQL To Find Duplicate Source Code
  • As I See It: Homo Digitalis
  • Proprietary Innovation: An IBM i Ace In The Hole

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Liam Allan Shares What’s Coming Next With Code For IBM i
  • From Stable To Scalable: Visual LANSA 16 Powers IBM i Growth – Launching July 8
  • VS Code Will Be The Heart Of The Modern IBM i Platform
  • The AS/400: A 37-Year-Old Dog That Loves To Learn New Tricks
  • IBM i PTF Guide, Volume 27, Number 25
  • Meet The Next Gen Of IBMers Helping To Build IBM i
  • Looks Like IBM Is Building A Linux-Like PASE For IBM i After All
  • Will Independent IBM i Clouds Survive PowerVS?
  • Now, IBM Is Jacking Up Hardware Maintenance Prices
  • IBM i PTF Guide, Volume 27, Number 24

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle