• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • Guru: Use SQL To Find Duplicate Source Code

    March 12, 2018 Ted Holt

    According to Brian Tracy, “good habits are hard to develop but easy to live with; bad habits are easy to develop but hard to live with. The habits you have and the habits that have you will determine almost everything you achieve or fail to achieve.” This is as true in programming as in anything else we may do.

    Unfortunately, even those of us who strive for good work habits often have to follow the work of people who did not. One bad habit I come across occasionally is known in software engineering as WET solutions. WET stands for “write everything twice” or “we enjoy typing” or “waste everyone’s time.” The antidote is the DRY principle: “don’t repeat yourself.”

    Not long ago I had to modify a 13,000-line RPG program, the sort of thing that is beyond the capacity of my little brain to comprehend. I could tell there was repetition in the code, and how did I find it? I used SQL.

    It may seem strange to use SQL for source code, but source code is data. It’s output from a programmer and input to a compiler. Since it’s stored in source physical files, using SQL to query it — and even to modify it — is a cinch.

    A source physical file has three fields, which the Display File Field Description (DSPFFD) command will show you. They are: SRCSEQ (sequence number), SRCDAT (change date), and SRCDTA (source data). You will probably ignore the source date.

    To query a source member, create an alias. If you query the source physical file itself, you will access the first member, which is not the first member alphabetically, but the one that was added first. It will likely not be the member you want.

    create or replace alias qtemp.tempalias
     for somelib.somefile(somembr)
    

    In this example, I cleverly named the alias TEMPALIAS and put it in the QTEMP library. When you reference TEMPALIAS in an SQL statement, the database manager will access member SOMEMBR in source physical file SOMEFILE in library SOMELIB.

    Now let’s look for duplicate code.

    with source as 
       (select s.srcseq, s.srcdta
          from qtemp.tempalias as s
         where substr(s.srcdta,7,1) <> '*'
           and substr(s.srcdta,8  ) <> ' '
           and substr(s.srcdta,6,1) =  ' ')
     select a.srcseq, b.srcseq, a.srcdta, b.srcdta
       from source as a
       join source as b
         on trim(a.srcdta) = trim(b.srcdta)
        and a.srcseq < b.srcseq
    

    I began with a common table expression, SOURCE, to select the records that I want to include in the query. The important part of this expression is the WHERE clause, because that’s where you specify which lines of source code you want to include in the query. I remove blank lines and lines with an asterisk in column 7 and only blanks following it. In this example, I also included a line to select only rows with a blank in column 6, which in the 13,000-line program meant free-form calculations only. There WHERE clause varies widely depending on the type of source code you are analyzing — fixed-form RPG, free-form RPG, DDS, CL, etc. — and the preferences of the person or persons who wrote the code.

    In the main query, I joined the source member to itself, looking for lines that matched but with different sequence numbers. By selecting records where the sequence number in the primary file was less than the sequence number in the secondary file, I reduced the size of the result set and yet found the duplicate code I was looking for.

    In this example, I used the TRIM function in the join in case the person who copied the code from one spot in the program to another shifted the code. In other situations, you may not want to trim the blanks. Your query doesn’t have to be perfect — you’re not running your business on it. It only has to find duplicate code.

    Of course, you can also use SQL to look for code that was duplicated between two source members. You need only create two aliases.

    You use SQL to help other people do their work. Why not use SQL to help you do yours?

    RELATED STORY

    Don’t Repeat Yourself

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: 400guru, CL, DDS, FHG, Four Hundred Guru, Guru, IBM i, RPG, SQL

    Sponsored by
    Manta Technologies

    The Leader in IBM i Education!
    Need training on anything i?
    Manta is all you need.

    Courses and competency exams on:
    · IBM i operations
    · System Management and Security
    · IBM i Programming Tools
    · Programming in RPG, COBOL, CL, Java
    · Web Development

    SQL, DB2, QueryProduct features:
    · Runs in every popular browser
    · Available 24/7/365
    · Free Student Reference Guides
    · Free Student Administration
    · Concurrent User License
    · Built-In IBM i Simulator

    You can download our 200-page catalog and take sample sessions at MantaTech.com

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    As I See It: Homo Digitalis The Performance Impact Of Spectre And Meltdown

    Leave a Reply Cancel reply

TFH Volume: 28 Issue: 19

This Issue Sponsored By

  • New Generation Software
  • ASNA
  • Profound Logic Software
  • ARCAD Software
  • WorksRight Software

Table of Contents

  • HelpSystems Has A New Number One Investor
  • The Performance Impact Of Spectre And Meltdown
  • Guru: Use SQL To Find Duplicate Source Code
  • As I See It: Homo Digitalis
  • Proprietary Innovation: An IBM i Ace In The Hole

Content archive

  • The Four Hundred
  • Four Hundred Stuff
  • Four Hundred Guru

Recent Posts

  • Bob 1.0 Users Bugged By Lack Of One Feature
  • Here Come The AI-Based Code Modernization Offerings
  • Guru: Cohesion First – What A Procedure Should Be Responsible For
  • IBM Offers Trade-Ins On Storage To Grease The Upgrade Skids
  • IBM i PTF Guide, Volume 28, Number 14
  • What IBM i Ideas Are Cooking In IBM’s Ideas Portal?
  • Early Bob Excels In Medhost IBM i Tryout
  • Counting The Cost Of AI Inference – And Projecting It Far Out
  • IBM i PTF Guide, Volume 28, Number 13
  • The Next Generation Of IBM i Talent in GenAI Action

Subscribe

To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

Pages

  • About Us
  • Contact
  • Contributors
  • Four Hundred Monitor
  • IBM i PTF Guide
  • Media Kit
  • Subscribe

Search

Copyright © 2025 IT Jungle