• The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
Menu
  • The Four Hundred
  • Subscribe
  • Media Kit
  • Contributors
  • About Us
  • Contact
  • A Fuzzy Search Algorithm

    August 30, 2002 Timothy Prickett Morgan

    Hey, Ted:

    Most inquiry programs require a user to enter the exact value or exact beginning portion of a value to be located. Unfortunately such programs do not find the desired data if the user doesn’t remember the exact spelling of the search term. I have an algorithm that allows searching by any portion of a database field, even if the user misspells the search argument.

    According to this algorithm, the position of each character of the input value is to be compared with the position of the same characters found in the corresponding field from the file. It calculates differences between positions of the matching characters.

    I assign a matching rate based on the difference in the number of positions. If a certain character is found in the same position of both the search string and the database field, I assign a matching rate of 100. If the two are one position away from each other, the rate is 90. Here is the complete scale:

    Absolute difference between character positions Matching rate
    0 100
    1 90
    2 80
    3 70
    4 60
    5 50
    6 40
    7 30
    8 20
    9 10
    10 or more 0

    These differences are converted into a single matching-rate value that indicates how much the search value and the field value from the file differ from one another.

    Consider, for example, a file that consists of two fields: a part number and part description. This file contains three records, as per this example:

    Part Number Part Description
    111 WIFE
    222 STOP
    333 KNIFE

    The user of the search program misspells KNIFE as NIVE and the search begins.

    For a search value of NIVE, these are the calculated matching rates:

    WIFE 200
    STOP 0
    KNIFE 300

    The more the search argument and database field value match, the higher the score.

    — Victor Pisman

    Searching a database is far from an exact science, and tools like wild-card matching and SQL’s SOUNDEX function are few and limited in what they can do.

    Victor’s source code is shown below. It consists of three objects–an input item master file, an output file to hold the results of the search, and an RPG program to perform the search.

    Victor, yours is an interesting algorithm. Thanks for sharing it with us.

    — Ted

    Input file IMIN

         A* IMIN  - Item Master File 
         A                             
         A          R FM$IM
         A            IMITNO        15
         A            IMDESC        15
         A          K IMITNO
    

    Output file IMOUT

         A                                      REF(IMIN)
         A          R FM$OUT
         A            XXITNO    R               REFFLD(IMITNO)
         A            XXDESC    R               REFFLD(IMDESC)
         A            XXRATE         3  0
         A          K XXRATE 
    

    Program IM

      * Fuzzy search algorithm
      * written by Victor Pisman
      *
     FIMIN    IF  E                    DISK
     FIMOUT   O   E                    DISK                      A
      *
     E                    AAA        15  1
     E                    BBB        15  3 0
      *
     I              'ABCDEFGHIJKLMNOPQRST-C         UC
     I              'UVWXYZ'
      *
     I              'abcdefghijklmnopqrst-C         LC
     I              'uvwxyz'
      *
     C           *ENTRY    PLIST
     C                     PARM           ##PARM 15
      *
     C           LC:UC     XLATE##PARM    ##PARM
      *
     C                     READ IMIN                     99
     C           *IN99     DOWEQ*OFF
     C                     EXSR SEARCH
     C                     Z-ADDTTRATE    XXRATE
     C                     MOVELIMITNO    XXITNO
     C                     MOVELIMDESC    XXDESC
     C                     WRITEFM$OUT
     C                     READ IMIN                     99
     C                     ENDDO
      *
     C                     MOVE *ON       *INLR
     C***********
     C           SEARCH    BEGSR
      ***********
      * Get the length of the search argument
     C                     MOVEL##PARM    ENTRY  15 P
     C           '  '      CHEKRENTRY     L       20
      * FILL THE ARRAY AAA WITH THE INPUT FIELD VALUE.
     C                     CLEARAAA
     C                     Z-ADD1         I       50
     C           1         DO   L         I
     C           1         SUBSTENTRY:I   AAA,I
     C                     ENDDO
      *
      * FILL THE ARRAY BBB.
     C                     CLEARBBB
     C                     Z-ADD*ZERO     BBBTOT
     C           *LIKE     DEFN BBB       BBBTOT+ 1
     C                     Z-ADD1         I
      *
      * CHECK FOR EXACT MATCH FIRST.
     C           ENTRY:L   SCAN IMDESC:1                 50
     C           *IN50     IFEQ *OFF                       NO EXACT MATCH
      *
     C           1         DO   L         I
     C           LC:UC     XLATEIMDESC    XXNAME
     C           *LIKE     DEFN IMDESC    XXNAME
     C           AAA,I     SCAN XXNAME    RATE    30     50
     C           *IN50     IFEQ *ON
     C           I         SUB  RATE      BBB,I
     C                     ADD  BBB,I     BBBTOT
     C                     ELSE
     C                     Z-ADD99        BBB,I
     C                     ENDIF
     C                     ENDDO
      *
     C                     ENDIF                           EXACT MATCH
      *
      * CALCULATE THE NUMBER OF LEADING BLANKS X IN THE INPUT FIELD.
     C           L         IFGT *ZERO
     C           BBBTOT    DIV  L         X       30H
     C                     ENDIF
      *
      * CALCULATE THE ABSOLUTE VALUE OF INDIVIDUAL DIFFERENCES.
     C                     Z-ADD1         I
     C           1         DO   L         I
     C           BBB,I     IFNE 99
     C                     SUB  X         BBB,I
     C                     ENDIF
     C           BBB,I     IFLT *ZERO
     C                     MULT -1        BBB,I
     C                     ENDIF
      *
     C           BBB,I     IFLT 10
     C                     MULT -10       BBB,I
     C                     ADD  100       BBB,I
     C                     ELSE
     C                     Z-ADD*ZERO     BBB,I
     C                     ENDIF
      *
     C                     ENDDO
      * CALCULATE TOTAL MATCHING RATE
     C                     XFOOTBBB       TTRATE  50
      *
     C                     ENDSR
    

    Sponsored By
    COMMON

    REGISTER FOR COMMON IN DENVER, OCT. 13-17

    Get the IT training you need by attending COMMON Users Group’s Fall 2002 IT Education Conference & Expo, October 13-17 in Denver. Early Bird registration is $1,150 until September 4.

    Choose from over 720 sessions and labs covering a wide range of industry topics. Also receive training from J.D. Edwards, MAPICS, and other vendors.

    Don’t miss out! Go to www.common.org

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Tags: Tags: mgo_rc, Volume 2, Number 66 -- August 30, 2002

    Sponsored by
    WorksRight Software

    Do you need area code information?
    Do you need ZIP Code information?
    Do you need ZIP+4 information?
    Do you need city name information?
    Do you need county information?
    Do you need a nearest dealer locator system?

    We can HELP! We have affordable AS/400 software and data to do all of the above. Whether you need a simple city name retrieval system or a sophisticated CASS postal coding system, we have it for you!

    The ZIP/CITY system is based on 5-digit ZIP Codes. You can retrieve city names, state names, county names, area codes, time zones, latitude, longitude, and more just by knowing the ZIP Code. We supply information on all the latest area code changes. A nearest dealer locator function is also included. ZIP/CITY includes software, data, monthly updates, and unlimited support. The cost is $495 per year.

    PER/ZIP4 is a sophisticated CASS certified postal coding system for assigning ZIP Codes, ZIP+4, carrier route, and delivery point codes. PER/ZIP4 also provides county names and FIPS codes. PER/ZIP4 can be used interactively, in batch, and with callable programs. PER/ZIP4 includes software, data, monthly updates, and unlimited support. The cost is $3,900 for the first year, and $1,950 for renewal.

    Just call us and we’ll arrange for 30 days FREE use of either ZIP/CITY or PER/ZIP4.

    WorksRight Software, Inc.
    Phone: 601-856-8337
    Fax: 601-856-9432
    Email: software@worksright.com
    Website: www.worksright.com

    Share this:

    • Reddit
    • Facebook
    • LinkedIn
    • Twitter
    • Email

    Edit with Parentheses in Query/400 Reader Feedback and Insights: Odds and Ends Always Popular

    Leave a Reply Cancel reply

MGO Volume: 2 Issue: 66

This Issue Sponsored By

    Table of Contents

    • Reader Feedback and Insights: Splitting a Qshell Variable
    • Adding Subprocedures to a Service Program
    • A Fuzzy Search Algorithm

    Content archive

    • The Four Hundred
    • Four Hundred Stuff
    • Four Hundred Guru

    Recent Posts

    • Liam Allan Shares What’s Coming Next With Code For IBM i
    • From Stable To Scalable: Visual LANSA 16 Powers IBM i Growth – Launching July 8
    • VS Code Will Be The Heart Of The Modern IBM i Platform
    • The AS/400: A 37-Year-Old Dog That Loves To Learn New Tricks
    • IBM i PTF Guide, Volume 27, Number 25
    • Meet The Next Gen Of IBMers Helping To Build IBM i
    • Looks Like IBM Is Building A Linux-Like PASE For IBM i After All
    • Will Independent IBM i Clouds Survive PowerVS?
    • Now, IBM Is Jacking Up Hardware Maintenance Prices
    • IBM i PTF Guide, Volume 27, Number 24

    Subscribe

    To get news from IT Jungle sent to your inbox every week, subscribe to our newsletter.

    Pages

    • About Us
    • Contact
    • Contributors
    • Four Hundred Monitor
    • IBM i PTF Guide
    • Media Kit
    • Subscribe

    Search

    Copyright © 2025 IT Jungle