Guru: IBM i Save File Compression Options

    April 1, 2019, by Michael Sansoterra

    As I finished populating some test tables with a large volume of data on a small and transient IBM i partition in the cloud, I thought life was good. But my countenance fell as I realized the tables plus the OS hogged over 70 percent of the disk space. I wondered how to get all the data into a single save file for safekeeping.

    The buzzer in my mind was loud and clear: it ain't gonna work, you don't have enough room. As I loathed the thought of using multiple save files to save my test data, I remembered that most save commands have a data compression (DTACPR) parameter. I had never used it, so I decided to try it with Save Library (SAVLIB) to see how well it worked. I executed SAVLIB with DTACPR(*HIGH) and was pleased that the compression was good enough to let me save the entire test library with about 7 percent storage to spare on the system.
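    A minimal sketch of that kind of save (the library and save file names below are placeholders, not the names from my actual partition) looks like this:

    /* Create a save file to hold the compressed library save */
    CRTSAVF FILE(QGPL/MYSAVF)

    /* Save the whole test library with high software compression */
    SAVLIB LIB(MYTESTLIB)
           DEV(*SAVF)
           SAVF(QGPL/MYSAVF)
           DTACPR(*HIGH)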

    IBM offers three software compression options (*LOW, *MEDIUM, and *HIGH). Shown below is how IBM's documentation describes each DTACPR value:

    • *NO — No data compression is performed.
    • *YES — If the save is to tape and the target device supports compression, hardware compression is performed. If compression is not supported, or if the save data is written to optical media or to a save file, software compression is performed. Low software compression is used for all devices except optical DVD, which uses medium software compression.
    • *LOW — If the save operation is to a save file or optical, software data compression is performed with the SNA algorithm. Low compression is usually faster and the compressed data is usually larger than if medium or high compression is used.
    • *MEDIUM — If the save operation is to a save file or optical, software data compression is performed with the TERSE algorithm. Medium compression is usually slower than low compression but faster than high compression. The compressed data is usually smaller than if low compression is used and larger than if high compression is used.
    • *HIGH — If the save operation is to a save file or optical, software data compression is performed with the LZ1 algorithm. High compression is usually slower and the compressed data is usually smaller than if low or medium compression is used.

    These are all older compression algorithms, and I had only heard of LZ1.

    I decided to go back and compare the available compression options. I used the Save Object (SAVOBJ) command to save an 8GB CUSTOMER table into a save file as follows:

    SAVOBJ OBJ(CUSTOMER)
           LIB(MYDATA)
           DEV(*SAVF)
           OBJTYPE(*FILE)
           SAVF(QGPL/MYSAVF)
           CLEAR(*REPLACE)
           DTACPR(*NO)
    

    I cleared and reused the same save file (SAVF) for each test. The results for each variation of the data compression option are shown in the table below:

    DTACPR Option   Avg CPU % Utilization   SAVOBJ Duration (mm:ss)   SAVF Size (bytes)   % of Original Size
    *NO             4%                      5:41                      8774656000          100.0%
    *HIGH           40%                     13:11                     5687762944          64.8%
    *MEDIUM         33%                     10:46                     5701132288          65.0%
    *LOW            14%                     3:42                      6383755264          72.8%
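    If you want to double-check a save file's size after each run, the Display Object Description (DSPOBJD) command will show it; the object size appears on the full detail display:

    /* Display the save file's object description, including its size */
    DSPOBJD OBJ(QGPL/MYSAVF)
            OBJTYPE(*FILE)
            DETAIL(*FULL)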

    This test was done on a Power9 cloud partition running IBM i 7.3 with two vCPUs, 4GB of RAM and 200GB of disk.

    The average CPU percent utilization in the table isn't a high-precision metric; it was basically me eyeballing the Work with System Activity (WRKSYSACT) command and watching the average CPU utilization over time. Even though the system wasn't doing much besides these save tests, there is still some CPU cost to run everything; this machine varied between 0.5 percent and 1.5 percent while "idle". The majority of the CPU was definitely due to the compression operation.

    The table demonstrates that requesting the *HIGH or *MEDIUM compression levels can be quite expensive in terms of CPU, though admittedly this machine only had two vCPUs. Even so, you certainly want to make sure your system has enough CPU capacity to spare before running a save command (SAVnnn) with one of these compression options.

    For my CUSTOMER table, there wasn't much space savings between *HIGH and *MEDIUM compression (only about 0.2 percent). While the *LOW option wasn't as efficient at saving space (by about 8 percent compared to *HIGH), it was the fastest of all the methods. If time is of the essence, beware: as you can see, the *HIGH and *MEDIUM options took quite a bit longer than a save without compression.

    Of course, your results may vary, depending on how conducive your data objects are to compression. Data with many repetitive elements typically compresses well. Admittedly, my test CUSTOMER table had a bunch of random characters in it, so the odds are you can expect a better compression ratio for "normal" data.

    I decided to do a secondary test to see how well "compressible" data, such as a large plain text file, would fare. I downloaded the free list of Great Britain postal codes from the Geonames.org website, unzipped it to /tmp/GB_full.txt on the IBM i, and used the Save (SAV) command to save this text file from the IFS to a save file:

    SAV DEV('/qsys.lib/qgpl.lib/mysavf.file')
        OBJ(('/tmp/GB_full.txt'))
        CLEAR(*REPLACE)
        DTACPR(*HIGH)
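
    In case you're wondering how the file landed in /tmp: the downloaded zip can be unpacked right in QSHELL with the jar utility that comes with the IBM i Java runtime. A quick sketch (the zip file name below is a placeholder for whatever the download is called):

    /* Unpack the downloaded zip into /tmp (zip name is a placeholder) */
    QSH CMD('cd /tmp && jar xf GB_full.zip')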
    

    This table shows the resulting save file sizes for each data compression (DTACPR) option:

    File Description                 File Size (bytes)   % of Original Size
    Uncompressed file                173821160           100.0%
    Zip file (original download)     13946129            8.0%
    Save file, no compression        184705024           106.3%
    Save file, high compression      18907136            10.9%
    Save file, medium compression    32538624            18.7%
    Save file, low compression       173039616           99.6%

    I did not include duration or CPU utilization for this test because the elapsed time of the save operations wasn't significant. I'm glad I ran this second test, because the results are quite a bit different from the first test with respect to how well the various compression levels performed.

    Zip compression was the clear winner compared to the IBM i's older compression algorithms. Keep in mind that you can use the jar command in QSHELL for zipping and unzipping IFS files. If you don't mind searching the internet, a number of utilities and other compression formats (including 7z and tar) can also be used from QSHELL to compress IFS files, if significant size reduction or sharing data without a save file is your primary goal. If needed, you could always place the zip file into a save file to have the best of both worlds!
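    As a rough sketch of that last idea (the paths and save file name here are hypothetical), you could zip the IFS file with jar and then save the resulting zip into a save file without further compression:

    /* Zip the text file with jar (the M option skips the jar manifest) */
    QSH CMD('jar cfM /tmp/gb_full.zip /tmp/GB_full.txt')

    /* Stash the already-compressed zip in a save file; no DTACPR needed */
    CRTSAVF FILE(QGPL/MYZIPSAVF)
    SAV DEV('/qsys.lib/qgpl.lib/myzipsavf.file')
        OBJ(('/tmp/gb_full.zip'))
        DTACPR(*NO)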

    Unlike in the first compression demo, there was quite a bit of difference among the resulting file sizes for the different compression types. Whereas *LOW compression was quite useful in the prior test, with the plain text file *LOW accomplished almost nothing.

    In conclusion, when saving data to a save file, it pays to experiment to gauge the cost (CPU utilization and duration) vs benefit (disk space savings) of using a particular data compression option. Don’t forget, the optimal settings will depend on your data set (for example, program objects vs table data and journal receivers, plain text data vs binary data, etc.) so remember to test for each variation. If you’re only concerned with compressing IFS data, then other compression options are available.

    RELATED STORY

    Save Object (SAVOBJ)


    Tags: 400guru, FHG, Four Hundred Guru, IBM i, IFS, Power9, qshell


    3 thoughts on “Guru: IBM i Save File Compression Options”

    • John Tappin says:
      April 1, 2019 at 7:43 am

      It is worth considering that a smaller saved object can result in faster recovery times, and a lot less storage for multiple backup copies. Using compression is one answer.

      Choosing not to save access paths can also dramatically reduce save time, storage of the backup and therefore CPU time, albeit at the expense of a lot of extra time and CPU when restoring. This may be OK for a small test system.

      When dealing with recovery, time is usually a more precious resource than CPU, in my experience though.

    • Steven says:
      April 1, 2019 at 9:32 am

      Also, for the CL fans, at 7.2 and later there are CPYTOARCF and CPYFRMARCF for zipping and unzipping files.

    • David Dolphin says:
      May 3, 2020 at 1:40 am

      I used CPYTOARCF on V7R3 and it zipped a library to the IFS. However, the CPYFRMARCF does not allow me to nominate a library to unzip the files. It gives an error “CPFA0A2 Information passed to this operation was not valid”.

      The “TODIR” parameter seems to only accept a directory name which is useless for restoring library objects.

      The help text for the command gives this example:

      CPYFRMARCF FROMARCF('/MYDIR/MyArchiveFile.zip')
                 TODIR('/QSYS.LIB/MYLIB.LIB/')
                 RPLDTA(*YES)

      but it doesn't seem to want to do that. I cannot find any Google or IBM information on this problem other than comments from those who have the same problem.
