Guru: IBM i Save File Compression Options
April 1, 2019 Michael Sansoterra
As I finished populating some test tables with a large volume of data on a small, transient IBM i partition in the cloud, I thought life was good. But my countenance fell as I realized the tables plus the OS hogged over 70 percent of the disk space. I wondered how I could get all the data into a single save file for safekeeping.
The buzzer in my mind was loud and clear: it ain’t gonna work, you don’t have enough room. Since I loathed the thought of using multiple save files to save my test data, I remembered that most save commands have a data compression (DTACPR) parameter. I had never used it, so I decided to try it with Save Library (SAVLIB) to see how well it worked. I executed SAVLIB with DTACPR(*HIGH) and was pleased that the compression was good enough to let me save the entire test library with about 7 percent storage to spare on the system.
In addition to *NO and *YES, IBM offers three software compression options (*LOW, *MEDIUM, and *HIGH). Here is how IBM’s documentation describes each value (the emphasis is mine):
- *NO — No data compression is performed.
- *YES — If the save is to tape and the target device supports compression, hardware compression is performed. If compression is not supported, or if the save data is written to optical media or to a save file, software compression is performed. Low software compression is used for all devices except optical DVD, which uses medium software compression.
- *LOW — If the save operation is to a save file or optical, software data compression is performed with the **SNA algorithm**. Low compression is usually faster, and the compressed data is usually larger than if medium or high compression is used.
- *MEDIUM — If the save operation is to a save file or optical, software data compression is performed with the **TERSE algorithm**. Medium compression is usually slower than low compression but faster than high compression. The compressed data is usually smaller than if low compression is used and larger than if high compression is used.
- *HIGH — If the save operation is to a save file or optical, software data compression is performed with the **LZ1 algorithm**. High compression is usually slower, and the compressed data is usually smaller than if low or medium compression is used.
These are all older compression algorithms, and I had only heard of LZ1.
I decided to go back and compare the available compression options. I used the Save Object (SAVOBJ) command to save an 8GB CUSTOMER table into a save file as follows:
SAVOBJ OBJ(CUSTOMER) LIB(MYDATA) DEV(*SAVF) OBJTYPE(*FILE) SAVF(QGPL/MYSAVF) CLEAR(*REPLACE) DTACPR(*NO)
I cleared and re-used the same save file (SAVF) for each test. The table below shows the results for each variation of the data compression option:
| DTACPR Option | Avg CPU % Utilization | SAVOBJ Duration | SAVF Size (bytes) | % of Original Size |
|---|---|---|---|---|
This test was done on a POWER9 cloud partition running IBM i 7.3 with two vCPUs, 4GB of RAM, and 200GB of disk.
The average CPU utilization in the table isn’t a high-precision metric; it was basically me eyeballing the Work with System Activity (WRKSYSACT) command and watching the average CPU utilization over time. Even though the system wasn’t doing much besides these save tests, there is still some CPU cost to run everything; this machine varied between 0.5 percent and 1.5 percent while “idle.” The majority of the CPU was definitely due to the compression operation.
The table demonstrates that requesting the *HIGH or *MEDIUM compression levels can be quite expensive in terms of CPU, though admittedly this machine had only two vCPUs. Even so, you certainly want to make sure your system has enough CPU capacity before running a save command (SAVnnn) with one of these compression options.
For my customer table, there wasn’t much space savings between *HIGH and *MEDIUM compression (only about 0.2 percent). While the *LOW option wasn’t as efficient in space savings (by about 8 percent compared to *HIGH), it was the fastest of all the methods. If time is of the essence, beware: as you can see, the *HIGH and *MEDIUM options took quite a bit longer than a save without compression.
Of course, your results may vary, depending on how well your data objects lend themselves to compression. Data with many repetitive elements typically compresses well. Admittedly, my test CUSTOMER table had a bunch of random characters in it, so the odds are you can expect a better compression ratio for “normal” data.
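To see why random data resists compression, here is a small illustration using Python’s zlib as a stand-in compressor (not the IBM i SNA, TERSE, or LZ1 algorithms), comparing how a deflate-style algorithm handles random bytes versus repetitive text:

```python
import os
import zlib

# Roughly 1 MB of each kind of data. The repetitive record is a made-up
# stand-in for "normal" business data with recurring patterns.
random_data = os.urandom(1_000_000)
repetitive_data = b"CUSTOMER,12345,MAIN STREET;" * 37_038  # ~1 MB

# Compress both at the highest zlib level and compare resulting sizes.
random_ratio = len(zlib.compress(random_data, 9)) / len(random_data)
repetitive_ratio = len(zlib.compress(repetitive_data, 9)) / len(repetitive_data)

print(f"random data:     {random_ratio:.1%} of original size")
print(f"repetitive data: {repetitive_ratio:.1%} of original size")
```

Run this and the random data stays at essentially 100 percent of its original size (compression overhead can even make it slightly larger), while the repetitive data shrinks to a tiny fraction of its size, which is why a table of random characters is close to a worst case for any DTACPR setting.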
I decided to do a secondary test to see how well “compressible” data, such as a large plain text file, would do. I downloaded the free list of Great Britain postal codes from the Geonames.org website, unzipped it to /tmp/GB_full.txt on the IBM i, and used the Save Object (SAV) command to save this text file from the IFS to a save file:
SAV DEV('/qsys.lib/qgpl.lib/mysavf.file') OBJ(('/tmp/GB_full.txt')) CLEAR(*REPLACE) DTACPR(*HIGH)
This table contains the various save file sizes depending on the selected data compression (DTACPR) option:
| File Description | File Size (bytes) | % of Original Size |
|---|---:|---:|
| Zip file (original download) | 13,946,129 | 8.0% |
| Save file, no compression | 184,705,024 | 106.3% |
| Save file, high compression | 18,907,136 | 10.9% |
| Save file, medium compression | 32,538,624 | 18.7% |
| Save file, low compression | 173,039,616 | 99.6% |
I did not include duration or CPU utilization for this test because the elapsed time of the save operation wasn’t significant. I’m glad I did this test, because the result differs quite a bit from the first test in how well the various compression levels performed.
Zip compression was the clear winner compared to the IBM i’s older compression algorithms. Keep in mind that you can use the jar command in QSHELL for zipping and unzipping IFS files. If you don’t mind searching the internet, a number of utilities and other compression formats (including 7z and compressed tar) can also be used from QSHELL to compress IFS files, if significant size reduction or sharing data without a save file is your primary goal. If needed, you could always place the zip file into a save file to have the best of both worlds!
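For illustration, here is the kind of zip round trip that jar (or a zip utility) performs in QSHELL, sketched with Python’s zipfile module; the file name and contents below are stand-ins rather than the actual Geonames download:

```python
import pathlib
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a repetitive plain text file as a stand-in for GB_full.txt.
    src = pathlib.Path(tmp) / "GB_full.txt"
    src.write_text("AB10 1AB,Aberdeen\n" * 50_000)

    # Write it into a zip archive with maximum deflate compression.
    archive = pathlib.Path(tmp) / "GB_full.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        zf.write(src, src.name)

    ratio = archive.stat().st_size / src.stat().st_size
    print(f"zip archive is {ratio:.1%} of the original text file")
```

With highly compressible text like this, the deflate-based zip format shrinks the file to a small fraction of its size, consistent with the 8.0 percent figure the original zip download achieved in the table above.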
Unlike the first compression test, there was quite a bit of difference between the resulting file sizes for the different compression types. Whereas *LOW compression was quite useful in the prior test, with this plain text file it accomplished almost nothing (the save file was still 99.6 percent of the original size).
In conclusion, when saving data to a save file, it pays to experiment to gauge the cost (CPU utilization and duration) versus the benefit (disk space savings) of a particular data compression option. Don’t forget, the optimal setting will depend on your data (for example, program objects vs. table data and journal receivers, plain text vs. binary data, etc.), so remember to test each variation. If you’re only concerned with compressing IFS data, then other compression options are available.