Easily Calculating Statistical Functions
Published: December 6, 2006
The code associated with this article is available for download.
Hey, Ted:
Would you be interested in publishing a program that calculates statistical values, such as average, standard deviation, maximum value, and minimum value? I have such a program, which I use to calculate the average usage and the standard deviation of usage for the items we handle.
--Victor Pisman
Thanks to faithful reader Victor Pisman for an interesting program. When I think of calculating an average, I think in terms of accumulating an amount and counting before dividing. Victor took a different approach, and I like it.
Victor's RPG program, GETSTAT, has the following parameter list.
| Parameter | Description                                          | Direction |
|-----------|------------------------------------------------------|-----------|
| Amount    | The next item in the list of numbers                 | In        |
| Count     | The number of items in the list of numbers           | In/Out    |
| Sum       | The sum of the items in the list of numbers          | In/Out    |
| Average   | The average of the items in the list of numbers      | In/Out    |
| Max       | The highest number in the list                       | In/Out    |
| Min       | The lowest number in the list                        | In/Out    |
| Deviation | The standard deviation of the items in the list      | In/Out    |
| Reset     | Y=Reset the In/Out parameters to zero                | In        |
Call this program for each number in the list. That is, if you need the average of seven numbers, call the program seven times. On the first call for the group, pass a value of Y through the RESET parameter in order to zero out all the aggregate values. On subsequent calls for the group, pass a RESET value of N.
* Example caller of GETSTAT
* reset accumulators on first call for the group
C move 'Y' @@reset
*
* loop to process all the numbers in the group
C dow
*
* load the next item in the list into @@parm at this point
C z-add @@parm
* call the calculation program.
C call 'GETSTAT'
C parm @@parm 11 2
C parm @@count 5 0
C parm @@sum 11 2
C parm @@average 11 2
C parm @@max 11 2
C parm @@min 11 2
C parm @@deviation 11 2
C parm @@reset 1
*
* don't reset accumulators after first call for the group
C move 'N' @@reset
C enddo
On each call, GETSTAT counts, accumulates, saves min and max, and recalculates average and standard deviation.
***** GETSTAT - calculate statistical values
C *entry plist
C parm @@parm 11 2
C parm @@count 5 0
C parm @@sum 11 2
C parm @@average 11 2
C parm @@max 11 2
C parm @@min 11 2
C parm @@deviation 11 2
C parm @@reset 1
*
* Clear fields on first record of a group.
C if @@reset='Y'
C z-add *zero @@count
C z-add *zero @@sum
C z-add *zero @@average
C z-add *zero @@max
C z-add *zero @@min
C z-add *zero @@deviation
C endif
*
* Count and accumulate
C eval @@count=@@count+1
C eval @@sum=@@sum+@@parm
*
* Recalculate average and standard deviation
C eval(h) @@average=(@@sum/@@count)
*
C eval(h) @@deviation=((@@count-1)*
C @@deviation*@@deviation+
C (@@parm-@@average)*(@@parm-svaverage))/
C @@count
C sqrt(h) @@deviation @@deviation
*
* Track lowest and highest numbers of the group
C if @@parm>@@max or @@count=1
C eval @@max=@@parm
C endif
*
C if @@parm<@@min or @@count=1
C eval @@min=@@parm
C endif
*
* Save @@average for use in standard deviation
* calculation on next call
C z-add @@average svaverage
C *like define @@average svaverage
*
C Return
After the call for the last item in the list, the In/Out parameters have the values of the group functions.
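The standard deviation line is the least obvious part of the program, so here is how I read it, written as a recurrence. The symbols are my own notation, not Victor's: n is @@count, x_n is the amount passed on the current call, S_n is @@sum, x̄_n is the newly calculated @@average, x̄_{n-1} is the average saved in svaverage on the previous call, and σ_n is @@deviation. Because the formula divides by n rather than n-1, the result appears to be the population standard deviation. The worked values below are for a hypothetical list of 2, 4, and 6, and they ignore the rounding to two decimal places that the 11,2 fields apply on each call.

```latex
\begin{align*}
\bar{x}_n &= \frac{S_n}{n} \\
\sigma_n^2 &= \frac{(n-1)\,\sigma_{n-1}^2
              + (x_n - \bar{x}_n)(x_n - \bar{x}_{n-1})}{n} \\[1ex]
% Worked example for the list 2, 4, 6
n = 1:\quad \bar{x}_1 &= 2, &
  \sigma_1^2 &= \frac{0 + (2-2)(2-\bar{x}_0)}{1} = 0 \\
n = 2:\quad \bar{x}_2 &= 3, &
  \sigma_2^2 &= \frac{1 \cdot 0 + (4-3)(4-2)}{2} = 1 \\
n = 3:\quad \bar{x}_3 &= 4, &
  \sigma_3^2 &= \frac{2 \cdot 1 + (6-4)(6-3)}{3} = \frac{8}{3},
  \quad \sigma_3 \approx 1.63
\end{align*}
```

On the first call of a group the leftover value in svaverage (x̄_0 above) does not matter, because the new average equals the amount and the product is zero. The final figure agrees with the direct calculation: the squared distances of 2, 4, and 6 from their mean of 4 are 4, 0, and 4, which average to 8/3.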
I like Victor's program. It's simple, easy to understand, and easy to use. I think it would work especially well in an interactive environment, where small lists of numbers are being manipulated.
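I should also mention that SQL has these statistical functions built in. Here's an example that returns the same values that Victor's program calculates, one row per item. Like GETSTAT, the STDDEV function divides by the number of values, so it returns the population standard deviation.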
select item, sum(SomeNumber), count(*),
dec(avg(SomeNumber),11,2) as avg,
max(SomeNumber), min(SomeNumber),
dec(stddev(SomeNumber),11,2) as stddev
from somefile group by item