Validate DBCS-Open Data
August 19, 2009 Michael Sansoterra
The DBCS open data type is used in many legacy applications as a way to support storage of either single byte character set (SBCS) data, double byte character set (DBCS) data, or even data from both character sets within a single data column. Defined with a data type of O within DDS specs, this data type carries an inherent agnostic coded character set identifier (CCSID) of 65535.
The DBCS open data type operates by storing single byte characters, as traditional single byte character columns do. However, when a DBCS open column stores double byte characters, it uses special embedded escape characters to indicate when DBCS data starts and ends within the data. Within an open DBCS character string, a hex value of X’0E’ (decimal 14) signifies a “shift out,” or in other words, “the following data should be interpreted as double byte.” All character data encountered after the shift out character will be read in two byte pairs.
When the double byte data has ended and the remaining data should be interpreted as single byte again, a hex value of X’0F’ (decimal 15) signifies a “shift in,” or in other words, “we’re back to interpreting single byte character data again.”
However, under various circumstances, these DBCS shift-in and shift-out delimiters can become mismatched (usually a final “shift-in” is omitted) and can cause the data to be displayed or interpreted incorrectly.
Detecting these data errors within a DBCS open column can be difficult. To assist with this problem, presented below is a user-defined function named IsDBCSOpenValid that can be used to detect a mismatch between shift-in and shift-out characters within DBCS open data columns. The function simply checks to make sure there are an equal number of shift-in and shift-out characters within the data and that the shift characters are in the proper sequence.
If the number of shift-in and shift-out characters are equal and ordered correctly, the data is valid and the function returns a ‘Y’ to indicate the data is OK. If the shift-in characters are not equal or out of order the function returns an ‘N’.
Create Function IsDBCSOpenValid (DBCSOpen VarChar(512) CCSID 65535) Returns Char(1) Language SQL Deterministic Set Option Commit=*None Begin Declare i Int Not Null Default 1; Declare ShiftOut Int Not Null Default 0; Declare ShiftIn Int Not Null Default 0; Declare TestChar Char(1) Not Null Default ' '; While (i<=Length(DBCSOpen)) Do Set TestChar=Substr(DBCSOpen,i,1); If TestChar=X'0E' Then Set ShiftOut=ShiftOut+1; If ShiftOut-ShiftIn>1 Then Return 'N'; End If; ElseIf TestChar=X'0F' Then Set ShiftIn=ShiftIn+1; If ShiftIn>ShiftOut Then Return 'N'; End If; End If; Set i=i+1; End While; If ShiftIn=ShiftOut Then Return 'Y'; End If; Return 'N'; End;
The UDF can be used in a query as follows to test for invalid data in a table that uses one or more open DBCS columns. The Where clause specifies that only rows containing invalid “DBCS open” data will be returned.
Select * From ( Select RRN(A) As Record_No, My_Key_Field, DBCSData1, IsDBCSOpenValid(DBCSData1) As Test1, DBCSData2, IsDBCSOpenValid(DBCSData2) As Test2 From MyDBCSOpenTable A) InnerSelect Where Test1='N ' Or Test2='N'
For the record, if you’re creating a new application and want to store either single byte or double byte data in the same column, use an eight-bit (if the data will be predominantly SBCS) or 16-bit Unicode columns instead of using the antiquated DBCS open data type (which is definable in DDS only).
In V5R4 Unicode columns are defined as follows:
MyData CHAR(30) CCSID 1208 /* UTF8 - Single Byte Unicode */ MyData GRAPHIC(30) CCSID 1200 /* UTF16 - Double Byte Unicode */
In V6R1, the NCHAR and NVARCHAR data types can be used in place of the GRAPHIC and VARGRAPHIC data types:
MyData NCHAR(30) CCSID 1200 /* UTF16 - Double Byte Unicode */
Michael Sansoterra is a programmer/analyst for i3 Business Solutions, an IT services firm based in Grand Rapids, Michigan. Send your questions or comments to Mike via the IT Jungle Contact page.