Exploring Data-Type Acronyms

by Kevin Vandever

We programmers generally don't worry too much about our data. Sure, we want our databases to contain valid data so our applications will run correctly, but most of us don't give a second thought to how that data is actually stored. Well, I am going to take a little departure from the normal how-to article and introduce you to some acronyms related to data types. You may have seen some of these acronyms before but never really understood what the heck they meant. You will never have to live with that feeling again.
CDRA (Character Data Representation Architecture)

IBM defined CDRA to support character data consistently across platforms and national languages.

CCSID (Coded Character Set ID)

CCSID is probably the acronym you have seen most often during your daily adventures, but have you ever taken a moment to really understand what it means? CCSIDs provide cross-system and multinational support for managing character information under the CDRA (defined above). A CCSID is a 16-bit number that identifies a specific collection of encoding-related information and so uniquely identifies a coded character set. For example, the CCSID for U.S. English is 37. Figure 1 shows the field definitions for a physical file; the character fields in that file have all been defined with a CCSID of 37.

The CDRA defines CCSID values that identify how characters are represented and how they are converted, as needed, to preserve their meaning. DB2 tags character columns with CCSIDs (notice in Figure 1 that numeric fields are not tagged with a CCSID), either explicitly, in the data definition, or implicitly, from the job or system default. Data is not converted when it is sent to another system; instead, the receiving job converts it to its own CCSID if the two differ. If the receiving job were on a French-language machine, it would convert the data to CCSID 297, while a Greek-language PC would convert it to CCSID 4965. One CCSID you will run into even on an English-language machine is 65535, which indicates that the data is hexadecimal and will not be converted.
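To make the receiving-job rule concrete, here is a minimal Python sketch. It assumes that Python's `cp037` and `cp500` codecs can stand in for CCSIDs 37 (U.S. English EBCDIC) and 500 (International EBCDIC); the `CCSID_CODECS` table and the `receive` function are hypothetical names for illustration, not an IBM API. The sender ships its bytes unchanged, the receiving side converts only when the CCSIDs differ, and data tagged 65535 is never converted.

```python
# A minimal sketch of CCSID-based conversion at the receiving job.
# Assumption: Python's cp037/cp500 codecs stand in for CCSIDs 37 and 500;
# CCSID_CODECS and receive() are hypothetical names, not an IBM API.

CCSID_CODECS = {
    37: "cp037",   # EBCDIC, U.S. English
    500: "cp500",  # EBCDIC, International
}

HEX_CCSID = 65535  # data tagged 65535 is treated as hex and never converted


def receive(data: bytes, sender_ccsid: int, job_ccsid: int) -> bytes:
    """Return data as the receiving job should see it.

    The sender's bytes arrive unchanged; the receiving job converts them
    only if the CCSIDs differ, and never converts data tagged 65535.
    """
    if sender_ccsid == HEX_CCSID or sender_ccsid == job_ccsid:
        return data
    text = data.decode(CCSID_CODECS[sender_ccsid])  # bytes -> characters
    return text.encode(CCSID_CODECS[job_ccsid])     # characters -> job's bytes


if __name__ == "__main__":
    sent = "ITEM [A1]!".encode("cp037")       # stored under CCSID 37
    got = receive(sent, 37, 500)              # read by a CCSID 500 job
    print(sent.hex(" "))
    print(got.hex(" "))                       # different bytes...
    print(got.decode("cp500"))                # ...same text: ITEM [A1]!

    raw = bytes([0x00, 0xFF, 0x5A])           # arbitrary binary data
    print(receive(raw, HEX_CCSID, 500) == raw)  # True: passed through as-is
```

Running it shows the bytes changing while the text stays the same: a given byte can stand for different characters under different CCSIDs, so the tag is what tells the receiver how to interpret it.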
SBCS (Single-Byte Character Set)

SBCS is a character set in which each character is represented by a one-byte code. It is the default character set for the CHAR and VARCHAR data types.

DBCS (Double-Byte Character Set)

DBCS is a character set in which each character is represented by two bytes. Certain languages, such as Japanese, Chinese, and Korean, contain more symbols than a single-byte character set can represent, so they require double-byte character sets. OS/400 supports four double-byte character sets: Japanese, Korean, Simplified Chinese, and Traditional Chinese.

UCS-2 (Universal Multiple-Octet Coded Character Set-2)

UCS-2 is a character set in which each character is represented by two bytes. Two CCSIDs on the iSeries represent UCS-2 characters: 61952, which is OS/400-specific, and 13488, which is used by Distributed Relational Database Architecture (DRDA) products such as DB2 UDB for iSeries. UCS-2 allows you to store and retrieve data in each user's national language of choice in a single file. For example, the physical file in Figure 1 could contain descriptions in English, French, and Greek, and by defining those fields with a CCSID of 13488, I could satisfy all three languages regardless of the CCSID of the display device, because the data is stored under a single, unique character set identifier, 13488.

ASCII (American Standard Code for Information Interchange)

ASCII is the standard 7-bit character encoding for PC- and UNIX-based operating systems.

EBCDIC (Extended Binary-Coded Decimal Interchange Code)

EBCDIC is IBM's 8-bit data-representation format, and it is the standard format for data storage on the iSeries, regardless of the character set. I'm not sure why IBM created its own format in EBCDIC, but it was probably to give those of us who access and transport data across different platforms an extra challenge.
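The practical difference between these representations is easiest to see by counting bytes. The following Python sketch is an illustration under stated assumptions: Python's `cp037` codec plays the role of an SBCS EBCDIC field (CCSID 37), the `ascii` codec plays a PC or UNIX file, and big-endian UTF-16 plays UCS-2 (CCSID 13488), which produces the same bytes as UCS-2 for characters in the Basic Multilingual Plane. The helper names `encode_or_none` and `ucs2` are hypothetical, and DBCS is left out.

```python
# Sketch: how many bytes the same text needs in each representation.
# Assumptions: "cp037" stands in for an SBCS EBCDIC field (CCSID 37),
# "ascii" for a PC/UNIX file, and big-endian UTF-16 for UCS-2 (CCSID 13488);
# UTF-16 and UCS-2 give identical bytes for Basic Multilingual Plane text.

SAMPLES = {
    "English": "Price",
    "French": "Prix élevé",
    "Greek": "Τιμή",
}


def encode_or_none(text: str, codec: str):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


def ucs2(text: str) -> bytes:
    """Encode as UCS-2 (big-endian); reject characters outside the BMP."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters above U+FFFF")
    return text.encode("utf-16-be")


def size(encoded) -> str:
    """Byte count as text, or '-' when the text could not be encoded."""
    return "-" if encoded is None else str(len(encoded))


if __name__ == "__main__":
    # ASCII and EBCDIC assign different byte values to the same letter.
    print(hex("A".encode("ascii")[0]), hex("A".encode("cp037")[0]))  # 0x41 0xc1
    for language, text in SAMPLES.items():
        print(
            f"{language:8} chars={len(text):2} "
            f"CCSID37={size(encode_or_none(text, 'cp037'))} "
            f"ASCII={size(encode_or_none(text, 'ascii'))} "
            f"UCS-2={len(ucs2(text))}"
        )
```

The English text fits in every format; the French text needs an 8-bit code page such as CCSID 37 because 7-bit ASCII has no accented letters; and the Greek text can be stored only as UCS-2, at two bytes per character. That is the trade-off behind the Figure 1 example: one CCSID 13488 field holds all three languages at the cost of doubling the storage for English text.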
Unicode

Unicode (which is not an acronym but is an important cog in the data-representation wheel) provides a unique number for every possible character, regardless of language, platform, or program. It has been adopted by many industry-leading vendors and is supported by all of the latest technologies, platforms, and programming languages.

Impress Your Friends and Family

OK, so it's not all that exciting, but I hope you now have a better understanding of how data can be stored on the iSeries. You probably aren't going to run out and mess with different character sets and interchange formats, but the next time one of these acronyms pops up in a manual or comes up at a cocktail party, you can read that manual or join that conversation with confidence.
Last Updated: 10/10/02 Copyright © 1996-2008 Guild Companies, Inc. All Rights Reserved.