[NEXT] Single byte coded character sets
[PREVIOUS] Introduction
[CONTENTS]
[EESTI]


2 DEFINITIONS & ABBREVIATIONS

ASCII (American Standard Code for Information Interchange). The American national 7-bit character code standard for information interchange, data processing and communication systems. The ASCII code table includes control characters and graphic characters. The left part of the Estonian basic code table (Table 3.1) coincides with the ASCII code table.
AW-weight, Alphanumerical Weight. A weight assigned to letters and digits to serve as a basis in precision sorting algorithms for text comparison. As a first approximation, weights assigned to capital and small letters are equal.
AZERTY keyboard. A keyboard arrangement whose second row (row D) contains, from left to right, the letters A, Z, E, R, T, Y, U, I, O, P. This keyboard is used in the French cultural environment.
Byte. A sequence of 8 bits (binary digits) representing a character. Instead of an 8-digit binary number, the corresponding decimal or hexadecimal number can be used. For instance, the character Ü in the basic table corresponds to the bit sequence "11011100", the decimal number "220" and the hexadecimal number "DC".
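The three notations for the same byte can be checked with a short sketch. It assumes that, at this position, the basic table agrees with ISO 8859-1 (the LATIN-1 table IBM CP 819), so Python's "latin-1" codec stands in for the basic table; the function name is hypothetical.

```python
# Show binary, decimal and hexadecimal notations of a one-byte character.
# Assumption: the basic table matches ISO 8859-1 here, so the "latin-1" codec is used.

def byte_notations(ch: str) -> tuple[str, int, str]:
    """Return (binary, decimal, hexadecimal) notations of a one-byte character."""
    (value,) = ch.encode("latin-1")  # a Latin-1 character encodes to exactly one byte
    return format(value, "08b"), value, format(value, "02X")

print(byte_notations("Ü"))  # ('11011100', 220, 'DC')
```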
CGCSGID number, Coded Graphic Character Set Global Identifier number. The number is obtained by concatenating the GCSGID and the CPGID. For instance, 00697 00850 (also written 697 850).
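A minimal sketch of forming and reading a CGCSGID, assuming (from the example above) that each part is written as a five-digit, zero-padded decimal number separated by a space; the function names are hypothetical.

```python
def cgcsgid(gcsgid: int, cpgid: int) -> str:
    """Join a GCSGID and a CPGID into a CGCSGID string.

    Assumption: each part is a five-digit, zero-padded decimal number.
    """
    return f"{gcsgid:05d} {cpgid:05d}"

def parse_cgcsgid(text: str) -> tuple[int, int]:
    """Accept both the padded ("00697 00850") and the short ("697 850") form."""
    gcs, cp = text.split()
    return int(gcs), int(cp)

print(cgcsgid(697, 850))         # 00697 00850
print(parse_cgcsgid("697 850"))  # (697, 850)
```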
CPGID number, Code Page Global Identifier number. A number assigned to a code table registered in the IBM manual "National Language Support Reference Manual. Volume 2". The Estonian code tables are based on the following code tables: IBM CP 819 (basic table), IBM CP 850 (table for microcomputers), IBM CP 437 (the default code table of microcomputers), IBM CP 278 (EBCDIC table).
CW-weight, Case Weight. A weight assigned to a character case. The cases of capital letters, small letters, indices, etc. are distinguished. The CW-weight is used in precision sorting of texts.
Diacritics. An additional mark forming part of a character, e.g. the diaeresis in "Ü".
DW-weight, Diacritic Weight. A weight used in precision sorting of texts containing diacritics.
EBCDIC, Extended Binary Coded Decimal Interchange Code. A code table family used in IBM mainframes. The Estonian EBCDIC code table is based on the IBM CP 278 table.
Special character. Graphic character that does not represent a letter, digit, or space. In the GCGID notation, the special character identifiers start with the letter "S". E.g. the GCGID of the character "%" is SM020000.
GCGID or GCID, Graphic Character Global Identifier. This identifier uniquely defines a character; it consists of two letters and six decimal digits (trailing zeros are usually omitted). E.g. the GCGID of the letter "a" is "LA010000" or "LA01".
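Because trailing zeros may be omitted, the full and the shortened forms of a GCGID should be treated as the same identifier. The sketch below restores the full eight-character form by padding with zeros; the format check and the function names are hypothetical.

```python
import re

# Two capital letters followed by two to six decimal digits (hypothetical check).
_GCGID = re.compile(r"[A-Z]{2}\d{2,6}")

def expand_gcgid(gcgid: str) -> str:
    """Restore the full eight-character identifier by re-appending trailing zeros."""
    if not _GCGID.fullmatch(gcgid):
        raise ValueError(f"not a GCGID: {gcgid!r}")
    return gcgid.ljust(8, "0")

def same_character(a: str, b: str) -> bool:
    """True if two GCGIDs, full or shortened, identify the same character."""
    return expand_gcgid(a) == expand_gcgid(b)

print(expand_gcgid("LA01"))                # LA010000
print(same_character("SM02", "SM020000"))  # True
```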
GCSGID, Graphic Character Set Global Identifier. The identifier of a character set registered in the IBM manual "National Language Support Reference Manual. Volume 2".
Graphic character. Unlike a control character, a graphic character has a visual image: it can be written by hand, printed or displayed on a screen. Every graphic character has a code.
ISO, International Organization for Standardization.
Control character. A character that acts as a command for an action (line feed, end of message, etc.) and has no graphic image of its own. The present standard does not include control characters. Their use must comply with international standards, which reserve columns 0, 1, 8 and 9 of the code table for control characters.
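The column rule above can be expressed as a check on the high hexadecimal digit of a code. This is a sketch that follows only the column rule stated in this entry; the function name is hypothetical.

```python
CONTROL_COLUMNS = {0x0, 0x1, 0x8, 0x9}

def is_control_code(code: int) -> bool:
    """True if a one-byte code lies in a column reserved for control characters."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in one byte")
    column = code >> 4  # the high hexadecimal digit is the column
    return column in CONTROL_COLUMNS

print(is_control_code(0x0A))  # True  (column 0, line feed)
print(is_control_code(0x41))  # False (column 4, "A")
print(is_control_code(0x85))  # True  (column 8)
```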
Language layer (on a keyboard). A keyboard may possess several language layers (e.g., Estonian and Russian layers). Every language layer has up to three cases (lower case, upper case, supplementary case).
Coded character set. Rules uniquely specifying characters and their codes.
Code. A single-byte value assigned to a character. In a specific coded character set, there is a one-to-one correspondence between characters and codes.
Code table. Representation of a coded character set in the form of a table. A code table consists of 256 positions: 16 rows and 16 columns. The rows and columns are numbered with hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. In the hexadecimal code xy of a character, the digit x represents the column and y the row.
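Locating a code in the 16 × 16 table amounts to splitting it into its two hexadecimal digits. A minimal sketch, with hypothetical function names:

```python
def position(code: int) -> tuple[str, str]:
    """Return (column, row) of a one-byte code as hexadecimal digits.

    For a code written as xy, x is the column and y the row.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in one byte")
    column, row = divmod(code, 16)
    return format(column, "X"), format(row, "X")

def code_at(column: str, row: str) -> int:
    """Inverse of position(): the code found at a given column and row."""
    return int(column, 16) * 16 + int(row, 16)

print(position(0xDC))     # ('D', 'C'), the position of Ü in the basic table
print(code_at("D", "C"))  # 220
```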
LATIN-1, Latin alphabet #1. A character set of 190 characters used in Western Europe and in North, Central and South America. The character set is registered in the IBM manual "National Language Support Reference Manual. Volume 2" as GCSGID 00697. LATIN-1 is standardized as ISO 8859-1.
Ligature. Two characters joined into a single character, e.g. "Æ" (A and E).
Character set. A complete set of characters. Any character set can be coded in several ways. The character set closest to the Estonian basic character set is GCSGID No 00697 consisting of 190 characters. This character set is represented in coded form as the EBCDIC code table IBM CP 278 and the LATIN-1 code table IBM CP 819.
Character. Element of a character set used for representing or ordering data and controlling data flows.
Notation. A way of describing a code combination. Let the bits of an 8-bit code be denoted b8, b7, b6, b5, b4, b3, b2, b1, where b8 is the highest and b1 the lowest bit. A bit sequence representing a character is written as two hexadecimal digits xy, where x represents the bit combination b8b7b6b5 and y the bit combination b4b3b2b1.
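The correspondence between the bits b8...b1 and the two hexadecimal digits can be traced with a short sketch; the function names are hypothetical.

```python
def bits(code: int) -> dict[str, int]:
    """Map the bit names b8 (highest) ... b1 (lowest) to their values."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in one byte")
    return {f"b{i}": (code >> (i - 1)) & 1 for i in range(8, 0, -1)}

def notation(code: int) -> str:
    """Two hexadecimal digits: x from b8b7b6b5, y from b4b3b2b1."""
    x = code >> 4    # b8b7b6b5
    y = code & 0x0F  # b4b3b2b1
    return f"{x:X}{y:X}"

b = bits(0xDC)
print("".join(str(b[f"b{i}"]) for i in range(8, 0, -1)))  # 11011100
print(notation(0xDC))  # DC
```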
Position. Part of a code table identified by the coordinates of the row and column.
POSIX, Portable Operating System Interface. A family of standards specifying a portable operating system interface and its environment. POSIX may be regarded as a standardized UNIX.
QWERTY keyboard. A keyboard arrangement whose second row (row D) contains, from left to right, the letters Q, W, E, R, T, Y, U, I, O, P. This keyboard dominates in cultures using the Latin alphabet (with the exception of the French culture). The Estonian keyboard presented in this standard is a modification of the QWERTY keyboard.
QWERTZ keyboard. A keyboard arrangement whose second row (row D) contains, from left to right, the letters Q, W, E, R, T, Z, U, I, O, P. This keyboard is used in the German cultural environment.
Case. Within a language layer, pressing a key can generate up to three different characters, depending on whether the keyboard is in the lower, upper or supplementary case.
Nonescaping/dead key. A key on the keyboard that does not move the cursor. A nonescaping key can be used to generate characters with diacritics.
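A dead key can be modelled as a pending state: it produces no output (so the cursor stays put) until the next key arrives, and the two are then combined into one character. The sketch below is only an illustration; the dead-key assignments are hypothetical, and Unicode NFC composition is used as a stand-in for a keyboard driver's composition table.

```python
import unicodedata

# Hypothetical dead-key assignments: spacing accent -> combining diacritic.
DEAD_KEYS = {
    "¨": "\u0308",  # diaeresis
    "´": "\u0301",  # acute accent
    "^": "\u0302",  # circumflex accent
    "~": "\u0303",  # tilde
}

def type_keys(keys: list[str]) -> str:
    """Simulate typing a key sequence that may contain dead keys."""
    out: list[str] = []
    pending = None  # dead key waiting for the next key
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # nothing is output, so the cursor does not move
            else:
                out.append(key)
            continue
        # Combine the following key with the pending diacritic (Unicode NFC).
        composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
        if len(composed) == 1:
            out.append(composed)        # precomposed letter, e.g. "ü"
        else:
            out.extend([pending, key])  # no such letter: output both as typed
        pending = None
    return "".join(out)  # a dead key still pending at the end is dropped

print(type_keys(["¨", "u", "l", "e"]))  # üle
```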
SH-weight, Shared Weight. The characters of a code table are divided into equivalence classes, and all characters of a class have the same SH-weight. E.g., according to the present standard, in the Estonian cultural environment the letters i, I, í, Í, ì, Ì, î, Î, ï and Ï have the SH-weight 78.
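An equivalence class can be stored as a list of members and inverted into a per-character lookup table. The sketch contains only the single class given in the example above; the other classes of the standard are not reproduced, and the names are hypothetical.

```python
# Equivalence class from the example: all these letters share SH-weight 78.
SH_CLASSES = {78: "iIíÍìÌîÎïÏ"}

# Invert the classes into a per-character lookup table.
SH_WEIGHT = {ch: w for w, chars in SH_CLASSES.items() for ch in chars}

def same_class(a: str, b: str) -> bool:
    """True if two characters belong to the same equivalence class."""
    return SH_WEIGHT[a] == SH_WEIGHT[b]

print(SH_WEIGHT["Ï"])        # 78
print(same_class("i", "Î"))  # True
```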
Sort key. A means to sort character sequences. For sorting purposes, the text is transformed into weight sequences. In the case of a text key, the first sequence consists of AW-weights, the next sequence is formed from DW-weights (weights characterizing diacritics), etc. The order of texts is determined by comparing their sort keys.
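A multi-level sort key can be built by collecting all AW-weights of a text first, then all DW-weights, and so on, and comparing the resulting tuples. The sketch below uses a handful of made-up weights (not the standard's tables) and assumes that the third level is the CW-weight.

```python
# Hypothetical weights for illustration only: character -> (AW, DW, CW).
# AW ignores case and diacritics, DW distinguishes diacritics, CW distinguishes case.
WEIGHTS = {
    "a": (1, 0, 0), "A": (1, 0, 1),
    "i": (9, 0, 0), "I": (9, 0, 1), "í": (9, 1, 0), "Í": (9, 1, 1),
    "s": (19, 0, 0), "S": (19, 0, 1),
}

def sort_key(text: str) -> tuple[tuple[int, ...], ...]:
    """Return (AW sequence, DW sequence, CW sequence) for the text."""
    per_char = [WEIGHTS[ch] for ch in text]
    # Transpose: one tuple of weights per level, in the order AW, DW, CW.
    return tuple(zip(*per_char))

words = ["Sí", "si", "Si", "as"]
print(sorted(words, key=sort_key))  # ['as', 'si', 'Si', 'Sí']
```

Because Python compares tuples element by element, a DW or CW difference only decides the order when all AW-weights are equal, which is the point of keeping the levels as separate sequences.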
Capslock, capitals lock. A keyboard key function that switches the keyboard driver to the capitals case but does not affect keys for digits and other non-letter characters. The capitals lock is a latching case key: pressing it switches the function on or off.
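The effect of the capitals lock on letter keys versus other keys can be sketched as follows. The layer fragment is hypothetical, and the sketch assumes that Shift together with Caps Lock still produces capitals, which the entry does not specify.

```python
# Hypothetical fragment of one language layer: lower-case -> upper-case character.
LAYER = {"a": "A", "õ": "Õ", "1": "!", "2": '"'}

def key_output(key: str, shift: bool, caps_lock: bool) -> str:
    """Character produced by a key in the lower or upper case."""
    # Caps Lock selects capitals for letter keys only; digits and other keys ignore it.
    upper = shift or (caps_lock and key.isalpha())
    return LAYER[key] if upper else key

print(key_output("a", shift=False, caps_lock=True))  # A
print(key_output("1", shift=False, caps_lock=True))  # 1
```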
SW-weight, Special Weight. A weight assigned to special characters. Used in precision sorting algorithms.
