2 DEFINITIONS & ABBREVIATIONS
- ASCII (American National Standard Code for Information
Interchange). American national 7-bit character code standard for
use in information interchange, data processing and communications
systems. The ASCII code table includes control characters and graphic
characters. The left part of the Estonian basic code table (Table 3.1) coincides with the ASCII code
table.
- AW-weight, Alphanumerical Weight. A weight assigned to
letters and digits to serve as a basis in precision sorting algorithms for
text comparison. As a first approximation, weights assigned to capital and
small letters are equal.
- AZERTY keyboard. A keyboard arrangement with the second
row (row D) containing from left to right the letters A, Z, E, R, T, Y, U,
I, O, P. This keyboard is used in the French cultural environment.
- Byte. A bit sequence consisting of 8 bits (binary digits)
and representing a character. Instead of an 8-digit binary number, the
corresponding decimal or hexadecimal numbers can be used. For instance,
the bit sequence corresponding to the character Ü in the basic table
is "11011100", the decimal number "220", the
hexadecimal number "DC".
- CGCSGID number, Code Graphic Character Set Global Identifier
number. The number is obtained by concatenating the GCSGID and the
CPGID. For instance, 00697 00850 (also written 697 850).
- CPGID number, Code Page Global Identifier number. A
number assigned to a code table that is registered in the IBM manual
"National Language Support Reference Manual. Volume
2". The Estonian code tables are based on the following code
tables: IBM CP 819 (basic table), IBM CP 850 (table for microcomputers),
IBM CP 437 (code table presented in microcomputers by default), IBM CP 278
(EBCDIC table).
- CW-weight, Case Weight. A weight assigned to a character
case. Cases for capitals, small letters, indexes, etc. are distinguished.
The CW-weight is used in precision ordering of texts.
- Diacritic. An additional sign forming part of a
character, e.g. the diaeresis.
- DW-weight, Diacritic Weight. A weight used in precision
sorting of texts containing diacritics.
- EBCDIC, Extended Binary Coded Decimal Interchange Code. A
code table family used in IBM mainframes. The Estonian EBCDIC code table
is based on the IBM CP 278 table.
- Special character. Graphic character that does not
represent a letter, digit, or space. In the GCGID notation, the special
character identifiers start with the letter "S". E.g. the GCGID
of the character "%" is SM020000.
- GCGID or GCID, Graphic Character Global Identifier. This
identifier uniquely defines any character; it consists of two letters and
six decimal digits (trailing zeros are usually
omitted). E.g. the GCGID of the letter "a" is
"LA010000" or "LA01".
- GCSGID, Graphic Character Set Global Identifier. The
identifier of a character set registered in the IBM manual
"National Language Support Reference Manual. Volume
2".
- Graphic character. Unlike a control character, a graphic
character has a visual image: it can be written by hand, printed or
displayed on a screen. Every graphic character has a code.
- ISO, International Organization for Standardization.
- Control character. A character that acts as a command for
an action (line feed, end of message, etc.) and that has no
graphic image of its own. Control characters are not included in the
present standard. Their use must conform to international
standards, which reserve columns 0, 1, 8 and 9 for control
characters.
- Language layer (on a keyboard). A keyboard may possess
several language layers (e.g., Estonian and Russian layers). Every
language layer has up to three cases (lower case, upper case,
supplementary case).
- Coded character set. Rules uniquely specifying characters
and their codes.
- Code. A single-byte value attributed to a character. In a
specific character set, between characters and codes there exists a
one-to-one correspondence.
- Code table. Representation of a coded character set in
the form of a table. A code table consists of 256 positions: 16 rows and
16 columns. The rows and columns are numbered with hexadecimal digits: 0,
1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. In the hexadecimal code of a
character xy the digit x represents the column and y the row.
- LATIN-1, Latin alphabet #1. A character set consisting of
190 characters used in Western Europe and in North, Central and South
America. The registration number of the character set in the IBM manual
"National Language Support Reference Manual. Volume
2" is GCSGID 00697. LATIN-1 is standardized as ISO
8859-1.
- Ligature. A compound of two characters, e.g.
"AE".
- Character set. A complete set of characters. Any
character set can be coded in several ways. The character set closest to
the Estonian basic character set is GCSGID No 00697 consisting of 190
characters. This character set is represented in coded form as the EBCDIC
code table IBM CP 278 and the LATIN-1 code table IBM CP 819.
- Character. Element of a character set used for
representing or ordering data and controlling data flows.
- Notation. A way of describing a code combination. Let us
mark the bits of an 8-bit code as b8, b7, b6, b5, b4, b3, b2, b1, where b8
is the highest and b1 the lowest bit. A bit sequence representing a
character is presented by two hexadecimal numbers - xy, where the digit x
represents the bit combination b8b7b6b5 and y the bit combination
b4b3b2b1.
- Position. Part of a code table identified by the
coordinates of the row and column.
- POSIX, Portable Operating System Interface. A standard for a
portable operating system interface and its environment. POSIX may be
regarded as a standardized UNIX.
- QWERTY keyboard. Keyboard arrangement, containing in the
second row (row D) from left to right the letters Q, W, E, R, T, Y, U, I,
O, P. This keyboard dominates in cultures using the Latin alphabet (with
the exception of the French culture). The Estonian keyboard presented in this
standard is a modification of the QWERTY keyboard.
- QWERTZ keyboard. A keyboard arrangement containing in the
second row (row D) from left to right the letters Q, W, E, R, T, Z, U, I, O, P.
This keyboard has been in use in the German culture.
- Case. By pressing a key within a language
layer, up to three different characters can be generated, depending on
whether the keyboard is in the lower, upper or supplementary case.
- Nonescaping/dead key. A key on the keyboard which does
not change the position of the cursor. A nonescaping key can be used to
generate characters with diacritics.
- SH-weight, Shared Weight. The characters of a code table
are divided into equivalence classes. Characters of every class have the
same SH-weight. E.g., in the Estonian cultural environment according to
the present standard the letters i, I, í, Í, ì,
Ì, î, Î, ï and Ï have the SH-weight 78.
- Sort key. A means to sort character sequences. For
sorting purposes, the text is transformed into weight sequences. In the
case of a text key, the first sequence consists of AW-weights, the next
sequence is formed from DW-weights (weights characterizing diacritics),
etc. The order of texts is determined by comparing their sort keys.
- Capslock, capitals lock. A function of a keyboard key
that switches the keyboard driver to the capitals case but does not
affect keys with digits and other characters. The capitals lock is a
toggle key: pressing it switches the corresponding function on or
off.
- SW-weight, Special Weight. A weight assigned to special
characters. Used in precision sorting algorithms.
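
The entries Byte, Code table and Notation above describe one and the same 8-bit value in several ways. The following Python sketch shows these conversions for the character Ü from the Byte entry. The function name "describe_code" is an illustrative assumption and not part of the standard. Python's "latin-1" codec is used here only to obtain the code of Ü, which the Byte entry gives as DC.

```python
def describe_code(code: int) -> dict:
    """Return the representations of a single-byte code used in the glossary."""
    if not 0 <= code <= 0xFF:
        raise ValueError("a single-byte code must be in the range 0..255")
    x = code >> 4        # bits b8 b7 b6 b5: column of the code table
    y = code & 0x0F      # bits b4 b3 b2 b1: row of the code table
    return {
        "bits": format(code, "08b"),   # b8 first, b1 last
        "decimal": code,
        "hex": format(code, "02X"),    # the two-digit notation xy
        "column": format(x, "X"),
        "row": format(y, "X"),
    }


# The code of the character Ü, taken from the latin-1 codec (assumption:
# it agrees with the basic table for this character, as the Byte entry states).
info = describe_code("Ü".encode("latin-1")[0])
print(info)
```

For Ü the sketch yields the bit sequence 11011100, the decimal number 220 and the hexadecimal number DC, that is, column D and row C of the code table.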
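
The entries Character set and Code state that one character set can be coded in several ways, while within one coded character set each character has exactly one code. The sketch below illustrates this with Python's built-in codecs. It is an assumption of this example that the codecs "latin-1", "cp850" and "cp437" correspond to IBM CP 819, IBM CP 850 and IBM CP 437. The Estonian tables are only based on these code pages, so the sketch shows the underlying IBM tables, not the Estonian ones. The name "codes_for" is hypothetical.

```python
# Assumed mapping from the IBM code page names in the glossary to Python codecs.
CODECS = {
    "IBM CP 819": "latin-1",
    "IBM CP 850": "cp850",
    "IBM CP 437": "cp437",
}


def codes_for(char: str) -> dict:
    """Map each code page name to the hex code of char, or None if it is absent."""
    result = {}
    for name, codec in CODECS.items():
        try:
            result[name] = format(char.encode(codec)[0], "02X")
        except UnicodeEncodeError:
            result[name] = None  # the character is not in this code page
    return result


print(codes_for("Ü"))
print(codes_for("Õ"))
```

The same character Ü receives the code DC in one table and 9A in the other two, and a character such as Õ may be missing from a table altogether, which is why a code is meaningful only together with its code table.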
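
The entry Sort key describes how a text is transformed into several weight sequences that are compared one after another. A minimal sketch of this idea follows. The weights are placeholders derived from Unicode decomposition and are NOT the weight tables of the present standard. Placing the case weights third and ordering small letters before capitals are also assumptions of this example, as is the function name "sort_key".

```python
import unicodedata


def sort_key(text: str) -> tuple:
    """Build a three-level key: AW-like, DW-like and CW-like sequences.

    All weights are illustrative stand-ins, not the values of the standard.
    """
    aw, dw, cw = [], [], []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        aw.append(base.lower())                   # capitals and small letters weigh the same
        dw.append(tuple(ord(m) for m in marks))   # empty tuple: no diacritic
        cw.append(1 if base.isupper() else 0)     # assumption: small letters first
    # Tuples compare element by element, so the AW sequence decides first,
    # then the DW sequence, then the CW sequence.
    return (tuple(aw), tuple(dw), tuple(cw))


words = ["Lima", "líma", "lima", "Líma"]
ordered = sorted(words, key=sort_key)
print(ordered)
```

All four words have equal first-level sequences, so the diacritic level separates "lima" from "líma", and the case level then orders each pair.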