What are Computational Ethiopics and Ethiopic Information Processing?

Why The Time Is Now

If you have reached this document, you are well aware of what the Internet has done for global text communications. The worldwide explosion of the Internet has led to a growing need for communication software suited to a multilingual market. Internationalization of software is one of the fastest growing areas of software research and development. With the August 12th acceptance of Ethiopic into the Unicode standard, and the October arrival of the Internet in Ethiopia, developers worldwide will likely start to turn their attention towards Ethiopic. There are indications that this has already begun.

It is important that Ethiopic become a part of existing projects in multilingualism and internationalization. Accomplishing this requires a concerted effort to spread awareness of Ethiopic, to give developers the requirements of Ethiopic text processing in concise terms, and to provide resources that simplify Ethiopic implementation. [See Also: ``From He To Po'']

Issues In Ethiopic Text Processing

The arrival of the Unicode standard for Ethiopic alleviates some of the issues in Ethiopic text processing. The standard recognizes 346 characters, though some software packages use many more. An ``Extended Range'' for Ethiopic will be considered by Unicode in a later standard. The author identifies 436 elements used in the writing systems of Ethiopia and Eritrea, the practical majority of which are available in the Unicode standard.


A Glossary of Terms Used In CE

Agafari
Meaning literally ``one who serves''. Agafari is both a DOS-like PC operating system and an MS Windows TrueType font package developed and marketed by the ESTC.

ASCII
American Standard Code For Information Interchange. This is the 7-bit standard that encompasses the typable characters from a ``101'' keyboard.

ANSI
American National Standards Institute. The 8-bit standard that extends ASCII to include additional characters for Latin script used in Europe.

Binary
Base two. A number notation that uses two possible values, 0 or 1.

Bit
Binary digit. The basic unit of memory that computers process.

Byte
Eight bits.

Bitmap Font
A font whose character shapes are defined by arrays of bits.

BDF
Bitmap Distribution Format. Adobe's format for distributing bitmap fonts, a lowest common denominator for font conversions.

Character
An abstract notion denoting a class of shapes declared to have the same meaning or form.

Character Code
The numeric code within an encoding method that is used to refer to a specific character. Character codes may be referred to by their octal, decimal, or hexadecimal values.
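
As a small illustration of the three notations, the sketch below spells one character code in octal, decimal, and hexadecimal. The helper name `show_bases` is made up for this example and is not part of any Ethiopic library.

```python
def show_bases(code):
    """Return the octal, decimal, and hexadecimal spellings of one code."""
    return format(code, "o"), format(code, "d"), format(code, "X")


# The ASCII code for the letter A, written three ways.
print(show_bases(ord("A")))

# The Unicode code for the first Ethiopic character, written three ways.
print(show_bases(0x1200))
```

All three spellings name the same number; only the notation changes.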

Character Set
A collection of characters.

Code Space
The space in which characters can be encoded according to the specifications of a given encoding method. Code positions outside the code space are considered invalid.

Code Point
See Character Code.

Code Position
The numeric code within an encoding method that is used to refer to a specific character. For two-byte characters, this refers to the row and the cell.

CS
Computer Science.

Computational Ethiopics
A branch of text and information processing that is specific to the needs of the languages and writing system termed ``Ethiopic''.

Decimal
Base 10. A number notation that uses 10 possible values, ranging from 0 to 9.

Diacritic Mark
A mark that serves to annotate characters with additional information, usually a variant pronunciation. In Western scripts, diacritic marks are typically found above or below characters, as in á and ü.

DPI
Dots-per-inch. A measurement for device resolution.

Encoding
The correspondence between numerical character codes and the final printable glyphs. For instance, 0x41 is the ASCII code for the letter A. Under Unicode/ISO-10646 0x1200 is the encoding of he.
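
The two codes mentioned above can be looked up with Python's standard `unicodedata` module, as in the sketch below. The helper name `describe` is an assumption made for this example. Note that the formal Unicode character name uses a different romanization from the ``he'' used in this document.

```python
import unicodedata


def describe(code):
    """Return the character for a code together with its Unicode name."""
    ch = chr(code)
    return ch, unicodedata.name(ch)


print(describe(0x41))    # the letter A
print(describe(0x1200))  # the first character of the Ethiopic block
```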

E10N, F10N, and G10N
Ethioization, Fidelization, and Ge'ezization respectively. Terms that may be applied to mean the localization of software to the Ethiopian and Eritrean markets. ``G10N'' may be preferred as ``F10N'' implies L10N of the writing system only. ``Ethio-'' is appropriate as it is applied very broadly, even beyond the languages and the writing system. ``Ge'ez-'' may lack word recognition in the western world but it is less encompassing and implies a root, an origin from which most things considered Ethiopic descend. G10N should be considered to mean ``localization for those languages whose writing systems descend from Ge'ez script''. In general the three choices could be applied interchangeably without causing confusion.

EAS
Ethiopian Authority for Standardization.

ELUX
Ethiopic Languages User Interface.

ESTC
Ethiopian Science and Technology Commission. The ESTC is known for its Agafari PC operating system and MS Windows font package.

ETA
Ethiopian Telecommunications Authority.

Ethio ASCII
Ethiopian ASCII (need info), see Haddis.

Ethiopic
An umbrella term that may refer to any aspect of the cultures of the peoples at some time residing in the political boundaries of Ethiopia. ``Ethiopic'' is the term most widely recognized in the West to refer to the Ge'ez writing system and the languages unique to Ethiopia and Eritrea.

Ethiopic Hyphenation
The proper handling of Ethiopic characters at the beginnings and ends of lines. Punctuation such as : (wordspace) should not begin a new line.
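
One way to honor the rule that a wordspace never begins a line is to keep each wordspace attached to the word before it and break only after it. The sketch below is a minimal greedy line filler written under that assumption. The function name `break_lines` is made up for this example, and width is counted naively in characters rather than in rendered glyph widths.

```python
WORDSPACE = "\u1361"  # ETHIOPIC WORDSPACE in Unicode


def break_lines(text, width):
    """Greedily fill lines, breaking only after a wordspace."""
    # Cut the text into units that each end with a wordspace, so the
    # mark always travels with the word it follows.
    units = []
    current = ""
    for ch in text:
        current += ch
        if ch == WORDSPACE:
            units.append(current)
            current = ""
    if current:
        units.append(current)

    # Fill each line with as many whole units as fit in the width.
    lines = []
    line = ""
    for unit in units:
        if line and len(line) + len(unit) > width:
            lines.append(line)
            line = ""
        line += unit
    if line:
        lines.append(line)
    return lines


for line in break_lines("ሰላም፡ለዓለም", 5):
    print(line)
```

A unit longer than the width is left to overflow its line; a fuller treatment would also need rules for other punctuation.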

EUC
Extended Unix Code. EUC provides a means for working with wide characters in the UNIX operating system. EUC for Ethiopic has not been addressed.

Fidel
Fidel is the general word used by Ethiopians to refer to the Ethiopic syllabary. Fidel also translates simply as ``alphabet'' or ``a writing system''.

Font
A collection of characters of a single typeface and encoding system that computers can use with a specified device (such as a monitor or printer).

FontSet
A list of fonts of different scripts associated with the locale of a user's environment.

FSF
Free Software Foundation.

Ge'ez
Ge'ez may refer to the Ge'ez people, language, or writing system. Modern Ethiopic descends from the Ge'ez syllabary, which is found within it as a 26x7 array subset. Ge'ez is also the liturgical language still in use in the Ethiopian Orthodox church.

GNU
Short for GNU's Not Unix. A series of UNIX-based software that is provided free of charge. GNU software (and other software that seeks protection) falls under the terms of the GNU General Public License, which protects software from being exploited for commercial use. It ensures that there will always be a large body of software freely available.

Haddis
A single byte encoding system described as ``The minimum required characters for adequate use as Ato Haddis Alemayehu's book Fiker Eske Mekabir.''

Hexadecimal
Base 16. A number notation that uses 16 possible values, 0-9 and A-F. The most common notation used in the computer world.

Hohet
The name for a single letter in the Ethiopic syllabary.

ISO
International Organization for Standardization.

ISO 10646-2 / ISO.UCS-2
The fixed width two-byte (16 bit) encoding method for ISO 10646. ISO 10646-2 is a subset of ISO 10646-4, a four-byte (32 bit) encoding method for ISO 10646. Unicode and ISO 10646-2 are identical encodings.

Information Interchange
The process of moving information from one hardware or software configuration to another with no loss of data.

Information Processing
The process of manipulating electronically encoded information at different levels. Ethiopic code and text processing are forms of information processing.

Internationalization (I18N)
The process of designing software (or hardware) in a flexible manner such that it becomes an easy task to adapt or localize it to another country with different languages. Internationalization also makes it possible to use more than one writing system on computers. There are two main implementations of internationalization: the locale model and the multilingual model.

Input Method (IM)
The system of keystroke composition sequences required to specify the members of a writing system to an application.
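
A keystroke composition scheme can be sketched as a table lookup that prefers the longest matching sequence. The table below is a hypothetical fragment covering a single consonant row, with keystrokes chosen only for illustration; it assumes the Unicode arrangement in which the seven forms of a consonant follow one another starting at 0x1200. The names `KEYMAP` and `compose` are made up for this example and do not describe any existing input method.

```python
# Hypothetical keystroke table for one consonant row only; a real input
# method would cover the whole syllabary.
KEYMAP = {
    "he": "\u1200", "hu": "\u1201", "hi": "\u1202", "ha": "\u1203",
    "hE": "\u1204", "h": "\u1205", "ho": "\u1206",
}


def compose(keys):
    """Turn a string of keystrokes into Ethiopic characters."""
    out = []
    i = 0
    while i < len(keys):
        # Try the longer sequence first so "ha" is not read as "h" + "a".
        for length in (2, 1):
            chunk = keys[i:i + length]
            if chunk in KEYMAP:
                out.append(KEYMAP[chunk])
                i += len(chunk)
                break
        else:
            # Pass unmapped keystrokes through unchanged.
            out.append(keys[i])
            i += 1
    return "".join(out)


print(compose("haho"))
```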

IRC
Internet Relay Chat. A multi-user version of Talk for which an Ethiopic version exists.

Java

JIS
Japanese Industrial Standard. The name of the standards established by JISC (JIS Committee). Also the name of the encoding method used for the JIS X 0208-1990 and JIS X 0212-1990 character set standards. Ethiopic encoding under JIS has been applied for the World Wide Web.

JUNET
Japan Unix Network. Ethiopic encoding under JUNET is used by Mule.

LaTeX
A popular macro package for use with TeX. LaTeX provides a simpler and more versatile interface to TeX. An EthioLaTeX package was developed by EthiO Systems.

LibEth
A library of computer subroutines developed as a Computational Ethiopics programming resource.

Ligature
A character whose glyph consists of two or more characters fused together. An example found in older writing styles is the fusion of (g) and (zi). Opportunities abound to create Ethiopic ligatures.

Localization (L10N)
The process of adapting software (or hardware) such that it conforms to the expectations of a specific country. This often includes rewriting menus and dialogs into the target language, but sometimes involves more complex changes such as handling special character encoding methods. Other issues to be addressed include time zones, ways of writing dates and times, and currency.

Locale model
A model of internationalization that predefines many attributes that are language or country specific, such as the maximum number of bytes per character, date formats, time formats, currency formats, and so on. The actual attributes are located in a library or locale object file that is loaded when required.

Logical Font Name
A naming scheme used by X Windows to uniquely identify fonts installed on the server. The naming scheme is composed of the font's name as well as its attributes.

Message Catalogs
Files containing the text strings needed by a piece of software. Message catalogs are opened in correspondence with the user's environment locale. In this way message catalogs can provide different text and messages for a specified language and character encoding.
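
The idea of selecting strings by locale can be sketched with an in-memory table. Everything named here (`CATALOGS`, `message`, the message key, and the locale names used as keys) is an assumption made for illustration; a real system would load its catalogs from files chosen by the environment locale.

```python
# Hypothetical catalogs keyed by locale name; a real system would load
# these from files rather than keep them inside the program.
CATALOGS = {
    "en_US": {"greeting": "Hello"},
    "am_ET": {"greeting": "ሰላም"},
}


def message(key, locale_name, fallback="en_US"):
    """Look up a message for a locale, falling back when it is missing."""
    catalog = CATALOGS.get(locale_name, CATALOGS[fallback])
    return catalog.get(key, CATALOGS[fallback][key])


print(message("greeting", "am_ET"))
print(message("greeting", "fr_FR"))  # unknown locale uses the fallback
```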

Meta Font
Ethiopic Meta Fonts were developed by EthiO Systems.

Mino-Sabæan
A people and writing system of ancient South Yemen that migrated to Eritrea and northern Ethiopia, later becoming the Ge'ez people and writing system.

Multilingual Model
A model of internationalization that uses a character set whose repertoire contains enough characters to represent most of the world's writing systems. No flipping between character sets is required. The Unicode character set is an example.

Mule
Multilingual Emacs

Multiple-Byte Character
A character that is represented by more than one byte. The C programming language supports multi-byte data types. See Also Wide Character

NCIC
Ethiopia's National Computer Information Center.

NLS
National Language Support.

Octal
Base 8. A number notation that uses 8 possible values, ranging from 0 to 7.

Octet
An array of eight bits represented as a single unit (a byte).

OCR
Optical Character Recognition. A device that can scan, recognize, and convert printed shapes into meaningful units, such as characters. OCR for Ethiopic has been successfully accomplished by Michal Jerabek (michal.jerabek@ff.cuni.cz).

Omron

Orthography
A linguistic term that refers to the writing system of a language.

OS
Operating System. The software that drives the hardware associated with a computer system.

OSF
Open Software Foundation.

PC
Personal Computer. Usually refers to machines that run MS-DOS.

PCF Fonts
Portable Compiled Format. Compiled fonts for X Windows to be read by an X server.

PK Fonts
The compiled bitmap form of Meta Fonts.

Plan 9
The next generation multilingual UNIX operating system developed by AT&T Bell Labs.

PostScript
The page description language developed by Adobe Systems.

QWERTY Array
The most common keyboard in use today. Its name comes from the first six letters on the top row of keys that have the 26 letters of the alphabet imprinted on them.

Sabæan
See Mino-Sabæan.

SBCS
Single Byte Character Set.

SERA
System for Ethiopic Representation in ASCII.

Syllabary
A writing system whose characters represent syllables. Ethiopic is an example of a syllabary.

Syllable
A sound sequence consisting of a consonant plus vowel.

Talk
A utility that provides real-time, interactive user-to-user communication in a tty. Users ``talk'' by typing and reading one another's text in the upper and lower halves of the tty. A now outdated version of Ethiopic Talk is still available.

TeX
A popular typesetting language used in large part by scientists, mathematicians, and engineers. See Also LaTeX.

Transcription
The translation of text from one writing system into another. Usually transcription follows simple reversible mappings. The transcribed text may read unnaturally to native readers of the target script.

Transliteration
The translation of text from one writing system into another where the writing conventions of the target writing system are applied. The transliterated text should read naturally in the target script.

Typeface
A distinctive design for a set of visually related symbols. Examples include Helvetica, Garamond, and Zemen.

Unicode
The name of the international 16-bit character set and encoding system developed by the members of the Unicode Consortium. Ethiopic has been included in Unicode since August 12th, 1996.

UCS
Universal Character Set.

UNIX
The name of the operating system that runs on most workstations.

UTF
UCS Transformation Format. A method of encoding 16- or 32-bit encodings such that they pass as a stream of ASCII bytes. Also called UTF-1.

UTF-2
A version of UTF defined by AT&T Bell Labs (Plan 9) and X/Open for encoding Unicode text as a stream of ASCII bytes. Also called FSS-UTF (file system safe UTF).
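
FSS-UTF is the scheme now generally known as UTF-8. As a rough sketch of how it works, the function below encodes a single 16-bit character code using the one-, two-, and three-byte forms. The function name `fss_utf_encode` is made up for this example, and the sketch makes no attempt to reject code values that Unicode reserves.

```python
def fss_utf_encode(code):
    """Encode one 16-bit character code as a sequence of bytes."""
    if code < 0x80:
        # ASCII codes pass through as a single unchanged byte.
        return bytes([code])
    if code < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    if code < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code >> 12),
            0x80 | ((code >> 6) & 0x3F),
            0x80 | (code & 0x3F),
        ])
    raise ValueError("code does not fit in 16 bits")


print(fss_utf_encode(0x41).hex())    # the letter A
print(fss_utf_encode(0x1200).hex())  # the first Ethiopic character
```

Only ASCII characters come out as ASCII bytes. Every byte of a multi-byte sequence has its high bit set, which is what keeps such text from being mistaken for ASCII characters like the path separator.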

WashRa
An MS Windows font package that was the first to apply Ethio ASCII. The name ``WashRa'' is taken from the monastery in Gonder renowned for its longevity, discipline, and Qinay.

Word Processor
A text processing tool that manipulates text in such a way that it is possible to include multiple fonts in a single document. Sufficient formatting capabilities are also quite common.

Whitespace
Characters that produce empty space, such as the space character or the tab character.

Wide Character
A character that consists of a larger than normal byte. A byte typically consists of seven or eight bits. A character represented by 16 bits is considered a wide character. A byte can encode 2^8 characters (or 2^7 for 7-bit bytes). Ethiopic, with more than 2^8 characters, requires wide characters. The C programming language supports wide character data types.
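
The arithmetic behind wide characters can be checked with a short sketch. The helper names `codes_per_unit` and `bytes_needed` are made up for this example, and the figure of 346 is the count of Ethiopic characters in the Unicode standard given earlier in this document.

```python
def codes_per_unit(bits):
    """Number of distinct character codes that fit in the given bits."""
    return 2 ** bits


def bytes_needed(repertoire_size, bits_per_byte=8):
    """Smallest number of bytes whose code space holds the repertoire."""
    n = 1
    while codes_per_unit(bits_per_byte * n) < repertoire_size:
        n += 1
    return n


print(codes_per_unit(8))  # codes available in one 8-bit byte
print(codes_per_unit(7))  # codes available in one 7-bit byte
print(bytes_needed(346))  # bytes needed for the Ethiopic repertoire
```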

Wordspace
A generic term for a character used to separate two words. Ethiopic wordspace resembles the English colon.

WYSIWYG
What You See Is What You Get.

XCCS
Xerox Character Code Standard.

X Window System
The name of a very popular UNIX windowing system developed at MIT. The latest release is called X11R6.

Zemen
Meaning literally an ``era'' or period of time. The standard font class used in most modern Ethiopic publishing. Popularized by Ethiopia's ``Addis Zemen'' newspaper, from which it takes its name.

