Problematic Characters of the Slavonic Unicode Range

At this point I feel that the Unicode Standard is flawed in many ways because of the influence of the “Balkan academics” who introduced the Slavonic characters that follow the standard Cyrillic range. The problem is that these characters were added without sufficient representation of, or feedback from, Russian scholars, particularly those who adhere to the Orthodox Faith (who make up the majority of people who still use the language). As a result, we are now saddled with many misnamed characters and with misinformed descriptions attached to various others.

1a/b) Base Characters – Upper and Lower Case Letters

[not accepted by Unicode] – CYRILLIC NARROW O – lower case version only; this character is essential for reproducing pre-Nikonian literature, where it is the unaccented form of the letter "O" in medial (non-initial and non-final) positions; it is also the first element of CYRILLIC LETTER UK (U+0478/9), and it is *NOT* a standard round "O".

U+0478 / 0479 – CYRILLIC LETTER UK – deprecated by Unicode, but there is disagreement regarding this reform.

Digraph Uk vs. Monograph Uk – An important point was raised, viz. that Uk behaves like two characters, not one. For example, usually only the first element is printed in red type. I believe this is one more argument in favor of encoding it as two characters rather than using the (deprecated) Digraph Uk codepoint.
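The practical cost of the single-codepoint digraph can be checked with Python's standard `unicodedata` module. U+0479 has no decomposition mapping, so normalization never relates it to a two-letter spelling, and search, sorting and per-letter formatting (such as rubricating only the first element) must treat it as one indivisible unit. This is a minimal sketch. U+043E (the ordinary о) stands in for the first element only as an assumption, because the narrow O is treated above as unencoded. The helper name is hypothetical.

```python
import unicodedata

# U+0479 CYRILLIC SMALL LETTER UK: the digraph stored as a single codepoint.
DIGRAPH_UK = "\u0479"
# Two-character spelling. U+043E (ordinary о) is only a stand-in for the first
# element (assumption: the narrow O is treated above as unencoded).
TWO_LETTER_UK = "\u043E\u0443"


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(unicodedata.name(DIGRAPH_UK))                  # CYRILLIC SMALL LETTER UK
print(repr(unicodedata.decomposition(DIGRAPH_UK)))   # '' (no decomposition mapping)
print(canonically_equal(DIGRAPH_UK, TWO_LETTER_UK))  # False
```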

U+0476 / 0477 – IZHITSA WITH DOUBLE GRAVE ACCENT – the name is inaccurate; it should be called "Izhitsa with I Titlo"; in reality, it should be deprecated, but its use has already been established.
This is actually a digraph of Izhitsa with the “Izhe titlo”, not with a double grave accent; any close examination of Slavonic texts will show that the double grave interpretation is incorrect. The Izhe titlo is placed above the letter to indicate clearly that it is pronounced “I” rather than “V”, its alternate pronunciation (the character is derived from Greek and follows similar orthographic rules). This is why the precomposed character should be deprecated in favor of the two separate characters.
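Canonical decomposition shows that the name is more than cosmetic. NFD splits U+0477 into U+0475 plus U+030F COMBINING DOUBLE GRAVE ACCENT, so normalization-aware software treats the mark as a double grave and never as the izhe titlo (U+A675). The converter below is a hypothetical sketch of the two-character encoding argued for above. The function name, and the choice to catch both precomposed and decomposed input, are assumptions.

```python
import unicodedata

# COMBINING CYRILLIC LETTER I, the "izhe titlo" (U+A675).
COMBINING_I = "\uA675"

# Hypothetical conversion table: izhitsa + double grave, precomposed or
# decomposed, rewritten as izhitsa + combining izhe titlo.
_IZHITSA_FIXES = {
    "\u0476": "\u0474" + COMBINING_I,        # capital, precomposed
    "\u0477": "\u0475" + COMBINING_I,        # small, precomposed
    "\u0474\u030F": "\u0474" + COMBINING_I,  # capital, decomposed
    "\u0475\u030F": "\u0475" + COMBINING_I,  # small, decomposed
}


def izhitsa_to_izhe_titlo(text: str) -> str:
    """Re-encode izhitsa with double grave as izhitsa + U+A675."""
    for old, new in _IZHITSA_FIXES.items():
        text = text.replace(old, new)
    return text


# What the Standard itself says the precomposed letter is made of:
nfd = unicodedata.normalize("NFD", "\u0477")
print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in nfd])
# ['U+0475 CYRILLIC SMALL LETTER IZHITSA', 'U+030F COMBINING DOUBLE GRAVE ACCENT']
print(izhitsa_to_izhe_titlo("\u0477") == "\u0475\uA675")  # True
```

Because no precomposed character decomposes to izhitsa + U+A675, running NFC on the converted text leaves the two-codepoint sequence intact.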


2) Abbreviated Characters (letter titla and superscript letters)

U+A675 – COMBINING CYRILLIC LETTER I – The “izhe-titlo” is currently being considered for inclusion in the Unicode Standard (see Proposal n3748). Its inclusion is crucial for publishing period literature from the Ustav and Poluustav eras. (It is also the upper element of the character “izhitsa with izhe-titlo” and of the two hatch marks over the letter “YI” above.)

U+A67D – COMBINING CYRILLIC PAYEROK – This codepoint is used for the “jerok”, which in practice is generally not a combining character (or is only barely combining, as the catalog demonstrates). This character is used in all eras of typography.

U+A67F – CYRILLIC PAYEROK – This is the “paerok” (a rarely used early alternate form of the “jerok”). The “paerok” is not used in Synodal Era typography, but its Unicode space can be used for a combining placement of the “jerok”. (OR SHOULD THIS BE A NON-COMBINING FORM?)

Yerok vs. Payerok – It has been pointed out that the Yerok and the Payerok are two different things (see attached). The Combining Cyrillic Yerok (yerik) should be encoded as U+033E, Combining Vertical Tilde, which we have done. We also need to include the non-combining version of this character as U+2E2F, Vertical Tilde. I'm not exactly sure what the function of the Payerok is, but we should not use U+A67D and U+A67F to encode the yerok (yerik).
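For text already typed with the payerok codepoints where a yerok was intended, the decision above implies a one-to-one cleanup. This is a hypothetical sketch. It assumes that the legacy text used each codepoint according to its Unicode property (combining to combining, spacing to spacing). If a table used U+A67F for the combining yerok, as the U+A67F entry above suggests may have happened, that entry must map to U+033E instead.

```python
import unicodedata

# Hypothetical remapping for legacy text. Assumption: combining payerok was used
# for the combining yerok, spacing payerok for the spacing yerok.
YEROK_REMAP = str.maketrans({
    "\uA67D": "\u033E",  # COMBINING CYRILLIC PAYEROK -> COMBINING VERTICAL TILDE
    "\uA67F": "\u2E2F",  # CYRILLIC PAYEROK -> VERTICAL TILDE
})

for ch in "\uA67D\uA67F\u033E\u2E2F":
    # Name, general category (Mn = combining mark, Lm = spacing modifier letter)
    # and canonical combining class of each codepoint involved.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), unicodedata.combining(ch))

print("\u0432\uA67D".translate(YEROK_REMAP) == "\u0432\u033E")  # True
```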

3) Alternate Letter Forms (discretionary forms)


4) Ligatures


5) General Titla Glyphs (non-letter titla)

U+0483 – COMBINING CYRILLIC TITLO – The Unicode Standard has declared that this character should be used for typesetting numbers, not for word abbreviations. However, this is an unnecessary distinction and should be disregarded. In order for Unicode to be a practical tool for typographers, this character should be used whenever the “general titlo” (общое титло) in the form of a “bar” (взмет) is placed directly over a single letter.

Comment: not used with letter titlos
This is the symbol called "obshchoe titlo" (general titlo) in the traditional Church Slavonic Primers. The word is a borrowing from the Greek "τίτλος". While this glyph is presented solely as a combining mark centered over lower case letters, traditional typography requires four separate positions (and thus four separate symbols to meet this need). Without sophisticated combining character technology (which positions the combining character both horizontally and vertically according to anchors pre-programmed into TrueType and OpenType fonts), correct and authentic typesetting cannot be accomplished. To obtain correct character spacing, this glyph must be a "zero-width character". These four positions are:
a) centered over a lower case letter
b) centered over an upper case letter (higher and slightly more to the left)
c) centered over TWO lower case letters
d) centered over TWO upper case letters
In Old Church Slavonic, a SHORT titlo (the standard width) is used over a single letter or over the place of abbreviation; a LONG titlo is used over the whole word. The long titlo was mostly abandoned a few centuries before the introduction of printing, so it is debatable whether or not the long titlo should be included in the full repertoire of Slavonic symbols (although it could be useful for reproducing manuscripts or for technical use in writing grammars of the Old Church Slavonic language). [Note: I'm not personally willing to take up the task of proposing it to the Unicode Consortium, but I will leave the matter in the hands of people who feel this is worth the effort.]

This is not used in the printed tradition of Church Slavonic; it is a symbol belonging solely to the manuscript tradition and to technical transcriptions (especially modern transliterations). (This is a zero-width character.)
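The "zero-width" requirement corresponds to the Unicode general category. The titlo and its relatives are nonspacing marks (Mn), stored after the base letter they sit on, and it is the font's anchors, not the text, that decide where each one lands. The sketch below uses only Python's standard `unicodedata` module. The helper names are illustrative, and бг҃ъ is simply an example abbreviation.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for nonspacing (Mn) and enclosing (Me) marks, which are meant to
    carry no advance width of their own."""
    return unicodedata.category(ch) in ("Mn", "Me")


def spacing_length(text: str) -> int:
    """Count only the base characters, i.e. those that occupy horizontal space."""
    return sum(1 for ch in text if not is_mark(ch))


# Illustrative abbreviation: б + г + COMBINING CYRILLIC TITLO + ъ
word = "\u0431\u0433\u0483\u044A"
print(len(word), spacing_length(word))  # 4 3

# Category of the titlo and related marks discussed in these notes.
for ch in "\u0483\u0485\u0486\u0487\uA66F":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```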

U+0487 – COMBINING CYRILLIC POKRYTIE – This should generally not be used in conjunction with superscript letters, but primarily as a "stand-alone" character for use in demonstrating its appearance and use.

Comment: used only with letter titlos
This glyph is placed over a superscript letter in abbreviated words, and never appears by itself (except to demonstrate its form in textbooks). As it is presented in Unicode, the "solo pokrytie" (a name I propose for lack of a better existing description) is useful only as the aforementioned demonstration symbol. The proper "letter titla" have finally been included in the latest version of the Unicode Standard (see below). (This is a zero-width character.)

U+A66F – COMBINING CYRILLIC VZMET – The “combining Cyrillic vzmet” looks identical to the previous character. The Unicode Standard has declared that this character should be used with Cyrillic letters and letter titlos to indicate abbreviation, and not with numbers. However, as with the previous character, this is an unnecessary distinction and should be disregarded. In order for Unicode to be a practical tool for typographers, this character should be used whenever the “general titlo” (общое титло) needs to be placed so that it balances over TWO letters.

Comment: used with Cyrillic letters and letter titlos to indicate abbreviation
Block: Cyrillic Extended-B (which, BTW, needs some rehabilitation!)

U+0305 – COMBINING OVERLINE – The “combining overline” can be used in Ustav fonts (only) to present the wide version of the combining titlo (vzmet) which is placed directly over two preceding letters. (It might be better not to use this character, but to use the following character instead.)

U+0360 – COMBINING DOUBLE TILDE – The “combining double tilde” can be used in Ustav fonts (only) to present the wide version of the combining titlo (vzmet) when it balances over TWO letters.
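In plain text, the difference between these two entries is one of storage order. A double diacritic such as U+0360 goes after the first of the two letters it spans (canonical combining class 234, "double above"). An ordinary mark like U+0305 (class 230) follows the single letter it belongs to. This is a minimal sketch, and the helper name is hypothetical.

```python
import unicodedata

DOUBLE_TILDE = "\u0360"  # COMBINING DOUBLE TILDE, a double diacritic
OVERLINE = "\u0305"      # COMBINING OVERLINE, attached to the one preceding base


def over_two_letters(first: str, second: str, mark: str = DOUBLE_TILDE) -> str:
    """Unicode stores a double diacritic between the two base letters it spans."""
    return first + mark + second


pair = over_two_letters("\u0431", "\u0433")  # б͠г
print([f"U+{ord(c):04X}" for c in pair])     # ['U+0431', 'U+0360', 'U+0433']
print(unicodedata.combining(DOUBLE_TILDE), unicodedata.combining(OVERLINE))  # 234 230
```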

6) Double Titla


7) Diacritical Marks

The rough breathing was also used in the early Cyrillic alphabet when writing the Old Church Slavonic language. In this context it is encoded as Unicode U+0485 ( ◌҅ ), named “COMBINING CYRILLIC DASIA PNEUMATA”. It is an obsolete symbol in the later Church Slavonic language, and it appears to have been abandoned before the earliest printed editions. (This is a zero-width character.)

The smooth breathing is encoded as U+0486 ( ◌҆ ), “COMBINING CYRILLIC PSILI PNEUMATA”, for the Old Church Slavonic language. It might also be used for the later "Church Slavonic" language, but there its technical use is slightly different (merely to indicate an initial vowel), and so is its shape (turned on its side, as if the symbol had been rotated 90 degrees anticlockwise). (This is a zero-width character.)

KAMORA – U+0484, Combining Cyrillic Palatalization, should not be used for the Kamora. It is a palatalization mark that rests over two letters; it should NOT balance over the center of a single letter. Instead, we should encode the Kamora only at U+0311, Combining Inverted Breve. I will change this both in my table and in the CU keyboard layout. I noticed that we have also encoded the Kamora at U+0302, Combining Circumflex Accent. I do not think that this is correct: the circumflex is a pointed accent, as in French, and is not used in Cyrillic, so this codepoint should not be used for the Kamora.
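Re-encoding existing text according to this decision can be sketched as a single substitution. The sketch is hypothetical and assumes that every U+0302 in the input is a mistyped kamora. U+0484 is deliberately left alone because it is a distinct palatalization mark. Both U+0302 and U+0311 have canonical combining class 230, so swapping one for the other does not change canonical ordering relative to the other marks above the letter.

```python
import unicodedata

KAMORA = "\u0311"         # COMBINING INVERTED BREVE, the codepoint chosen for the kamora
LEGACY_KAMORA = "\u0302"  # COMBINING CIRCUMFLEX ACCENT, used for the kamora in older tables


def fix_kamora(text: str) -> str:
    """Hypothetical cleanup: re-encode kamoras typed as U+0302.
    U+0484 (palatalization) is not touched."""
    return text.replace(LEGACY_KAMORA, KAMORA)


print(unicodedata.combining(LEGACY_KAMORA), unicodedata.combining(KAMORA))  # 230 230
print(fix_kamora("\u043E\u0302") == "\u043E\u0311")  # True
```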

8) Punctuation and Other Typographical Symbols

U+003B – SEMICOLON [versus U+037E Greek Question Mark] – The Slavonic Question Mark
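One practical argument for U+003B is that U+037E is canonically equivalent to the semicolon. It has a singleton decomposition to U+003B, so any Unicode normalization silently replaces it anyway. The check below uses only Python's standard `unicodedata` module.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037E"

# U+037E decomposes canonically to U+003B SEMICOLON, so every
# normalization form replaces it with the plain semicolon.
print(unicodedata.decomposition(GREEK_QUESTION_MARK))  # 003B
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, GREEK_QUESTION_MARK) == ";")  # True for each
```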

9) Number Symbols


10) Typicon and Rubrical Symbols

13) U+1F545 (?) – MARK CHAPTER SYMBOL – This glyph represents the Chapters of Mark the Monk, or the “Marcian Chapters” (Марковы главы), which are included in the Menaia, the Triodia and the Typicon. These chapters provide instructions for important feasts when they coincide with Sundays or with other feasts or important commemorations. The symbol is placed in the margin to draw the eye to the reading. It is not found in the Ustav period, since it is a later invention, but the opportunity may be used to include the similar “M-R” ligature in Ustav fonts, where it was used in the manuscript tradition to abbreviate the word “имярекъ” (“say the name here”). If Unicode refuses to include this glyph, we can use U+2CE5 COPTIC SYMBOL MI RO as a substitute.

11) Ornamental Symbols


12) Editorial Marks