This page describes an experimental implementation, in OSL and Oracc fonts, of Unicode Ideographic Variation Sequences (IVSs) as defined in Unicode Technical Standard #37 (referred to as TR37 in the following) [https://unicode.org/reports/tr37]. The original idea of using these with Unicode cuneiform is owed to Robin LeRoy.
Unicode IVSs provide a means for selecting glyph variants in a standardized way. They work as a character pair in which the second character is a selector for a glyph variant of the first. The selectors are in the range U+E0100-U+E01EF.
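As a minimal sketch of this pairing, the following Python builds an IVS from a base character and a selector and checks that the selector lies in the range given above. The helper names `is_ivs_selector` and `make_ivs` are hypothetical and chosen for illustration only; the code points are those of the MU example later on this page.

```python
# Range of variation selectors used in IVSs, as stated above.
IVS_FIRST = 0xE0100
IVS_LAST = 0xE01EF


def is_ivs_selector(cp: int) -> bool:
    """Return True if the code point lies in U+E0100-U+E01EF."""
    return IVS_FIRST <= cp <= IVS_LAST


def make_ivs(base: str, selector: int) -> str:
    """Return the two-character sequence base + selector (hypothetical helper)."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not is_ivs_selector(selector):
        raise ValueError("selector must be in U+E0100-U+E01EF")
    return base + chr(selector)


seq = make_ivs("\U0001222C", 0xE0100)
print([f"U+{ord(c):04X}" for c in seq])
```

A font without an entry for the sequence is expected to show the default glyph of the base character, since the selector itself has no visible form.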
The Unicode Ideographic Variation Database, IVD, as defined in TR37 is very simple: a collection must be registered with basic metadata. A complementary file defines sequences in the collection consisting of pairs of base characters and variation selectors along with a collection name and a name for the sequence.
The name of the Oracc OSL collection is Oracc_OSL. This implementation is presently unregistered, but the collection data would be of the form:
Oracc_OSL;[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*_[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*;http://oracc.org/osl/ivs/
Oracc_OSL identifiers consist of a BASESIGN, underscore character, '_', and VARDATA. Both BASESIGN, the sign subject to variation, and VARDATA, the variation, consist of uppercase letters A-Z followed by optional digits 0-9, followed by optional compound sign parts. A compound sign part consists of a delimiter from the set @%&*. plus a sign name, consisting of uppercase letters A-Z followed by optional digits 0-9.
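The identifier syntax can be checked mechanically. The sketch below assumes only the regular expression given in the collection data line; the function name `is_valid_identifier` is hypothetical, and apart from MU_KASKAL the candidate strings are made-up test inputs, not entries in the collection.

```python
import re

# One sign name with optional compound parts; the collection pattern uses
# this same sub-pattern on both sides of the underscore.
SIGN = r"[A-Z]+[0-9]*(?:[@%&*.][A-Z]+[0-9]*)*"
OSL_ID = re.compile(SIGN + "_" + SIGN)


def is_valid_identifier(identifier: str) -> bool:
    """Return True if the whole string matches the Oracc_OSL identifier pattern."""
    return OSL_ID.fullmatch(identifier) is not None


# Made-up candidates for illustration (only MU_KASKAL appears on this page).
for candidate in ["MU_KASKAL", "GA2*AN_X", "mu_kaskal", "MU"]:
    print(candidate, is_valid_identifier(candidate))
```

Using `fullmatch` rather than `search` matters here: the pattern carries no anchors of its own, so a partial match inside a longer string would otherwise be accepted.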
The BASESIGN is always derived from an OSL sign name by translating subscript digits to regular digits and substituting SH for Š, e.g., MU, NI2, or SHU. For compound signs the sign name is simplified by omitting vertical bars and parentheses and mapping any TIMES sign to '*', e.g., |GA₂×AN| would be specified as GA2*AN.
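The derivation rules just described can be sketched as a single character translation. This is an illustration under stated assumptions, not the OSL tooling itself: the function name `basesign` is hypothetical, the TIMES sign is assumed to be U+00D7, and any characters not mentioned in the rules are passed through unchanged.

```python
# Translation table built from the rules above (assumed to be exhaustive
# for this sketch): subscript digits, Š, TIMES, bars, and parentheses.
TRANSLATE = str.maketrans({
    **{chr(0x2080 + i): str(i) for i in range(10)},  # subscript 0-9 to ASCII
    "\u0160": "SH",   # Š becomes SH
    "\u00d7": "*",    # TIMES sign (assumed U+00D7) becomes '*'
    "|": None,        # vertical bars are omitted
    "(": None,        # parentheses are omitted
    ")": None,
})


def basesign(osl_name: str) -> str:
    """Derive a BASESIGN string from an OSL sign name (hypothetical helper)."""
    return osl_name.translate(TRANSLATE)


for name in ["MU", "NI₂", "ŠU", "|GA₂×AN|"]:
    print(name, "->", basesign(name))
```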
The VARDATA is either an OSL sign name or a descriptive label for the variant selected via the sequence as described further in the next section.
A table of Oracc_OSL reference glyphs will be maintained at XXX/osl/ivsglyphs.html.
Oracc_OSL defines IVSs to support research annotation of cuneiform texts where it is important to encode both the base sign and its paleographic variant forms.
For example, one variation of the MU sign, mostly associated with ED Adab, replaces the SHE-style component of the sign with a KASKAL-style component:
# MU sign with KASKAL-style replacement for normal SHE-style component
1222C E0100;Oracc_OSL;MU_KASKAL
In the case of variant forms, the VARDATA component of a label such as MU_KASKAL may qualify the variation rather than defining the target form as a merged sign.
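A sequence entry like the MU_KASKAL one can be split into its parts with a few lines of Python. This sketch assumes the three semicolon-separated fields seen in the example (code points in hexadecimal, collection name, sequence identifier) and '#' comment lines; the names `IvsEntry` and `parse_ivd_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class IvsEntry(NamedTuple):
    base: int        # code point of the base character
    selector: int    # code point of the variation selector
    collection: str  # collection name, e.g. Oracc_OSL
    identifier: str  # sequence name, e.g. MU_KASKAL


def parse_ivd_line(line: str) -> Optional[IvsEntry]:
    """Parse 'BASE SELECTOR;COLLECTION;IDENTIFIER'; return None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    sequence, collection, identifier = (f.strip() for f in line.split(";"))
    base_hex, selector_hex = sequence.split()
    return IvsEntry(int(base_hex, 16), int(selector_hex, 16), collection, identifier)


entry = parse_ivd_line("1222C E0100;Oracc_OSL;MU_KASKAL")
# Split the identifier into BASESIGN and VARDATA at the first underscore.
base_name, var_name = entry.identifier.split("_", 1)
print(f"U+{entry.base:04X} U+{entry.selector:04X}", base_name, var_name)
```

Splitting at the first underscore is safe under the identifier syntax given earlier, because neither BASESIGN nor VARDATA may itself contain an underscore.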
The Oracc_OSL Ideographic Variation Database, OIVD, is kept in 00etc/Oracc_OSL.txt in the Oracc OSL repo, and is available online at XXX.
The OIVD contains all possible applications of IVS in Oracc_OSL; not all script phases will exhibit all IVS traits, however.
In Oracc cuneiform fonts IVS sequences are treated as OpenType LIGA entries [/osl/OraccCuneiformFonts#h_ligaturesliga].
In Oracc's transliteration system IVS sequences will be selectable using OSL Graphetics tags [/osl/OraccCuneiformFonts/OSLGraphetics].