COFs: Compound Orthographic Forms

This page describes how the lemmatiser handles Compound Orthographic Forms (COFs), that is, written words which contain more than one lemma.

Introduction

COFs are forms that are written as one word but in fact should be understood as more than one lemma. Their meanings are transparent and they do not justify a separate entry in the glossary.

Lemmatizing

COFs are lemmatized by giving the individual lemmata joined by ampersands (&) with no whitespace or semi-colons. Examples in Akkadian include:

Linking two crasis words, e.g., ip-pan = ina[in]PRP&pān[front]N, "in front of".
Lemmatizing compound logograms, e.g., BAR.RA.NA = +ina[in//on]PRP$&+ahu[arm//side]N$ahišu, "on his side".
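The ampersand rule can be illustrated with a minimal Python sketch. It is not part of the Oracc tooling: the names `Lemma` and `split_cof` are hypothetical, and the sketch assumes each component has the simple shape seen in the first example (a citation form, a bracketed guide word, then a part of speech). It does not attempt the longer `+...$` syntax of the second example.

```python
import re
from typing import List, NamedTuple


class Lemma(NamedTuple):
    cf: str   # citation form, e.g. "ina"
    gw: str   # bracketed guide word, e.g. "in"
    pos: str  # part of speech, e.g. "PRP"


# Assumption: only the simple CF[GW]POS shape is handled here.
_LEMMA = re.compile(r"^(?P<cf>[^\[\]&]+)\[(?P<gw>[^\]]*)\](?P<pos>[^&\s;]+)$")


def split_cof(lemmatization: str) -> List[Lemma]:
    """Split a COF lemmatization into its component lemmata."""
    # The components are joined by '&' with no whitespace or semi-colons.
    if any(ch.isspace() for ch in lemmatization) or ";" in lemmatization:
        raise ValueError("COF components must be joined by '&' only")
    parts = []
    for part in lemmatization.split("&"):
        m = _LEMMA.match(part)
        if m is None:
            raise ValueError(f"not a CF[GW]POS lemma: {part!r}")
        parts.append(Lemma(m["cf"], m["gw"], m["pos"]))
    return parts


components = split_cof("ina[in]PRP&pān[front]N")
```

Here `components` holds two `Lemma` values, one for ina and one for pān, in the order in which they were written.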

Glossarizing

Although the entire COF does not get its own glossary entry, every component of a COF must have a @form entry for the spelling under its own glossary @entry. In each such @form entry, the normalization which corresponds to the enclosing @entry is called the primary component, and is given without parentheses. The other words, the secondary components, have their appropriate normalizations given in parentheses:

@entry ina [in] PRP
@form ip-pa-ni-šu₂ $ina $(pānišu)
@form ip-pa-ni-šu₂-nu $ina $(pānišunu)
@form ...
@sense PRP ina
@sense ...
@end entry

...

@entry pānu [front] N
@form ip-pa-ni-šu₂ $(ina) $pānišu
@form ip-pa-ni-šu₂-nu $(ina) $pānišunu
@form ...
@sense N front
@end entry

COFs may contain more than two elements, and the same principles apply.
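As a rough illustration of the primary/secondary convention, the following Python sketch builds the @form line needed under each component's @entry. The function name `cof_form_line` and its parameters are hypothetical and not part of any Oracc tool; the sketch only covers the spelling and normalization fields shown in the Akkadian examples above.

```python
from typing import List


def cof_form_line(spelling: str, norms: List[str], primary: int) -> str:
    """Build one @form line for a COF.

    norms   -- the normalizations of all components, in written order
    primary -- index of the component whose @entry the line belongs to
    """
    if not 0 <= primary < len(norms):
        raise IndexError("primary must index one of the norms")
    fields = [
        # The primary component is bare; secondary ones go in parentheses.
        f"${norm}" if i == primary else f"$({norm})"
        for i, norm in enumerate(norms)
    ]
    return " ".join(["@form", spelling] + fields)


norms = ["ina", "pānišu"]
# One line per component, each to be placed under that component's @entry.
lines = [cof_form_line("ip-pa-ni-šu₂", norms, i) for i in range(len(norms))]
```

Because the function loops over every component, the same call pattern works for COFs with more than two elements.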

Sumerian COFs

COFs require a little extra care in Sumerian, because Sumerian @form entries also have BASE, MORPH and possibly other fields. Further, Sumerian glossaries do not contain NORM fields directly, because NORM is computed from the MORPH.

For simplicity of parsing, the rules with Sumerian COFs are as follows:

NORM must be given explicitly; simply use the value of MORPH and replace the ~ (tilde) component with the citation form;
all NORMs must be given consecutively, at the start of the @form, but after any LANG field (as occurs in Sumerian entries in QPN glossaries);
all other fields in the @form line must be given after the last NORM entry in the line, even if that entry is a secondary one (i.e., one which is enclosed in parentheses);
BASE should be specified using the centre-dot to indicate that what precedes the centre-dot is part of the base, and the degree symbol to indicate that what follows the degree symbol is part of the base. The two symbols may be combined where several words are written using a COF. The BASE should be included in the @bases line.

Here is a real life example (edited for clarity), in which the sign ušu₂ is used to write u₄ šu₂:

@entry šuš [cover] V/t
@bases suš₂; u°šu₂
@form   ušu₂ $(ud) $šuš /u°šu₂ #~
@sense V/t to cover, to spread over
@end entry

@entry ud [sun] N
@bases ud; u·šu₂
@form   ušu₂ $ud $(šuš) /u·šu₂ #~
@sense N sun
@end entry
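The field-ordering rules for Sumerian COFs can be checked mechanically. The Python sketch below is an illustration only, not how the Oracc tools do it. The function name `check_sumerian_cof_form` is hypothetical, and it assumes that NORM fields begin with `$` (as in the examples above) and that a LANG field, when present, begins with `%`; the latter is an assumption not stated on this page. It does not validate BASE or MORPH.

```python
from typing import List


def check_sumerian_cof_form(line: str) -> List[str]:
    """Return the ordering problems found in one @form line (empty if none)."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "@form":
        return ["not a @form line"]
    fields = tokens[2:]  # tokens[1] is the spelling
    problems = []
    i = 0
    # Assumption: a LANG field, if present, starts with '%'.
    while i < len(fields) and fields[i].startswith("%"):
        i += 1
    start = i
    # NORMs (primary or parenthesized secondary) must come next, consecutively.
    while i < len(fields) and fields[i].startswith("$"):
        i += 1
    if i == start:
        problems.append("NORM must be given explicitly at the start of the @form")
    # Any '$' field after the first non-NORM field breaks the ordering rule.
    if any(f.startswith("$") for f in fields[i:]):
        problems.append("all NORMs must be consecutive, before any other field")
    return problems


ok = check_sumerian_cof_form("@form   ušu₂ $(ud) $šuš /u°šu₂ #~")
bad = check_sumerian_cof_form("@form ušu₂ $ud /u·šu₂ $(šuš) #~")
```

With the real-life @form line above, `ok` is an empty list; in `bad` the secondary NORM follows the BASE, so one problem is reported.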

Cross-glossary COFs

COFs whose components are in different glossaries only occur on Oracc when a COF combines a common noun with a proper noun. No special treatment is needed for this case, but the relevant @form lines do have to be present for each component just as they do when the components are in the same glossary.

Thus, in akk.glo one would have:

@entry kabūtu [(animal) dung] N
@form {na₄}ŠURIM.{d}GU₄ $kabūt $(Šeriš)
...

And in the proper noun glossary, qpn.glo, the relevant lines under Šeriš are:

@entry Šeriš [1] DN
@form {na₄}ŠURIM.{d}GU₄ $(kabūt) $Šeriš
...
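The requirement that every component has its own @form line, whether the components share a glossary or not, can be sketched as a simple consistency check. The Python below is a hypothetical illustration and not the L2 implementation: the names `parse_norms` and `unknown_cof_components` are invented here, and the sketch assumes the @form lines from all relevant glossaries have already been collected into one list.

```python
from typing import Iterable, List, Tuple


def parse_norms(form_line: str) -> Tuple[str, List[str], List[str]]:
    """Return (spelling, primary norms, secondary norms) for one @form line."""
    tokens = form_line.split()
    spelling = tokens[1]
    primary, secondary = [], []
    for tok in tokens[2:]:
        if tok.startswith("$(") and tok.endswith(")"):
            secondary.append(tok[2:-1])
        elif tok.startswith("$"):
            primary.append(tok[1:])
    return spelling, primary, secondary


def unknown_cof_components(form_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """List (spelling, norm) pairs used as secondary but never as primary."""
    lines = list(form_lines)
    known = set()
    for line in lines:
        spelling, primary, _ = parse_norms(line)
        known.update((spelling, norm) for norm in primary)
    missing = []
    for line in lines:
        spelling, _, secondary = parse_norms(line)
        for norm in secondary:
            if (spelling, norm) not in known:
                missing.append((spelling, norm))
    return missing


akk = "@form {na₄}ŠURIM.{d}GU₄ $kabūt $(Šeriš)"
qpn = "@form {na₄}ŠURIM.{d}GU₄ $(kabūt) $Šeriš"
complete = unknown_cof_components([akk, qpn])
incomplete = unknown_cof_components([akk])
```

With both lines present nothing is reported; if the qpn.glo line is left out, the pair for Šeriš is returned as missing.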

L2 Diagnostics

unknown COF component

This diagnostic is generated when processing a @form line which contains COF components indicated with parentheses. The diagnostic gives the line number of the COF @form which is being processed, and indicates the spelling and an expected normalization which has not been found. Since the COF handling is not tied to entries, but to spelling and normalization combinations, the diagnostic cannot tell you which @entry it expected to find the component in.

Here is an actual example:

00lib/akk-x-neoass.glo:3114: `i-da-a-ti=dāti': unknown COF component

This tells you that at line 3114 in 00lib/akk-x-neoass.glo, a COF occurs which has a component $(dāti) that has not been found. Sometimes these may be typos, and sometimes they are missing @form lines.
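When many such diagnostics appear in a log, it can help to pull them apart automatically. The following Python sketch parses a message of exactly the shape shown above. The names `CofDiagnostic` and `parse_cof_diagnostic` are hypothetical, and the regular expression is inferred from this single example, so other message variants may not match.

```python
import re
from typing import NamedTuple, Optional


class CofDiagnostic(NamedTuple):
    path: str      # glossary file named in the message
    line: int      # line number of the COF @form being processed
    spelling: str  # written form, e.g. "i-da-a-ti"
    norm: str      # normalization that was expected but not found


# Assumption: the layout is  path:line: `spelling=norm': unknown COF component
_DIAG = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+): "
    r"`(?P<spelling>[^=]+)=(?P<norm>[^']+)': unknown COF component$"
)


def parse_cof_diagnostic(message: str) -> Optional[CofDiagnostic]:
    """Parse one 'unknown COF component' message, or return None."""
    m = _DIAG.match(message.strip())
    if m is None:
        return None
    return CofDiagnostic(m["path"], int(m["line"]), m["spelling"], m["norm"])


diag = parse_cof_diagnostic(
    "00lib/akk-x-neoass.glo:3114: `i-da-a-ti=dāti': unknown COF component"
)
```

The resulting `diag.spelling` and `diag.norm` are the pair to search for among the @form lines of the glossary.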

To debug such errors, visit the offending line in the glossary and look at the context. That usually makes it obvious which word the error applies to, in this case dāt[behind]PRP. Then look at the @entry for that word. Unless there is a bug in L2, you will find that the expected @form line is missing, in this case:

@form i-da-a-ti $(ina) $dāti

Assuming that there is no typo and that the @form entry really should be there, you can now fix it by simply editing the glossary to add the @form line.

18 Dec 2019 osc at oracc dot org

Steve Tinney

Steve Tinney, 'COFs: Compound Orthographic Forms', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2019 [http://oracc.museum.upenn.edu/doc/help/glossaries/cofs/]