This page describes how the lemmatiser handles Compound Orthographic Forms, or written words which contain more than one lemma.
COFs are forms that are written as one word but in fact should be understood as more than one lemma. Their meanings are transparent and they do not justify a separate entry in the glossary.
COFs are lemmatized by giving the individual lemmata joined by ampersands (&) with no whitespace or semi-colons.
Examples in Akkadian include:
Linking two crasis words, e.g., ip-pan = ina[in]PRP&pān[front]N, "in front of".
Lemmatizing compound logograms, e.g., BAR.RA.NA = +ina[in//on]PRP$&+ahu[arm//side]N$ahišu, "on his side".
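The ampersand convention can be illustrated with a small Python sketch that splits a COF lemmatization into its components. This is not part of the Oracc tools: the `Lemma` tuple, the `split_cof` function and the regular expression are hypothetical, and the pattern assumes each component has the shape CF[GW]POS with an optional leading `+` and an optional trailing `$NORM`, as in the two examples above.

```python
import re
from typing import NamedTuple, Optional


class Lemma(NamedTuple):
    cf: str              # citation form, e.g. "ina"
    gw: str              # guide word, with sense if given, e.g. "in//on"
    pos: str             # part of speech, e.g. "PRP"
    norm: Optional[str]  # text after "$"; None if there is no "$"


# Hypothetical pattern (an assumption, not the Oracc grammar):
# optional "+", CF, "[GW]", POS, optional "$NORM".
_LEMMA = re.compile(
    r"^\+?(?P<cf>[^\[\]]+)\[(?P<gw>[^\]]*)\](?P<pos>[^$]*)(?:\$(?P<norm>.*))?$"
)


def split_cof(lem: str) -> list[Lemma]:
    """Split a COF lemmatization on '&' and parse each component."""
    # Components must be joined by a bare '&': no whitespace, no semi-colons.
    if ";" in lem or re.search(r"\s&|&\s", lem):
        raise ValueError("COF components must be joined by a bare '&'")
    parts = []
    for chunk in lem.split("&"):
        m = _LEMMA.match(chunk)
        if m is None:
            raise ValueError(f"cannot parse component: {chunk!r}")
        parts.append(Lemma(m["cf"], m["gw"], m["pos"], m["norm"]))
    return parts
```

For instance, `split_cof("ina[in]PRP&pān[front]N")` yields two components whose citation forms are `ina` and `pān`.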
Although the entire COF does not get its own glossary entry, the glossary entry for every word in a COF must contain a @form entry for the spelling. In this @form entry, the normalization which corresponds to the glossary @entry is called the primary component, and is given without parentheses. The other words, the secondary components, have their appropriate normalization given in parentheses:
@entry ina [in] PRP
@form ip-pa-ni-šu₂ $ina $(pānišu)
@form ip-pa-ni-šu₂-nu $ina $(pānišunu)
@form ...
@sense PRP ina
@sense ...
@end entry
...
@entry pānu [front] N
@form ip-pa-ni-šu₂ $(ina) $pānišu
@form ip-pa-ni-šu₂-nu $(ina) $pānišunu
@form ...
@sense N front
@end entry
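As a rough illustration of the primary/secondary distinction, the following Python sketch reads one Akkadian-style @form line and labels each normalization. The function name `parse_form` is hypothetical, and the sketch assumes whitespace-separated fields with the spelling immediately after `@form`; it is not the parser used by Oracc.

```python
def parse_form(line: str):
    """Return (spelling, [(normalization, is_primary), ...]) for a @form line."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "@form":
        raise ValueError("not a @form line")
    spelling = fields[1]
    norms = []
    for field in fields[2:]:
        if field.startswith("$(") and field.endswith(")"):
            # Parenthesised: a secondary component of the COF.
            norms.append((field[2:-1], False))
        elif field.startswith("$"):
            # Bare "$": the primary component, matching the enclosing @entry.
            norms.append((field[1:], True))
    return spelling, norms
```

Applied to `@form ip-pa-ni-šu₂ $ina $(pānišu)`, the sketch reports `ina` as primary and `pānišu` as secondary; applied to the matching line under pānu, the roles are reversed.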
COFs may contain more than two elements, and the same principles apply.
COFs require a little extra care in Sumerian, because Sumerian @form entries also have BASE, MORPH and perhaps other additional fields. Further, Sumerian glossaries do not contain NORM fields directly, because they are computed from the MORPH.
For simplicity of parsing, the rules with Sumerian COFs are as follows:
[...] ~ (tilde) component with the citation form;
[...] @form line must be given after the last NORM entry in the line, even if that entry is a secondary one (i.e., one which is enclosed in parentheses);
[...] @bases line.
Here is a real-life example (edited for clarity), in which the sign ušu₂ is used to write u₄ šu₂:
@entry šuš [cover] V/t
@bases suš₂; u°šu₂
@form ušu₂ $(ud) $šuš /u°šu₂ #~
@sense V/t to cover, to spread over
@end entry

@entry ud [sun] N
@bases ud; u·šu₂
@form ušu₂ $ud $(šuš) /u·šu₂ #~
@sense N sun
@end entry
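In both @form lines of this example, the `/` and `#` fields follow the last `$` entry, including the parenthesised one. A minimal Python sketch of a check for that ordering is given below. The function name `norms_come_first` is hypothetical, and the sketch assumes that every field after the spelling which does not start with `$` counts as an additional field such as BASE (`/`) or MORPH (`#`).

```python
def norms_come_first(form_line: str) -> bool:
    """True if no '$' (NORM) field appears after a non-NORM field."""
    fields = form_line.split()[2:]  # skip "@form" and the spelling
    seen_other = False
    for field in fields:
        if field.startswith("$"):
            if seen_other:
                # A NORM entry after BASE/MORPH breaks the ordering.
                return False
        else:
            seen_other = True
    return True
```

The line `@form ušu₂ $(ud) $šuš /u°šu₂ #~` passes this check, whereas a line with the BASE placed between the two NORM entries would not.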
COFs whose components are in different glossaries only occur on Oracc when a COF combines a common noun with a proper noun. No special treatment is needed for this case, but the relevant @form lines do have to be present for each component just as they do when the components are in the same glossary.
Thus, in akk.glo one would have:
@entry kabūtu [(animal) dung] N
@form {na₄}ŠURIM.{d}GU₄ $kabūt $(Šeriš)
...
And in the proper noun glossary, qpn.glo, the relevant lines under Šeriš are:
@entry Šeriš [1] DN
@form {na₄}ŠURIM.{d}GU₄ $(kabūt) $Šeriš
...
The "unknown COF component" diagnostic is generated when processing a @form line which contains COF components indicated with parentheses. The diagnostic gives the line number of the COF @form which is being processed, and indicates the spelling and an expected normalization which has not been found. Since the COF handling is not tied to entries, but to spelling and normalization combinations, the diagnostic cannot tell you which @entry it expected to find the component in.
Here is an actual example:
00lib/akk-x-neoass.glo:3114: `i-da-a-ti=dāti': unknown COF component
This tells you that at line 3114 in 00lib/akk-x-neoass.glo, a COF occurs which has a component $(dāti) that has not been found. Sometimes these may be typos, and sometimes they are missing @form lines.
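When many such diagnostics appear in a log, it can help to pull out the file, line number, spelling and normalization mechanically. The Python sketch below does this for messages shaped like the example above. The function name `parse_cof_diagnostic` and the regular expression are hypothetical and derived only from that one sample line, so other message variants may need a different pattern.

```python
import re

# Pattern modelled on the single sample message; an assumption, not a spec.
_DIAG = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+): "
    r"`(?P<spelling>[^=]+)=(?P<norm>[^']+)': unknown COF component$"
)


def parse_cof_diagnostic(message: str):
    """Return a dict of the diagnostic's parts, or None if it does not match."""
    m = _DIAG.match(message.strip())
    if m is None:
        return None
    return {
        "file": m["file"],
        "line": int(m["line"]),
        "spelling": m["spelling"],
        "norm": m["norm"],
    }
```

The returned file and line number point at the @form line being processed, not at the entry in which the component was expected.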
To debug such errors, visit the offending line in the glossary and look at the context. That usually makes it obvious which word the error applies to; in this case, dāt[behind]PRP. Then look at the @entry for the word. Unless there is a bug in L2, you will find that the expected form line is missing, in this case:
@form i-da-a-ti $(ina) $dāti
Assuming that there is no typo and that the @form entry really should be there, you can now fix it by simply editing the glossary to add the @form line.
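The check behind this diagnostic can be approximated outside the Oracc tools. The Python sketch below is an illustration only, not the L2 implementation: it collects every (spelling, normalization) pair given as a primary component, then reports each parenthesised secondary component for which no such pair exists. The function name `find_unknown_cof_components` is hypothetical, and the sketch assumes one @form per line with whitespace-separated fields.

```python
def find_unknown_cof_components(glossary_text: str):
    """Return [(line number, spelling, normalization), ...] for each
    secondary COF component that has no matching primary @form."""
    forms = []  # (line number, spelling, [(normalization, is_primary), ...])
    for lineno, line in enumerate(glossary_text.splitlines(), start=1):
        fields = line.split()
        if len(fields) < 3 or fields[0] != "@form":
            continue
        norms = []
        for field in fields[2:]:
            if field.startswith("$(") and field.endswith(")"):
                norms.append((field[2:-1], False))  # secondary component
            elif field.startswith("$"):
                norms.append((field[1:], True))     # primary component
        forms.append((lineno, fields[1], norms))

    # Matching is by spelling and normalization, not by entry.
    known = {
        (spelling, norm)
        for _, spelling, norms in forms
        for norm, is_primary in norms
        if is_primary
    }
    return [
        (lineno, spelling, norm)
        for lineno, spelling, norms in forms
        for norm, is_primary in norms
        if not is_primary and (spelling, norm) not in known
    ]
```

Because the lookup uses only spelling and normalization, the sketch, like the diagnostic, cannot say which @entry the missing @form line belongs in. For components held in different glossaries, the texts of those glossaries would have to be checked together.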
Steve Tinney
Steve Tinney, 'COFs: Compound Orthographic Forms', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2019 [http://oracc.museum.upenn.edu/doc/help/glossaries/cofs/]