This page describes how L2 handles Phrasal Semantic Units (PSUs): glossary entries which consist of more than one word.
PSUs are phrases that warrant their own dictionary entry, or a specific listing under one of the headwords. Such phrases may have their own meaning which is distinct from the sum of the parts, or they may simply be idiomatic usages which it is interesting to include in the glossary.
Many PSUs are written using COFs, so the two are easy to confuse, but they are completely separate. As far as L2 is concerned, PSUs are simply sequences of lemmata which are checked to ensure that the individual components match known criteria. COFs, on the other hand, are purely a feature of the writing interface, and have nothing to do with the interpretation of lemmata.
PSUs are lemmatized simply by lemmatizing the individual components: a special layer of L2 is responsible for identifying the phrases and linking the words together. Note the following considerations:
When lemmatizing the words it is not generally necessary to give a SENSE. However, within the L2 system the constituent words of PSUs are always associated with some SENSE of the word. As a result, when the GW of a word does not match a SENSE, it is better to give some keywords of the SENSE when lemmatizing. Also, some compounds and idioms may use different SENSEs of a word; in this case, too, it is necessary to give a SENSE when lemmatizing.
Sometimes a sequence of words should not be treated as a PSU even though the sequence is listed in the glossary as such. To prevent the lemmatizer treating the sequence as a PSU, use ! before the first lemma of a PSU to show that it is NOT a PSU in this instance. For instance, if ana[to]PRP; ṭarsi[extent]N is in your glossary with the meaning "opposite", write !ana[to]PRP; ṭarsi[extent]N when you want this phrase to keep its literal meaning, "to an extent". The mnemonic is that ! is a common boolean operator for NOT: the ! tells L2 *not* to process the word as part of a PSU.
You can use - before a lemma to omit a word (usually a MOD or AV) from the middle of a PSU, e.g., libbašu[interior]N; -ul[not]MOD; iṭâb[be(come) good]V for libba ṭiābu [be(come) satisfied] V.
You can specify the sense of an idiom in the GW of the first Akkadian element, like this: ŠA₃.HUL = +lumnu[evil+=eclipsed state]N$lumun&+libbu[interior]N$libbi, where lumun libbi has the GW "sorrow" but in some (mostly astronomical) contexts means "eclipsed state".
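The ! and - prefixes described above can be illustrated with a small Python sketch. It is not part of L2: the names LEMMA_RE, parse_lemmata and psu_candidate are hypothetical, and the sketch assumes that lemmata are separated by semicolons and have the shape CF[GW]POS. The + syntax for idiom senses is deliberately not handled.

```python
import re

# Hypothetical pattern: optional flag (! or -), citation form, [guide word], POS.
LEMMA_RE = re.compile(
    r"^(?P<flag>[!-]?)(?P<cf>[^\[]+)\[(?P<gw>[^\]]*)\](?P<pos>\S*)$"
)

def parse_lemmata(lem):
    """Split a ';'-separated lemmatization into dicts with flag, cf, gw and pos."""
    items = []
    for chunk in lem.split(";"):
        match = LEMMA_RE.match(chunk.strip())
        if match is None:
            raise ValueError(f"cannot parse lemma: {chunk!r}")
        items.append(match.groupdict())
    return items

def psu_candidate(lem):
    """Return the lemma sequence to test as a PSU, or None if '!' blocks it."""
    items = parse_lemmata(lem)
    if items and items[0]["flag"] == "!":
        return None  # the first lemma is flagged: NOT a PSU in this instance
    # Lemmata flagged with '-' are left out of the PSU sequence.
    return [f'{i["cf"]}[{i["gw"]}]{i["pos"]}' for i in items if i["flag"] != "-"]
```

For the examples above, psu_candidate drops -ul[not]MOD from the middle of the sequence and returns None for the lemmatization that starts with !ana[to]PRP.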
In the glossary, a PSU has its own @entry, and in addition each of its constituents must have its own @entry. Each constituent must have all of the same information, including proper @form lines, as any other word: the constituent entries are ignorant of the fact that they are later gathered into PSUs.
A PSU @entry has one additional line relative to other words: a @parts specification, which gives the sequence of constituents which makes up the PSU:
@entry ēkal māšarti [review palace] N
@parts ēkallu[palace]N māšartu[inspection]N
...
@sense N review palace
@end entry
As with lemmatization, the constituents do not need an explicit SENSE unless there are multiple PSUs with the same sequence of words which differ in the SENSE of one or more of the constituents, or unless the GW does not match any of the SENSEs of the constituent. See 'Diagnostics' below for examples.
Sometimes the constituents of a PSU may be written in more than one order. In such cases, the glossary simply needs multiple @parts lines:
@entry ina pān dagālu [wait for someone] V
@parts ina[in]PRP pānu[front]N dagālu[see]V
@parts dagālu[see]V ina[in]PRP pānu[front]N
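As a rough illustration of how several @parts lines describe alternative orders, the following Python sketch collects the component lists of a PSU entry and tests whether a lemma sequence matches any of them. The names COMPONENT_RE, read_parts and is_listed_order are hypothetical, and the exact-match comparison is an assumption made for the example, not a description of L2's internals.

```python
import re

# Hypothetical pattern for one component: CF[GW]POS, where the GW may contain spaces.
COMPONENT_RE = re.compile(r"[^\s\[]+\[[^\]]*\]\S*")

def read_parts(entry_text):
    """Return one list of components for every @parts line in a PSU entry."""
    orders = []
    for line in entry_text.splitlines():
        line = line.strip()
        if line.startswith("@parts "):
            orders.append(COMPONENT_RE.findall(line[len("@parts "):]))
    return orders

def is_listed_order(lemmata, orders):
    """True if the lemma sequence equals one of the listed @parts orders."""
    return any(lemmata == order for order in orders)
```

With the entry above, read_parts yields two component lists, and a sequence in either order is accepted while any other order is rejected.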
The @form lines of PSUs also have some special characteristics. One is that the first element of a @form, the written form, may contain multiple words: in this case they are joined by underscores. The other is that they may only contain NORM entries in addition to the written form, and each of the NORMs is prefixed by its own $-sign:
@form ina_pa-ni-šu₂-nu_a-da-gal $ina $pānišunu $adaggal
@form ina_pa-ni-šu₂-nu_i-da-gal $ina $pānišunu $idaggal
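The two characteristics of PSU @form lines can be sketched in Python as follows. The function name parse_psu_form is hypothetical; the sketch assumes that fields are separated by whitespace and rejects any field after the written form that is not a $-prefixed NORM, following the rule stated above.

```python
def parse_psu_form(line):
    """Split a PSU @form line into its written words and its NORMs."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "@form":
        raise ValueError(f"not a @form line: {line!r}")
    # The written form joins the individual words with underscores.
    written = fields[1].split("_")
    norms = []
    for field in fields[2:]:
        # PSU forms may only carry NORMs, each with its own $-sign.
        if not field.startswith("$"):
            raise ValueError(f"unexpected field in PSU @form: {field!r}")
        norms.append(field[1:])
    return written, norms
```

When no COF is involved, the two returned lists have the same length and can be paired word by word.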
COFs in the written forms of PSUs are straightforward when the entire PSU is written with a single COF:
@form im-muh-hi $ina $muhhi
When the writing mixes a COF with other constituents, however, it is necessary to tell L2 how many of the NORMs of the @form line are used up by the COF. This is done by adding the special sequence _0 (underscore followed by the digit zero) for each COF-constituent after the first:
@form {na₄}NIR₂.PA_0_iṣ-ṣu-ri $hulāl $kappi $iṣṣūri
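One way to read the _0 convention is that each written word claims one NORM, plus one more for every _0 that follows it. The Python sketch below implements that reading; allocate_norms is a hypothetical name, and the special case for a single written word covering every NORM (as in im-muh-hi above) is an assumption based on the statement that such forms need no extra marking.

```python
def allocate_norms(form_line):
    """Pair each written word of a PSU @form with the NORMs it uses up."""
    fields = form_line.split()
    tokens = fields[1].split("_")
    norms = [f[1:] for f in fields[2:] if f.startswith("$")]

    groups = []  # each item is [written word, number of NORMs it covers]
    for token in tokens:
        if token == "0":
            if not groups:
                raise ValueError("_0 cannot come before the first written word")
            groups[-1][1] += 1  # one more COF constituent for the previous word
        else:
            groups.append([token, 1])

    # Assumption: a lone written word is a COF spanning the whole PSU.
    if len(groups) == 1 and groups[0][1] < len(norms):
        groups[0][1] = len(norms)
    if sum(count for _, count in groups) != len(norms):
        raise ValueError("written form and NORMs do not line up")

    result, start = [], 0
    for written, count in groups:
        result.append((written, norms[start:start + count]))
        start += count
    return result
```

Applied to the example above, the COF {na₄}NIR₂.PA takes hulāl and kappi, leaving iṣṣūri for iṣ-ṣu-ri.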
The diagnostic "PSU component ... not found in glossary" is generated when processing a @form line which contains COF components indicated with parentheses. The diagnostic gives the line number of the COF @form which is being processed, and indicates the spelling and an expected normalization which has not been found. Since the COF handling is not tied to entries, but to spelling and normalization combinations, the diagnostic cannot tell you which @entry it expected to find the component in.
Here is an actual example:
00lib/akk-x-neoass.glo:3167: (g2a) PSU component #2 i-da-a-ti=dāt[behind]PRP$dāti not found in glossary
This tells you that at line 3167 in 00lib/akk-x-neoass.glo, a defective @form line is being processed in a PSU @entry.
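When many such diagnostics have to be triaged, it can help to pull the fields out of the message mechanically. The following Python sketch does this with a regular expression; DIAG_RE and parse_diagnostic are hypothetical names, and the pattern is inferred from the single example message above, so other messages may not follow exactly the same layout.

```python
import re

# Pattern inferred from the one example message; treat the layout as an assumption.
DIAG_RE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+): \((?P<tag>[^)]+)\) "
    r"PSU component #(?P<index>\d+) "
    r"(?P<spelling>[^=]+)=(?P<cf>[^\[]+)\[(?P<gw>[^\]]*)\](?P<pos>[^$]*)"
    r"\$(?P<norm>\S+) not found in glossary$"
)

def parse_diagnostic(message):
    """Return the fields of a 'PSU component not found' message, or None."""
    match = DIAG_RE.match(message.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])    # glossary line number
    fields["index"] = int(fields["index"])  # which PSU component failed
    return fields
```

The resulting dictionary gives the glossary file and line to visit, the number of the failing component, and the spelling, signature and normalization that L2 was looking for.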
The defect may be in any of several places. When processing a @form line, L2 takes each PSU component in turn, and combines the written form and the normalization from the @form line with the signature data from the relevant word in the @parts line. The number given in the error message tells you the component of the form/parts lines it is currently working on.
If more than one @parts line is given in the PSU @entry, L2 tries all of the @parts lines to find a complete set of matches before it reports errors.
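The checking process described above can be approximated by the following Python sketch. It is a simplified model, not L2's implementation: check_psu_form is a hypothetical name, the set of known spelling=signature$norm strings stands in for the constituent entries of the glossary, and reporting the failure from the first @parts line is an assumption made for the example.

```python
def check_psu_form(written, norms, parts_lines, known):
    """Return None if some @parts line matches fully, else (component number, key).

    written and norms come from the @form line, parts_lines holds one list of
    CF[GW]POS components per @parts line, and known is a set of strings of the
    shape 'spelling=CF[GW]POS$norm' gathered from the constituent entries.
    """
    if not parts_lines:
        raise ValueError("a PSU entry needs at least one @parts line")
    first_failure = None
    for parts in parts_lines:
        failure = None
        for number, (spelling, norm, part) in enumerate(
            zip(written, norms, parts), start=1
        ):
            key = f"{spelling}={part}${norm}"
            if key not in known:
                failure = (number, key)
                break
        if failure is None:
            return None  # a complete set of matches was found
        if first_failure is None:
            first_failure = failure
    # Every @parts line was tried before anything is reported.
    return first_failure
```

The key built for each component has the same shape as the text in the diagnostic, which makes it easy to compare a reported failure with the @form lines of the constituent entry.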
To debug this error, visit the offending line in the glossary and look at the contexts and the individual word entries. Some common causes of this error are:
- @form line with its components out of order
- @form line
- @parts line is wrong
In the example error, the current word is dāt[behind]PRP. Unless there is a bug in L2, you will find that the expected form line is missing, in this case:
@form i-da-ti $(ina) $dāti
Assuming that there is no typo and that the @form entry really should be there, you can now fix it by simply editing the glossary to add the @form line.
Steve Tinney
Steve Tinney, 'PSUs: Phrasal Semantic Units', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2019 [http://oracc.museum.upenn.edu/doc/help/glossaries/psus/]