Signatures are the string representation of an instance's
lemmatization data and consist of a sequence of fields, each
introduced by a distinct prefix character or characters. A complete
signature looks like this:
@epsd2%sux:a=a[water//water]N'N/a#~$a
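Read left to right, the example is a plain concatenation of prefixed fields, using the field names from the list under Signature Fields and Prefix Characters. The Python sketch below rebuilds it from named parts. `build_signature` is a hypothetical helper. It covers only the fields that appear in this example, and it takes their order from this one example rather than from a specification.

```python
from typing import Dict


def build_signature(f: Dict[str, str]) -> str:
    """Join named fields with their prefix characters (hypothetical helper).

    Covers only the fields present in the example signature, in the order
    they appear there; STEM, CONT, MORPH2 and RWS are not handled.
    """
    parts = [
        "@" + f["PROJECT"],
        "%" + f["LANG"],
        ":" + f["FORM"],
        "=" + f["CF"],
        "[" + f["GW"] + "//" + f["SENSE"] + "]",
        f["POS"],  # POS has no prefix: it follows the ] directly
        "'" + f["EPOS"],
        "/" + f["BASE"],
        "#" + f["MORPH"],
        "$" + f["NORM"],
    ]
    return "".join(parts)


fields = {
    "PROJECT": "epsd2", "LANG": "sux", "FORM": "a", "CF": "a",
    "GW": "water", "SENSE": "water", "POS": "N", "EPOS": "N",
    "BASE": "a", "MORPH": "~", "NORM": "a",
}
print(build_signature(fields))  # @epsd2%sux:a=a[water//water]N'N/a#~$a
```

The one field without a prefix, POS, is recognizable only because it comes directly after the closing bracket.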
Abbreviated Signatures
Oracc's lemmatization allows users to enter just a few key pieces
of a signature (this is the "instance lemmatization"); the
lemmatizer looks this data up in the glossary and creates complete
signatures from it.
Users typically enter just the citation form (CF, a in the example
above) and the sense (SENSE, water above), and the instance
lemmatization is then a[water]. It is also not uncommon to give
simply a Part-of-Speech, such as PN, for the instance lemmatization.
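The lookup step can be illustrated with a toy version. This is not Oracc's lemmatizer. The glossary is a hypothetical in-memory dictionary, `expand` is a hypothetical function name, and only the a[water] shape (CF plus bracketed sense) is handled. Because a complete signature includes the FORM, the sketch keys the glossary on the written form as well as on CF and SENSE.

```python
from typing import Dict, Optional, Tuple

# Hypothetical glossary: (FORM, CF, SENSE) -> complete signature.
GLOSSARY: Dict[Tuple[str, str, str], str] = {
    ("a", "a", "water"): "@epsd2%sux:a=a[water//water]N'N/a#~$a",
}


def expand(form: str, instance: str,
           glossary: Dict[Tuple[str, str, str], str]) -> Optional[str]:
    """Expand an instance lemmatization like 'a[water]' (sketch)."""
    # Only the CF[SENSE] shape is handled; a bare POS such as PN is not.
    if "[" not in instance or not instance.endswith("]"):
        return None
    cf, _, rest = instance.partition("[")
    sense = rest[:-1]  # drop the closing ]
    return glossary.get((form, cf, sense))


print(expand("a", "a[water]", GLOSSARY))  # @epsd2%sux:a=a[water//water]N'N/a#~$a
```

An unknown combination returns None. A real lemmatizer would need to report that case or offer candidates.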
Signature Fields and Prefix Characters
- @ = PROJECT
- The project to which this signature belongs
- % = LANG
- The language for the signature; may also have a writing system, e.g., %akk-949 for normalized Akkadian
- : = FORM
- The form of the word as it appears in the text
- = = CF
- The = separates FORM from the CF (Citation Form); the = may be omitted if the CF is the first entry in the signature, as in a[water]
- [ ... ]
- Square brackets surround the GW, or Guide Word, and/or SENSE
- //
- Only within [...], the double slash // separates GW and SENSE
- POS
- The POS (Part-of-Speech) is identified by its position immediately after the closing square bracket of [...]
- ' = EPOS
- The right-quote, ', is the prefix character for the EPOS, the Effective Part-of-Speech
- $ = NORM
- The normalized version of the writing. This varies by
language: in Akkadian, for example, it is the transcription of
the word-form, without hyphens and determinatives and with
accents. This is not used in the source version of Sumerian
glossaries because it can be computed from the morphology (see
below).
- * = STEM
- The STEM, which may be a form of the BASE in Sumerian, or a
notation such as D, Š or N in Akkadian, or possibly other
conventions for other languages.
- / = BASE
- The BASE utilized in a Sumerian writing. This must match a
base given in the @bases part of the entry.
- + = CONT
- The Sumerian grapheme following the base, used only when that
grapheme continues the end of the BASE, e.g., -ma in inim-ma.
The deconstruction of the grapheme gives the consonant that
continues the BASE, followed by the vowel, which is normally a
morpheme or morpheme constituent.
- # = MORPH
- The morphology string for the writing.
- ## = MORPH2
- The second morphology string for the writing.
- @ = RWS
- The RWS, Register or Writing System, for the form.
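Because each field is marked by its prefix, a complete or abbreviated signature can be split back into named fields by scanning for those characters. The sketch below makes several assumptions. `parse_signature` is a hypothetical name, and it assumes that no field value itself contains a prefix character, which real data may not guarantee. It treats a leading @ as PROJECT and any later @ as RWS. Following the a[water] example, it files a lone bracketed value under SENSE. It does not distinguish a bare POS such as PN from a bare CF.

```python
from typing import Dict, Tuple

# Single-character prefixes and the fields they introduce (from the list above).
SINGLE_PREFIX = {
    "%": "LANG", ":": "FORM", "=": "CF", "'": "EPOS",
    "$": "NORM", "*": "STEM", "/": "BASE", "+": "CONT",
}
# Every character that can start (and therefore end) a field.
STOP = set("@%:=[]'$*/+#")


def parse_signature(sig: str) -> Dict[str, str]:
    """Split a signature into named fields (hypothetical helper, a sketch)."""
    fields: Dict[str, str] = {}
    n = len(sig)

    def read_value(start: int) -> Tuple[str, int]:
        # Consume characters up to the next prefix character or end of string.
        j = start
        while j < n and sig[j] not in STOP:
            j += 1
        return sig[start:j], j

    i = 0
    # A leading value with no prefix is the CF (the = may be omitted).
    if n and sig[0] not in STOP:
        fields["CF"], i = read_value(0)

    while i < n:
        c = sig[i]
        if c == "@":
            # Assumption: @ at the start is PROJECT, any later @ is RWS.
            key = "PROJECT" if i == 0 else "RWS"
            fields[key], i = read_value(i + 1)
        elif c == "#":
            # Check for ## (MORPH2) before # (MORPH).
            if sig.startswith("##", i):
                fields["MORPH2"], i = read_value(i + 2)
            else:
                fields["MORPH"], i = read_value(i + 1)
        elif c == "[":
            close = sig.index("]", i)  # raises ValueError if unterminated
            inner = sig[i + 1:close]
            if "//" in inner:
                fields["GW"], fields["SENSE"] = inner.split("//", 1)
            else:
                # Assumption: a lone value is the SENSE, as in a[water].
                fields["SENSE"] = inner
            # POS has no prefix: it is whatever directly follows the ].
            pos, i = read_value(close + 1)
            if pos:
                fields["POS"] = pos
        elif c in SINGLE_PREFIX:
            fields[SINGLE_PREFIX[c]], i = read_value(i + 1)
        else:
            raise ValueError(f"unexpected {c!r} at position {i}")
    return fields


print(parse_signature("@epsd2%sux:a=a[water//water]N'N/a#~$a"))
```

For the abbreviated form, parse_signature("a[water]") returns {'CF': 'a', 'SENSE': 'water'}, because the leading value without a prefix is read as the CF.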