Annotation of Sumerian syntax

This document provides an overview of language-specific annotation conventions for Sumerian syntax used in Oracc.

There is also a primer on linguistic annotation of Sumerian, which you should read before this page.

Syntax

PPCS

The syntax annotation conventions are developed and maintained in an adjunct project related to the Pennsylvania Sumerian Dictionary. This project is known as the PPCS: the Penn Parsed Corpus of Sumerian. Its documentation (which is in the process of being updated) is available on the PPCS website [http://psd.museum.upenn.edu/ppcs/].

The PPCS follows the Penn tradition of parsed corpora, which share a common vocabulary and annotation conventions; this tradition has informed the descriptive vocabulary used for syntactic features. Readers interested in the broader context of this work can see the Penn Parsed Corpora of Historical English Home Page [http://www.ling.upenn.edu/hist-corpora/].

Parser

Syntax processing is carried out by a Sumerian-specific parser which understands enough about Sumerian syntax to work effectively on many texts with only minimal hints in the annotation. The following sections describe what the parser can and cannot do, and what it sometimes needs to be prevented from doing, together with the notations and features relevant to annotation in ATF files.

Phrasal Parenthesis

Parenthetic noun-phrase (NP-PRN) is the name used for appositional phrases. The parser normally recognizes parenthetic phrases automatically by considering the context and the semantic class of key words.

To force a lemma to be the head noun of a parenthetic phrase, use the notation +, (plus-sign followed by comma). To suppress parenthetic interpretation, use -, (minus-sign followed by comma).

In the following example, the parser defaults to understanding dijir as parenthetic but that is incorrect; the annotation suppresses the parser's enthusiasm in this case:

1. {d}nin-jesz-zid-da
#lem: DN

2. dijir gu3-de2-a-kam
#lem: -, dijir[deity]; RN
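
For readers who process ATF files with their own scripts, the following is a minimal sketch of how the +, and -, hints could be separated from the lemma they precede. It is not the Oracc parser; the function name split_prn_hints and the assumption that lemmata are separated by semicolons and that a hint sits at the start of its field are illustrative only.

```python
import re

# Hypothetical helper, not part of Oracc: matches a leading '+,' or '-,'
# hint and captures the lemma text that follows it.
HINT_RE = re.compile(r"^([+-],)\s*(.*)$")

def split_prn_hints(lem_line):
    """Return (hint, lemma) pairs for one #lem: line; hint is None if absent."""
    body = lem_line.split(":", 1)[1]          # drop the '#lem' prefix
    result = []
    for field in body.split(";"):             # assumed lemma separator
        field = field.strip()
        if not field:
            continue
        match = HINT_RE.match(field)
        if match:
            result.append((match.group(1), match.group(2)))
        else:
            result.append((None, field))
    return result

print(split_prn_hints("#lem: -, dijir[deity]; RN"))
# [('-,', 'dijir[deity]'), (None, 'RN')]
```

The second line of the example above yields a suppression hint attached to dijir[deity] and no hint for RN.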

Phrasal Conjunction

Because the parser defaults to understanding adjacent, otherwise unmarked NPs as parenthetic, it is necessary to mark phrasal conjunctions unless they are in the parser's built-in table.

The following entries are in the parser's table:

# List of N N conjunction pairs
# input must be a list of pairs of ePSD CFGWs
an[heaven] ki[earth]
dam[spouse] dumu[child]
kug[metal] zagin[lapis lazuli]
gud[ox] udu[sheep]
kugsig[gold] kugbabbar[silver]
Enki Ninki

Common cases should be added to this list; less common cases can be annotated using the convention +& (plus-sign ampersand). Similarly, an unwanted conjunction can be suppressed using the notation -& (minus-sign ampersand):

1. ku6 muszen
#lem: kud[fish]; +& muszen[bird]
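
The interplay between the built-in table and the manual hints can be sketched as follows. This is an illustration only, not the parser's actual code: the names load_pairs and is_conjunction are hypothetical, and the rule that an explicit +& or -& hint takes precedence over the table is an assumption drawn from the description above.

```python
import re

# Copy of the conjunction table quoted in this section.
TABLE = """\
# List of N N conjunction pairs
# input must be a list of pairs of ePSD CFGWs
an[heaven] ki[earth]
dam[spouse] dumu[child]
kug[metal] zagin[lapis lazuli]
gud[ox] udu[sheep]
kugsig[gold] kugbabbar[silver]
Enki Ninki
"""

# A CFGW is a citation form plus an optional [guideword]; the guideword
# may itself contain spaces, so a plain split() is not enough.
ENTRY_RE = re.compile(r"[^\s\[\]]+(?:\[[^\]]*\])?")

def load_pairs(text):
    """Read the table into a set of (first, second) pairs, skipping comments."""
    pairs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        items = ENTRY_RE.findall(line)
        if len(items) == 2:
            pairs.add((items[0], items[1]))
    return pairs

def is_conjunction(first, second, pairs, hint=None):
    """Assumed precedence: an explicit hint overrides the table lookup."""
    if hint == "+&":
        return True
    if hint == "-&":
        return False
    return (first, second) in pairs

PAIRS = load_pairs(TABLE)
print(is_conjunction("an[heaven]", "ki[earth]", PAIRS))                  # True
print(is_conjunction("kud[fish]", "muszen[bird]", PAIRS))                # False
print(is_conjunction("kud[fish]", "muszen[bird]", PAIRS, hint="+&"))     # True
```

In this sketch the pair kud[fish] muszen[bird] is treated as a conjunction only because of the +& hint, which mirrors the annotated example above.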

Noun-noun Modification

Nouns can modify nouns in Sumerian and the parser supports this with a built-in table (derived mainly from third millennium royal inscriptions) as well as the notation +<. The current version of the table contains the following entries:

# List of N N modifier pairs; default is to assume post-mod
# for pre-mod put '+' before first element
#
# input must be a list of pairs of ePSD CFGWs; proper nouns should
# not have a guideword element

e[house] ub[corner]

šita[weapon] saŋ[head]
šita[weapon] ur[lion]
ur[lion] saŋ[head]

aga[rear] eren[cedar]

eš[shrine] Bagara
eš[shrine] Gutur
eš[shrine] Nibru
eš[shrine] Ŋirsu
gu[neck] Idigna
iri[city] Ŋirsu
egal[palace] Tiraš
hursaŋ[mountain] Uringiriaz
hursaŋ[mountain] Magan
kur[foreign land] Magan
kur[foreign land] Dilmun
abulla[gate] Kasurra

i[oil] nun[prince]
ir[scent] nun[prince]

mu[name] gilsa[treasure]

kar[harbor] zagin[lapis lazuli]
udu[sheep] i[oil]
udu[sheep] nita[male]
gur[unit] lugal[king]
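
The comment header of the table describes a small file format: one pair per line, post-modification by default, and a leading '+' for pre-modification. A minimal sketch of a reader for that format is given below. The function name parse_modifier_table is hypothetical, and the last line of the sample input is invented purely to exercise the '+' prefix; it is not an entry from the published table.

```python
import re

# A CFGW is a citation form plus an optional [guideword]; proper nouns
# have no guideword, and guidewords may contain spaces.
ENTRY_RE = re.compile(r"[^\s\[\]]+(?:\[[^\]]*\])?")

def parse_modifier_table(text):
    """Map (first, second) pairs to 'post' (default) or 'pre' ('+' prefix)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        direction = "post"
        if line.startswith("+"):
            direction = "pre"
            line = line[1:].lstrip()
        items = ENTRY_RE.findall(line)
        if len(items) == 2:
            table[(items[0], items[1])] = direction
    return table

SAMPLE = """\
# for pre-mod put '+' before first element
e[house] ub[corner]
kur[foreign land] Dilmun
+kug[pure] Inana
"""
# NOTE: '+kug[pure] Inana' is a hypothetical line used only for illustration.

for pair, direction in parse_modifier_table(SAMPLE).items():
    print(pair, direction)
```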

A manually annotated N-N modifier looks like this:

1. ab2 ti dara4
#lem: ab[cow]; +< ti[rib]; dara[brown]
#tr: brown-ribbed cow

Pre-modification

Premodification, with the exception of kug inana, must be annotated manually using the notation +>:

1. kug ama {d}nansze
#lem: kug[pure] +> ; ama[mother]; DN

Subordinate Clauses

Annotation of subordinate clauses generally involves only giving the start of the clause using the ([WORD]) convention. Notes on individual clause types are given below.

Relative Clauses

The parser is usually able to determine the start of a relative clause. If necessary, however, the notation (S-REL) can be inserted before the first lemma of the clause:

1. lu2 e2 mu-du3-a
#lem: lu[person]; (S-REL) e[house]; du[build]

N.B.: This is just by way of example; relatives like this are correctly handled by the parser without hinting.

DE3 Clauses

The parser never tries to guess the start of a DE3-clause. As a result, it is always necessary to supply the start and type of the clause. The most common type is purpose (S-PRP); other DE3-clauses should be tagged as S-ADV:

1. e2 du3-u3-de3 i3-jen
#lem: (S-PRP) e[house]; du[build]; jen[go]
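
As a rough illustration of the clause-start convention, the sketch below locates markers such as (S-REL), (S-PRP) and (S-ADV) in a #lem: line and reports the position of the lemma each one precedes. It is not the Oracc parser; the function name clause_starts and the assumption that the marker opens a semicolon-separated field are illustrative only.

```python
import re

# Hypothetical pattern: a parenthesized upper-case label such as (S-PRP)
# at the start of a lemma field, followed by the lemma itself.
CLAUSE_RE = re.compile(r"^\(([A-Z][A-Z0-9-]*)\)\s*(.*)$")

def clause_starts(lem_line):
    """Return (lemma_index, clause_type, lemma) for each clause-start marker."""
    body = lem_line.split(":", 1)[1]          # drop the '#lem' prefix
    fields = [f.strip() for f in body.split(";") if f.strip()]
    starts = []
    for index, field in enumerate(fields):
        match = CLAUSE_RE.match(field)
        if match:
            starts.append((index, match.group(1), match.group(2)))
    return starts

print(clause_starts("#lem: (S-PRP) e[house]; du[build]; jen[go]"))
# [(0, 'S-PRP', 'e[house]')]
print(clause_starts("#lem: lu[person]; (S-REL) e[house]; du[build]"))
# [(1, 'S-REL', 'e[house]')]
```

Only the start of the clause is recorded, in keeping with the convention described above; finding where the clause ends is left to the parser.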

18 Dec 2019 osc at oracc dot org

Steve Tinney

Steve Tinney, 'Annotation of Sumerian syntax', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2019 [http://oracc.museum.upenn.edu/doc/help/languages/sumerian/syntax/]