Project Management Procedures with Unix

This page describes how to manage your Oracc corpus through a Unix terminal from a PC or Mac. It takes you through uploading files to Oracc, checking the project, adding new entries to the glossaries, making corrections to texts that are already online, and rebuilding the project website.

Throughout these instructions, substitute proj for the name of your project (e.g., obstn, cams, hbtin) and subproj for any subprojects you are running. [LANGUAGE] stands for any one of the Oracc language codes for the ancient languages in your project.

For more information on the oracc command, see the page on The Oracc Command.

Before you begin

Managing an Oracc corpus entails two types of communication with the Oracc server:

Uploading files to Oracc; and
Connecting to Oracc with a command-line (text-based) terminal programme to enable you to manage the files on the Oracc server.

To do this on a Mac, you will need to use the Terminal utility, which you will find in Applications/Utilities. You might find it useful to keep the Terminal in your Dock if you will be using it regularly. When the Terminal is open, hold your mouse down on its icon in the Dock (a black computer screen) and choose Keep in Dock.

On a PC, you will need to install a terminal utility such as PuTTY [http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html].

When uploading files to Oracc, you may prefer to use a programme that gives a view of the local and remote folders and allows you to drag files from one to the other. If you are using a Mac, it is worth trying Cyberduck [http://cyberduck.en.softonic.com/mac] and for PCs there is WinSCP [http://winscp.net/eng/index.php].

Once you have installed the software you want, you need to secure your connection to the Oracc server. Generally speaking, you will only need to do this once.

Some useful Unix commands

There may be times when you need to move or delete project files on Oracc. Do this very carefully! The basic commands you need are:

cd: change directories, e.g., cd sources to move to the sources directory from your project's home directory; cd .. to move up a level in the directory hierarchy (for instance from sources to your home directory)
less: a pager to read text files, e.g., less oracc.log
ls: get a listing of the files and directories in the current directory
man: get documentation about a command, e.g., man ls
mkdir: create new directory, e.g., mkdir photos
mv: move files (copy and delete the original, effectively; good for when you need to rename files), e.g., mv mb12345.atf bm12345.atf
passwd: change the password for your project; you will be asked for the current one (once) and the new one (twice)
rm: remove or delete files -- use with caution! E.g., rm bm12345.atf
rmdir: delete empty directory (will not work if there are files in it), e.g., rmdir photos
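
If these commands are new to you, it may help to try them in a throwaway directory on your own computer before using them on real project files. The following is only a sketch: the scratch directory and the file names are made up for illustration, and nothing in it touches Oracc.

```shell
# Practise in a temporary scratch directory (hypothetical file names).
scratch=$(mktemp -d)
cd "$scratch"

mkdir photos                 # create a new directory
touch mb12345.atf            # create an empty file with a mistyped name
mv mb12345.atf bm12345.atf   # rename it
ls                           # shows bm12345.atf and photos

rm bm12345.atf               # delete the file (there is no undo)
rmdir photos                 # delete the empty directory
remaining=$(ls | wc -l | tr -d ' ')   # count what is left in the scratch directory
cd /
rmdir "$scratch"             # succeeds only because the directory is empty again
```

Note that rm does not move files to a wastebasket: once a file is removed it is gone, which is why it is worth running ls first to check exactly what is in the directory.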

Uploading ATF, ODS and catalogue files to Oracc

This stage is only necessary if your ATF and/or ODS files are not on Oracc already but are on your own computer and/or you have your own project catalogue. If you access your ATF files remotely from Oracc, and your project uses the CDLI catalogue [http://cdli.ucla.edu/], you can skip straight to Checking the Corpus.

Before you upload ATF or ODS files to Oracc, you must check that they are clean with the Checker webservice.

If you are using a graphical file-transfer programme such as Cyberduck, FuGu or WinSCP, you will need to enter the following information in the login dialogue box:

host: oracc.museum.upenn.edu
user name: proj
password: [your project's password]

Copy ATF files into your project's (or subproject's) 00atf/ folder and (if applicable) the XML file exported from your catalogue into your project's 00cat/ folder.

If you are comfortable using a command line interface such as Terminal (for Mac) or PuTTY (on a PC), then follow these instructions instead:

Before you do this for the first time, locate the root (PC) or home (Mac) directory of your computer. On a Mac, this is Users/username (e.g., Users/er264) and on a PC it is c:\Documents and Settings\username\ (e.g., c:\Documents and Settings\er264\). Inside it, make a new folder called proj and inside that, make two new folders 00atf and 00cat. You will use these folders as a convenient place to put files that are to be copied to and from Oracc.
Copy ATF and ODS files to be uploaded into your proj/00atf folder and/or export your catalogue to proj/00cat. If you have followed this procedure before, first delete or move any old files that are still in those folders. This is IMPORTANT as you don't want to over-write files that are already on Oracc.
Open the Terminal or PuTTY.

To copy ATF files, at the prompt type:

scp ~/proj/00atf/*.atf proj@oracc.museum.upenn.edu:00atf

or, for a subproject:

scp ~/proj/00atf/*.atf proj@oracc.museum.upenn.edu:subproj/00atf

and press return.

To copy ODS files, at the prompt type:

scp ~/proj/00atf/*.ods proj@oracc.museum.upenn.edu:00atf

or, for a subproject:

scp ~/proj/00atf/*.ods proj@oracc.museum.upenn.edu:subproj/00atf

and press return.

To copy the XML file of your project catalogue, at the prompt type:

scp ~/proj/00cat/*-P.xml proj@oracc.museum.upenn.edu:00cat

or, for a subproject:

scp ~/proj/00cat/*-P.xml proj@oracc.museum.upenn.edu:subproj/00cat

and press return.

When prompted for the password, type it and press return.
Watch while the machine describes its progress. Leave the window open, as you'll need it again later.
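
The local staging folders described at the start of this procedure can also be created from the Terminal on a Mac. The following is a minimal sketch: the function name make_staging is made up for this example, and proj again stands for your own project name.

```shell
# make_staging DIR: create DIR/00atf and DIR/00cat and list their contents.
# The function name is hypothetical; it is not part of Oracc.
make_staging() {
    dir=$1
    mkdir -p "$dir/00atf" "$dir/00cat" || return 1   # no error if they already exist
    # Show leftovers from a previous upload so that they can be moved or
    # deleted before new files are copied in.
    ls -l "$dir/00atf" "$dir/00cat"
}
```

Calling make_staging "$HOME/proj" creates both folders in your home directory if they are missing and lists whatever is already in them, which is a convenient moment to clear out files left over from an earlier upload.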

Downloading files from Oracc

For several steps in the project management process you will need to edit files. It is best to work with the file(s) on Oracc, even if you originally created them on your own computer, so that you can be confident that it is the latest version. To download files from Oracc follow these instructions:

If you are using a graphical file-transfer programme such as Cyberduck, FuGu or WinSCP, you will need to enter the following information in the login dialogue box:

host: oracc.museum.upenn.edu
user name: proj
password: [your project's password]

Copy ATF and/or ODS files from your project's 00atf folder onto your own computer. Copy [LANGUAGE].glo files from your project's 00lib folder.

If you are using Terminal (for Mac) or PuTTY (on a PC), then follow these instructions instead:

Open the Terminal or PuTTY.
To copy an ATF file called example.atf from 00atf into the proj/00atf folder on your own computer, at the prompt type:
```
scp proj@oracc.museum.upenn.edu:00atf/example.atf ~/proj/00atf
```
and press return.

To copy an ODS file from 00atf, at the prompt type:
```
scp proj@oracc.museum.upenn.edu:00atf/example.ods ~/proj/00atf
```
and press return.
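To copy a [LANGUAGE].glo file from 00lib into the proj/00lib folder on your own computer (create that folder first if it does not yet exist, otherwise scp will save the glossary as a file named 00lib), at the prompt type: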
```
scp proj@oracc.museum.upenn.edu:00lib/[LANGUAGE].glo ~/proj/00lib
```
and press return.
If you are working with a subproject add subproj/ after the colon, e.g., scp proj@oracc.museum.upenn.edu:subproj/00lib/[LANGUAGE].glo ~/proj/00lib
When prompted for the password, type it and press return.
Watch while the machine describes its progress. Leave the window open, as you'll need it again later.

Checking the corpus

Now you need to check that all is well with your corpus as a whole.

Open a new window in the Terminal.
Type ssh proj@oracc.museum.upenn.edu
Enter the password when asked. The prompt should now begin with something like [proj@oracc ~]:
If necessary move to your subproject by typing cd subproj and pressing return.
Type oracc check at the prompt and press return. (There are also various options you can use: see the Oracc Command page for more details.)
Look to see if there are any ATF errors, by typing less oracc.log at the command line. Use the up and down arrows to scroll through, and leave it by typing q for "quit".
If necessary, follow the instructions in Correcting errors in online data.
If instead you just get a message like this:
```
Glossary lib/akk-x-oldbab.glo OK
Glossary lib/qpn.glo OK
Glossary lib/sux.glo OK
```
all is well and you can proceed to the next steps.

You can now move to Adding new PSUs to the glossary or straight to Harvesting new lemmatisation data.

Adding PSUs to the glossary

New Phrasal Semantic Units such as lumun libbi [sorrow] N and karṣa akālu [slander] V have to be added to the glossary by hand. Here's how to do it. First you need to download and open the glossary in an editor if you haven't already got it open.

Create a new entry that looks like this:

@entry lumun libbi [sorrow] N
@parts lumnu[evil]N libbu[interior]N
@form ŠA₃.HUL $lumun $libbi
@sense N sorrow
@end entry

@entry karṣa akālu [slander] V
@parts karṣu[slander]N akālu[eat]V
@form kar-ṣa_GU₇ $karṣa $ikkalū
@sense V slander
@end entry

where in @parts each bit is of the form CF[GW]POS and in @form the transliterated words are connected by an underscore and EVERY normalised word is preceded with a $.
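
A missing $ is easy to overlook in a long glossary. As a rough aid, the sketch below reports @form lines in which a word after the transliteration does not begin with $. The function name check_forms is made up for this example, and the check assumes that @form lines contain only a transliteration followed by normalised words, as in the entries above; it is no substitute for the Oracc glossary checker.

```shell
# check_forms FILE: report @form lines in which a word after the
# transliteration does not start with "$". Hypothetical helper; assumes
# each @form line is "@form TRANSLITERATION $word $word ...".
check_forms() {
    awk '
        $1 == "@form" {
            for (i = 3; i <= NF; i++) {
                if (substr($i, 1, 1) != "$") {
                    # print file name, line number and the offending line
                    printf "%s:%d: %s\n", FILENAME, FNR, $0
                    bad = 1
                    break
                }
            }
        }
        END { exit bad + 0 }   # exit status 1 if any line was reported
    ' "$1"
}
```

Run it on a local copy of the glossary, for instance check_forms [LANGUAGE].glo; no output means that no suspect lines were found.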

If an idiom has more than one form (spelling) and/or sense, you can add them too. For instance:

@entry lumun libbi [sorrow] N
@parts lumnu[evil]N libbu[interior]N
@form ŠA₃.HUL $lumun $libbi
@form ŠA₃.HUL-šu $lumun $libbišu
@sense N sorrow
@sense N eclipsed stat
@end entry

Make sure that in the lemmatisation (in the ATF file) you indicate the SENSE correctly.

If the PSU is written with two or more words in the transliteration (i.e., like kar-ṣa GU₇ but not like ŠA₃.HUL), you now need to add an extra @form line to the glossary entries for each of the constituent parts of the PSU. In the first example above, written with a single logogram, the extra @form lines are added automatically when the glossaries are merged. But for the PSU karṣa akālu [slander] V, you need to add this line in the entry for karṣu [slander] N:
```
@form kar-ṣa_GU₇ $karṣa $(ikkalū)
```
and this line to the glossary entry for akālu[eat]V:
```
@form kar-ṣa_GU₇ $(karṣa) $ikkalū
```
If the individual constituents of the PSU are not yet in [LANGUAGE].glo, you must add these lines later, after you have merged the glossaries.
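
The pattern in the two extra lines above is that, in each constituent's entry, that constituent's normalised word keeps its plain $ while the other words are wrapped in $( ). Assuming that pattern holds for your PSU, the sketch below writes out the extra lines from the PSU's own @form line so that you can paste each one into the right entry. The function name constituent_forms is made up for this example.

```shell
# constituent_forms LINE: given a PSU @form line, print one @form line per
# normalised word, with every other normalised word wrapped in $( ).
# Hypothetical helper based on the two example lines above.
constituent_forms() {
    printf '%s\n' "$1" | awk '{
        for (k = 3; k <= NF; k++) {          # k: the constituent kept bare
            line = $1 " " $2                 # "@form" and the transliteration
            for (i = 3; i <= NF; i++) {
                w = substr($i, 2)            # drop the leading "$"
                if (i == k)
                    line = line " $" w
                else
                    line = line " $(" w ")"
            }
            print line
        }
    }'
}
```

The lines are printed in the same order as the normalised words, so the first belongs in the entry for the first constituent, the second in the entry for the second, and so on.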

When you have finished and saved [LANGUAGE].glo (you will need to give the password when prompted), you need to check it, either by using the ATF processor [http://oracc.museum.upenn.edu/util/atfproc.html] or by uploading the file to the 00lib/ directory on Oracc and checking the corpus.

Correct any errors that are listed, save [LANGUAGE].glo, and check again until no errors are listed.

Adding linguistic annotations to the glossary

Optionally, you can manually add information to the glossary about roots and dialects, and further explanations of guidewords.

To gloss a guideword, simply add your comment after a semicolon and space. This is particularly useful for technical logograms which can't easily be translated by single words and short phrases. For instance:
```
@entry KUR [KUR; in mathematical astronomy, the time difference between the full or new moon and sunrise or sunset] N
@form KUR $KUR
@sense N KUR
@end entry
```
For more information, see the documentation on difficult words in Akkadian linguistic annotation.
To add roots, you manually add a @root line to that word's entry in the glossary. For instance:
```
@entry parāsu [cut (off)] V
@root prs
@form ip-ru-us $iprus
@sense V cut (off)
@end entry
```
Just as Oracc policy is for citation forms to follow the Concise Dictionary of Akkadian, so must roots follow the CDA roots list. They must be written in Unicode, with ʾ (aleph) where appropriate, not single quote marks or other similar characters. For instance @root ʾpš would be the correct annotation for epēšu. As CDA does not use ʿ (ayin), neither does Oracc.
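
Because an apostrophe typed from the keyboard looks very like ʾ on screen, it can be worth searching a glossary for @root lines that contain one. The following is a small sketch using grep; the function name find_bad_roots is made up for this example, and it only looks for the ASCII apostrophe and backtick, not for every possible look-alike character.

```shell
# find_bad_roots FILE: print, with line numbers, @root lines that contain
# an ASCII apostrophe (') or backtick where aleph was probably intended.
# Hypothetical helper name.
find_bad_roots() {
    grep -n "^@root.*['\`]" "$1"
}
```

For instance, find_bad_roots [LANGUAGE].glo lists the lines to correct; if nothing is printed, no such characters were found.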
Even though every Akkadian citation form must be in CDA-style "conventional Akkadian", you can add dialect CFs to the glossary by hand, as follows:
```
@entry awātu [word] N
@NA abūtu
@form a-mat $amāt
@form INIM $amāta
@end entry
```

Harvesting new lemmatisation data

This process collects together all the newly lemmatised data so that you can check it for errors and correct them before the big glossaries are rebuilt.

Type:
```
oracc harvest
```
at the command line prompt of your open Oracc terminal (See the Oracc command page).

Wait while the processor describes its progress. The final lines should say something like:

harvested 4 forms with new qpn data; see 01bld/new/qpn.new
harvested 7 forms with new sux data; see 01bld/new/sux.new
harvested 33 forms with new akk-x-oldbab data; see 01bld/new/akk-x-oldbab.new

Check for error messages in the process by typing less oracc.log at the command line.

Now you need to check and correct the harvested lemmatisations.

At the terminal prompt type:
```
less 01bld/new/[LANGUAGE].new
```
to see the mini-glossary file containing new entries to be merged into the existing glossaries. Use the arrow keys or space bar to move through it, and type q to exit. The file is read-only.
Each glossary entry takes the form:
```
@entry alāku [go] V
@form DU{+ku} $illakū
@sense V flow
@end entry
```
If you spot a mistake here, find where it originates in the original ATF file (which is currently in your home or root directory on your own computer, or in the 00atf/ directory on Oracc) and correct it there.
When you have corrected all the mistakes you can see, upload the corrected ATF files again (if they are on your computer), as described in Uploading ATF, ODS and catalogue files to Oracc, and run oracc harvest again.
Repeat this process until you are confident that the new lemmatisation data is correct.
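
When a [LANGUAGE].new file is long, it can help to skim just the headwords first and then go back to less for the forms and senses. The sketch below relies only on the @entry lines shown above; the function name list_new_entries is made up for this example.

```shell
# list_new_entries FILE: print the headword of every entry in a harvested
# .new file, i.e. each @entry line without the leading "@entry ".
# Hypothetical helper name.
list_new_entries() {
    sed -n 's/^@entry //p' "$1"
}
```

At the Oracc prompt you might type list_new_entries 01bld/new/[LANGUAGE].new | less to page through the list. An unexpected citation form, guideword or part of speech in that list is a sign that a lemmatisation in the ATF needs another look.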

Now you need to merge the new data with the existing glossaries.

Merging the glossaries

Only do this step when you are confident that all the lemmatisation data in the [LANGUAGE].new files is correct (see the section on Harvesting).

For each ancient language in your project type oracc merge [LANGUAGE] at the terminal prompt. This routine merges the new data with the old. You don't have to do them all at once: you can manage each language glossary entirely separately.
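
If your project has several languages, you can list the merge commands that correspond to the harvested files rather than working them out by eye. The sketch below assumes that every [LANGUAGE].new file in 01bld/new is one you want to merge; the function name merge_commands is made up for this example, and it only prints the commands without running them.

```shell
# merge_commands DIR: print one "oracc merge" command for each
# [LANGUAGE].new file found in DIR (normally 01bld/new).
# Hypothetical helper; it prints commands but does not run them.
merge_commands() {
    for f in "$1"/*.new; do
        [ -e "$f" ] || continue          # no .new files: print nothing
        printf 'oracc merge %s\n' "$(basename "$f" .new)"
    done
}
```

Typing merge_commands 01bld/new in your project directory on Oracc prints the list for you to review; you can then type each command yourself once you are satisfied that the corresponding .new file is correct.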

If, earlier in the process, you added new PSU data to a [LANGUAGE].glo that involved words not yet in the glossaries, you now need to finish that job. Otherwise, go on to rebuild the website.

Close [LANGUAGE].glo and open it again (so that the new version contains the freshly merged data from [LANGUAGE].new).
Follow the instructions given in the Adding PSUs section to add @form lines to new entries in [LANGUAGE].glo.
Save [LANGUAGE].glo and check it.
Correct any errors in [LANGUAGE].glo, save and repeat until [LANGUAGE].glo is clean.

Now you are ready to rebuild the website.

Rebuilding the corpus

This is the final step in putting edited material online.

If you have your own project catalogue, make sure that an XML copy of the latest version has been uploaded to Oracc. If you just use the CDLI catalogue, ignore this step.
At the terminal prompt, type
```
oracc build corpus
```
and press return. This makes the new texts, glossary entries, and metadata available on the server. Once started, the process carries on running on the server even if you disconnect from Oracc.
You then need to check for errors. You can do this by typing less oracc.log at the prompt and pressing return. You can scroll through the listing using the up and down arrow keys on your keyboard, and exit the listing by typing q (for "quit").
Once your corpus is building cleanly, check how it looks. If you notice mistakes online, you will need to correct these errors too.

Correcting errors in online data

Sometimes it's necessary to correct mistakes in, or make improvements to, transliterations, translations, glossary entries, or metadata that have already been published to the server.

Correcting errors in the metadata

If you see mistakes in the metadata displayed in the left-hand sidebar on your project's website:

If your project has its own catalogue, make corrections in the FileMaker database and upload a new XML copy to Oracc. (See the project catalogues page for more details.)
If your project uses the CDLI catalogue for its metadata, contact CDLI or osc@oracc.org to ask for corrections.

Update the catalogue installation by rebuilding the corpus. If you also want to make changes to ATF or ODS files, continue to the following section without rebuilding yet.

Correcting errors in ATF or ODS files

When correcting errors in ATF or ODS files, it is best to work with the file(s) on Oracc, even if you originally created them on your own computer, so that you can be confident that it is the latest version.

Download them using the instructions given above.
Don't forget to use the ATF checker webservice before you upload the corrected file(s) to Oracc again.

If you are correcting a lemmatised file, you will also have to delete the incorrect lemmatisation from the relevant glossary entry in [LANGUAGE].glo. Following the harvest and merge routine will add the correct new lemmatisation to [LANGUAGE].glo.

When you are done, check the corpus, fix any errors, and rebuild the website.

18 Dec 2019 osc at oracc dot org

Eleanor Robson

Eleanor Robson, 'Project Management Procedures with Unix', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2019 [http://oracc.museum.upenn.edu/doc/help/managingprojects/procedures/]