This page describes how to manage your Oracc corpus through a Unix terminal from a PC or Mac. It takes you through uploading files to Oracc, checking the project, adding new entries to the glossaries, making corrections to texts that are already online, and rebuilding the project website.
Throughout these instructions, substitute proj for the name of your project (e.g., obstn, cams, hbtin) and subproj for any subprojects you are running. [LANGUAGE] stands for any one of the Oracc language codes for the ancient languages in your project.
For more information on the oracc command, see the page on The Oracc Command.
Before you begin | Uploading files | Downloading files | Checking the corpus | Adding PSUs | Adding linguistic annotations | Harvesting | Merging the glossaries | Rebuilding | Correcting errors
Managing an Oracc corpus entails two types of communication with the Oracc server:
To do this on a Mac, you will need to use the Terminal utility, which you will find in Applications/Utilities. You might find it useful to keep the Terminal in your Dock if you will be using it regularly. When the Terminal is open, hold your mouse down on its icon in the Dock (a black computer screen) and choose Keep in Dock.
On a PC, you will need to install a terminal utility such as PuTTy [http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html].
When uploading files to Oracc, you may prefer to use a programme that gives a view of the local and remote folders and allows you to drag files from one to the other. If you are using a Mac, it is worth trying Cyberduck [http://cyberduck.en.softonic.com/mac] and for PCs there is WinSCP [http://winscp.net/eng/index.php].
Once you have installed the software you want, you need to secure your connection to the Oracc server. Generally speaking, you will only need to do this once.
There may be times when you need to move or delete project files on Oracc. Do this very carefully! The basic commands you need are:
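cd sources
to move to the sources directory from your project's home directory;
cd ..
to move up a level in the directory hierarchy (for instance from sources to your home directory);
less oracc.log
to read the file oracc.log one screen at a time (type q to leave);
man ls
to read the manual page for the ls command, which lists the contents of a directory;
mkdir photos
to make a new directory called photos;
mv mb12345.atf bm12345.atf
to rename the file mb12345.atf as bm12345.atf;
rm bm12345.atf
to delete the file bm12345.atf;
rmdir photos
to delete the empty directory photos.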
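A safe way to get used to these commands is to try them in a throwaway directory first. The sketch below does that on your own computer; the scratch directory and the empty practice file are assumptions made for illustration and are not part of any Oracc project.

```shell
# Practise in a throwaway directory so that no real project files are touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir photos                  # make a new directory
touch mb12345.atf             # create an empty, misnamed file to practise on
mv mb12345.atf bm12345.atf    # rename it
ls                            # shows bm12345.atf and photos
rm bm12345.atf                # delete the file
rmdir photos                  # delete the (now empty) directory
```

Because rmdir only removes empty directories, it will refuse to delete a folder that still has files in it, which is a useful safeguard.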
This stage is only necessary if your ATF and/or ODS files are not on Oracc already but are on your own computer and/or you have your own project catalogue. If you access your ATF files remotely from Oracc, and your project uses the CDLI catalogue [http://cdli.ucla.edu/], you can skip straight to Checking the Corpus.
Before you upload ATF or ODS files to Oracc, you must check that they are clean with the Checker webservice.
If you are using FuGu or WinSCP, you will need to enter the following information in the login dialogue box:
host: oracc.museum.upenn.edu
user name: proj
password: [your project's password]
Copy ATF files into your project's (or subproject's) 00atf/ folder and (if applicable) the XML file exported from your catalogue into your project's 00cat/ folder.
If you are comfortable using a command line interface such as Terminal (for Mac) or PuTTy (on a PC), then follow these instructions instead:
Before you do this for the first time, locate the root (PC) or home (Mac) directory of your computer. On a Mac, this is Users/username (e.g., Users/er264) and on a PC it is c:\Documents and Settings\username\ (e.g., c:\Documents and Settings\er264\). Inside it, make a new folder called proj and inside that, make two new folders, 00atf and 00cat. You will use these folders as a convenient place to put files that are to be copied to and from Oracc.
Copy ATF and ODS files to be uploaded into your proj/00atf folder and/or export your catalogue to proj/00cat. If you have followed this procedure before, first delete or move any old files that are still in those folders. This is IMPORTANT as you don't want to overwrite files that are already on Oracc.
Open the Terminal or PuTTy.
To copy ATF files, at the prompt type:
scp ~/proj/00atf/*.atf proj@oracc.museum.upenn.edu:00atf
or, for a subproject:
scp ~/proj/00atf/*.atf proj@oracc.museum.upenn.edu:subproj/00atf
and press return.
To copy ODS files, at the prompt type:
scp ~/proj/00atf/*.ods proj@oracc.museum.upenn.edu:00atf
or, for a subproject:
scp ~/proj/00atf/*.ods proj@oracc.museum.upenn.edu:subproj/00atf
and press return.
To copy the XML file of your project catalogue, at the prompt type:
scp ~/proj/00cat/*-P.xml proj@oracc.museum.upenn.edu:00cat
or
scp ~/proj/00cat/*-P.xml proj@oracc.museum.upenn.edu:subproj/00cat
and press return.
When prompted for the password, type it and press return.
Watch while the machine describes its progress. Leave the window open, as you'll need it again later.
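The ATF upload commands above differ only in whether a subproject is named. As a minimal sketch, the hypothetical helper below (oracc_upload_cmd is not an Oracc command) prints the scp command rather than running it, so you can check the destination before any files are copied. proj and subproj are placeholders, as elsewhere on this page.

```shell
# Hypothetical helper: print (do not run) the scp command for uploading ATF files.
# Usage: oracc_upload_cmd PROJECT [SUBPROJECT]
oracc_upload_cmd() {
    project=$1
    subproject=${2:-}
    if [ -n "$subproject" ]; then
        dest="$subproject/00atf"
    else
        dest="00atf"
    fi
    # The ~ and * stay unexpanded inside double quotes, so the command is shown as typed.
    echo "scp ~/$project/00atf/*.atf $project@oracc.museum.upenn.edu:$dest"
}

oracc_upload_cmd proj
oracc_upload_cmd proj subproj
```

Once the printed command looks right, copy it to the prompt and press return to carry out the upload.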
For several steps in the project management process you will need to edit files. It is best to work with the file(s) on Oracc, even if you originally created them on your own computer, so that you can be confident that you have the latest version. To download files from Oracc follow these instructions:
If you are using FuGu or WinSCP, you will need to enter the following information in the login dialogue box:
host: oracc.museum.upenn.edu
user name: proj
password: [your project's password]
Copy ATF and/or ODS files from your project's 00atf folder onto your own computer. Copy [LANGUAGE].glo files from your project's 00lib folder.
If you are using Terminal (for Mac) or PuTTy (on a PC), then follow these instructions instead:
Open the Terminal or PuTTy.
To copy an ATF file called example.atf from 00atf into the proj/00atf folder on your own computer, at the prompt type:
scp proj@oracc.museum.upenn.edu:00atf/example.atf ~/proj/00atf
and press return.
To copy an ODS file from 00atf, at the prompt type:
scp proj@oracc.museum.upenn.edu:00atf/example.ods ~/proj/00atf
and press return.
To copy a [LANGUAGE].glo file from 00lib, at the prompt type:
scp proj@oracc.museum.upenn.edu:00lib/[LANGUAGE].glo ~/proj/00lib
and press return.
If you are working with a subproject add subproj/ after the colon, e.g., scp proj@oracc.museum.upenn.edu:subproj/00lib/[LANGUAGE].glo ~/proj/00lib
When prompted for the password, type it and press return.
Watch while the machine describes its progress. Leave the window open, as you'll need it again later.
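The download commands above assume that the destination folders already exist on your own computer. If ~/proj/00lib is missing, scp is likely to save the glossary as a file named 00lib instead of placing it inside a folder. A minimal sketch for a Mac Terminal, assuming your local files live under ~/proj:

```shell
# Create the local folders used on this page if they are not already there.
# mkdir -p does nothing (and gives no error) for folders that already exist.
local_root="$HOME/proj"
mkdir -p "$local_root/00atf" "$local_root/00cat" "$local_root/00lib"

# List the folders to confirm that they are in place.
ls -d "$local_root"/00*
```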
Now you need to check that all is well with your corpus as a whole.
Open a new window in the Terminal.
Type ssh proj@oracc.museum.upenn.edu
Enter the password when asked. The prompt should now begin with something like [proj@oracc ~]:
If necessary, move to your subproject by typing cd subproj and pressing return.
Type oracc check at the prompt and press return. (There are also various options you can use: see the Oracc Command page for more details.)
Look to see if there are any ATF errors by typing less oracc.log at the command line. Use the up and down arrows to scroll through, and leave it by typing q for "quit".
If necessary, follow the instructions in Correcting errors in online data.
If instead you just get a message like this:
Glossary lib/akk-x-oldbab.glo OK
Glossary lib/qpn.glo OK
Glossary lib/sux.glo OK
all is well and you can proceed to the next steps.
You can now move to Adding new PSUs to the glossary or straight to Harvesting new lemmatisation data.
New Phrasal Semantic Units such as lumun libbi [sorrow] N and karṣa akālu [slander] V have to be added to the glossary by hand. Here's how to do it. First you need to download and open the glossary in an editor if you haven't already got it open.
Create a new entry that looks like this:
@entry lumun libbi [sorrow] N
@parts lumnu[evil]N libbu[interior]N
@form ŠA₃.HUL $lumun $libbi
@sense N sorrow
@end entry
or
@entry karṣa akālu [slander] V
@parts karṣu[slander]N akālu[eat]V
@form kar-ṣa_GU₇ $karṣa $ikkalū
@sense V slander
@end entry
where in @parts each bit is of the form CF[GW]POS and in @form the transliterated words are connected by an underscore and EVERY normalised word is preceded with a $.
If an idiom has more than one form (spelling) and/or sense, you can add them too. For instance:
@entry lumun libbi [sorrow] N
@parts lumnu[evil]N libbu[interior]N
@form ŠA₃.HUL $lumun $libbi
@form ŠA₃.HUL-šu $lumun $libbišu
@sense N sorrow
@sense N eclipsed stat
@end entry
Make sure that in the lemmatisation (in the ATF file) you indicate the SENSE correctly.
If the PSU is written with two or more words in the transliteration (i.e., like kar-ṣa GU₇ but not like ŠA₃.HUL), you now need to add an extra @form line to the glossary entries for each of the constituent parts of the PSU. In the first example above, written with a single logogram, the extra @form lines are added automatically when the glossaries are merged. But for the PSU karṣa akālu [slander] V, you need to add this line in the entry for karṣu [slander] N:
@form kar-ṣa_GU₇ $karṣa $(ikkalū)
and this line to the glossary entry for akālu[eat]V:
@form kar-ṣa_GU₇ $(karṣa) $ikkalū
If the individual constituents of the PSU are not yet in [LANGUAGE].glo, you must add these lines later, after you have merged the glossaries.
When you have finished and saved [LANGUAGE].glo (you will need to give the password when prompted), you need to check it, either by using the ATF processor [http://oracc.museum.upenn.edu/util/atfproc.html] or by uploading the file to the 00lib/ directory on Oracc and checking the corpus.
Correct any errors that are listed, save [LANGUAGE].glo, and check again until no errors are listed.
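Before sending an edited glossary for checking, a quick look on your own computer can catch an entry that was never closed. The sketch below is only an illustration, not a substitute for the Oracc checks: it relies solely on the fact that each entry opens with @entry and closes with @end entry, and it uses a temporary sample file containing the lumun libbi entry from this section in place of your real [LANGUAGE].glo.

```shell
# Write a sample glossary to a temporary file (stand-in for your own .glo file).
glo=$(mktemp)
cat > "$glo" <<'EOF'
@entry lumun libbi [sorrow] N
@parts lumnu[evil]N libbu[interior]N
@form ŠA₃.HUL $lumun $libbi
@sense N sorrow
@end entry
EOF

# Count the opening and closing lines; the two numbers should match.
opened=$(grep -c '^@entry ' "$glo")
closed=$(grep -c '^@end entry' "$glo")
echo "entries opened: $opened, closed: $closed"
```

If the two counts differ, look for an @entry block that is missing its @end entry line before you upload the file.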
Optionally, you can manually add information to the glossary about roots and dialects, and further explanations of guidewords.
To gloss a guideword, simply add your comment after a semicolon and space. This is particularly useful for technical logograms which can't easily be translated by single words and short phrases. For instance:
@entry KUR [KUR; in mathematical astronomy, the time difference between the full or new moon and sunrise or sunset] N
@form KUR $KUR
@sense N KUR
@end entry
For more information, see the documentation on difficult words in Akkadian linguistic annotation.
To add roots, you manually add a @root line to that word's entry in the glossary. For instance:
@entry parāsu [cut (off)] V
@root prs
@form ip-ru-us $iprus
@sense V cut (off)
@end entry
Just as Oracc policy is for citation forms to follow the Concise Dictionary of Akkadian, so roots must follow the CDA roots list. They must be written in Unicode, with ʾ (aleph) where appropriate, not single quote marks or other similar characters. For instance, @root ʾpš would be the correct annotation for epēšu. As CDA does not use ʿ (ayin), neither does Oracc.
Even though every Akkadian citation form must be in CDA-style "conventional Akkadian", you can add dialect CFs to the glossary by hand, as follows:
@entry awātu [word] N
@NA abūtu
@form a-mat $amāt
@form INIM $amāta
@end entry
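Because roots must use the Unicode aleph ʾ and not a typed quote mark, it can help to search a glossary for @root lines that contain a plain apostrophe or backtick. The following is a rough sketch under that assumption only; the three sample lines stand in for a real [LANGUAGE].glo, and the first of them is deliberately wrong.

```shell
# Sample @root lines: the first uses an ASCII apostrophe instead of aleph.
glo=$(mktemp)
printf '%s\n' "@root 'pš" "@root ʾpš" "@root prs" > "$glo"

# Count the @root lines that contain an ASCII apostrophe or a backtick.
bad_count=$(grep -c "^@root .*['\`]" "$glo")
echo "suspect @root lines: $bad_count"

# Show the offending lines with their line numbers so they can be corrected.
grep -n "^@root .*['\`]" "$glo"
```

This only catches the two most common typed substitutes; other look-alike characters would need to be added to the pattern.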
This process collects together all the newly lemmatised data so that you can check it for errors and correct them before the big glossaries are rebuilt.
Type:
oracc harvest
at the command line prompt of your open Oracc terminal (see the Oracc Command page).
Wait while the processor describes its progress. The final lines should say something like:
harvested 4 forms with new qpn data; see 01bld/new/qpn.new
harvested 7 forms with new sum data; see 01bld/new//sux.new
harvested 33 forms with new akk-x-oldbab data; see 01bld/new/akk-x-oldbab.new
To look for errors, type less oracc.log at the command line.
Now you need to check and correct the harvested lemmatisations.
At the terminal prompt type:
less 01bld/new/[LANGUAGE].new
to see the mini-glossary file containing new entries to be merged into the existing glossaries. Use the arrow keys or space bar to move through it, and type q to exit. The file is read-only.
Each glossary entry takes the form:
@entry alāku [go] V
@form DU{+ku} $illakū
@sense V flow
@end entry
If you spot a mistake here, find where it originates in the original ATF file (which is currently in your home or root directory on your own computer, or in the 00atf/ directory on Oracc) and correct it there.
When you have corrected all the mistakes you can see, upload the corrected ATF files again (if they are on your computer), as described in Uploading completed work, and run oracc harvest again.
Repeat this process until you are confident that the new lemmatisation data is correct.
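When a project has several languages, a short loop gives an overview of how many entries each harvested file contains before you read through them one by one. This is a sketch only: it builds a stand-in 01bld/new directory in a temporary location, using the alāku entry shown above as sample content. On Oracc you would run just the for loop from your project directory.

```shell
# Simulate the harvest output directory in a temporary location.
work=$(mktemp -d)
mkdir -p "$work/01bld/new"
cat > "$work/01bld/new/akk-x-oldbab.new" <<'EOF'
@entry alāku [go] V
@form DU{+ku} $illakū
@sense V flow
@end entry
EOF

# Count the @entry lines in every harvested file.
cd "$work"
for f in 01bld/new/*.new; do
    count=$(grep -c '^@entry ' "$f")
    echo "$f: $count entries"
done
```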
Now you need to merge the new data with the existing glossaries.
Only do this step when you are confident that all the lemmatisation data in the [LANGUAGE].new files is correct (see the section on Harvesting).
For each ancient language in your project type oracc merge [LANGUAGE] at the terminal prompt. This routine merges the new data with the old. You don't have to do them all at once: you can manage each language glossary entirely separately.
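If you do want to merge several languages in one sitting, it can help to list the commands first. In the sketch below, print_merge_commands is a hypothetical helper, not part of Oracc, which only prints one oracc merge command per language code; the codes are those from the sample output earlier on this page and should be replaced with your own.

```shell
# Hypothetical helper: print one "oracc merge" command per language code given.
print_merge_commands() {
    for lang in "$@"; do
        echo "oracc merge $lang"
    done
}

# Nothing is merged here; the commands are only displayed for review.
print_merge_commands akk-x-oldbab qpn sux
```

Type each printed command at the Oracc prompt once you are satisfied that the corresponding [LANGUAGE].new file is correct.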
If, earlier in the process, you added new PSU data to a [LANGUAGE].glo that involved words not yet in the glossaries, you now need to finish that job. Otherwise, go on to rebuild the website.
Close [LANGUAGE].glo and open it again (so that the new version contains the freshly merged data from [LANGUAGE].new).
Follow the instructions given in the Adding PSUs section to add @form lines to new entries in [LANGUAGE].glo.
Save [LANGUAGE].glo and check it.
Correct any errors in [LANGUAGE].glo, save and repeat until [LANGUAGE].glo is clean.
Now you are ready to rebuild the website.
This is the final step in putting edited material online.
If you have your own project catalogue, make sure that an XML copy of the latest version has been uploaded to Oracc. If you just use the CDLI catalogue, ignore this step.
At the terminal prompt, type
oracc build corpus
and press return. This makes the new texts, glossary entries, and metadata available on the server. This process will run whether or not you are connected to Oracc.
You then need to check for errors. You can do this by typing less oracc.log at the prompt and pressing return. You can scroll through the listing using the up and down arrow keys on your keyboard, and exit the listing by typing q (for "quit").
Once your corpus is building cleanly, check how it looks. If you notice mistakes online, you will need to correct these errors too.
Sometimes it's necessary to correct mistakes in, or make improvements to, transliterations, translations, glossary entries, or metadata that have already been published to the server.
If you see mistakes in the metadata displayed in the left-hand sidebar on your project's website:
If your project has its own catalogue, make corrections in the Filemaker database and upload a new XML copy to Oracc. (See the project catalogues page for more details.)
If your project uses the CDLI catalogue for its metadata, contact CDLI or osc@oracc.org to ask for corrections.
Update the catalogue installation by rebuilding the corpus. If you also want to make changes to ATF or ODS files, continue to the following section without rebuilding yet.
When correcting errors in ATF or ODS files, it is best to work with the file(s) on Oracc, even if you originally created them on your own computer, so that you can be confident that you have the latest version.
Download them using the instructions given above.
Don't forget to use the ATF checker webservice before you upload the corrected file(s) to Oracc again.
If you are correcting a lemmatised file, you will also have to delete the incorrect lemmatisation from the relevant glossary entry in [LANGUAGE].glo. Following the harvest and merge routine will add the correct new lemmatisation to [LANGUAGE].glo.
When you are done, check the corpus, fix any errors, and rebuild the website.
18 Dec 2019 Eleanor Robson
Eleanor Robson, 'Project Management Procedures with Unix', Oracc: The Open Richly Annotated Cuneiform Corpus, Oracc, 2019 [http://oracc.museum.upenn.edu/doc/help/managingprojects/procedures/]