Documenting the Wichita Language on Digital Video & Audio
Annotation means marking the video/audio data with information that makes it more useful and accessible to people who are not experts in the source language. It also makes the material more readily available to those who want to use the data for linguistic research. We have followed slightly different annotation procedures depending on the nature of the material (i.e., whether it is a recently recorded video or an audio file from our older archives).
Annotation of New Material: mini-DV and DAT Tapes
(a) One of the steps in annotation of natural data is to make sure that as much as possible of each participant's speech is written down and translated into at least one other language (in our case the source language is Wichita and the translations are in English). For each participant in the conversation or story-telling we mark the time region for a given phrase, or any continuous string, of their speech and write that material in a Wichita tier (this tier is thus time-aligned). We also make a tier that provides the free English translation. From this point on we, as analysts and researchers, have to decide what other tiers will be useful. For Wichita we have decided to add at least a morphemic gloss tier and a morpheme-by-morpheme translation tier to the basic annotation (see the sample Wichita texts for an example of what this looks like). Time permitting, we also add syntactic information and comment tiers.
(b) After deciding on the types of tiers we wanted, we copied the individual session DMFs onto our computers and began annotating the sessions using the ELAN annotation tool from DOBES (the software can be obtained from www.mpi.nl/tools). The audio/video DMF file is opened in ELAN, the appropriate tiers are created, and we begin marking regions and entering annotations. Creating this transcript is perhaps the most time-consuming component of the work.
(c) Finally, our annotation files are sent to DOBES for archiving and for access by others who are interested in the language.
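The tier layout described in (a) can be pictured as a small data structure. The following is only a minimal Python sketch, not ELAN's own file format or API: the class name, the tier labels ("wichita", "free_translation"), and the two helper functions are all hypothetical, chosen for illustration. It shows one time-aligned annotation per phrase and two simple checks one might run before archiving: that the basic tiers are filled in, and that a speaker's time regions do not overlap.

```python
from dataclasses import dataclass, field

# Tier labels are illustrative assumptions, not identifiers used by ELAN.
REQUIRED_TIERS = ("wichita", "free_translation")


@dataclass
class Annotation:
    """One time-aligned phrase by one participant, with its tiers."""
    speaker: str
    start_ms: int
    end_ms: int
    tiers: dict = field(default_factory=dict)


def missing_tiers(annotation, required=REQUIRED_TIERS):
    """Return the required tier names that are absent or empty."""
    return [name for name in required
            if not annotation.tiers.get(name, "").strip()]


def overlapping_pairs(annotations):
    """Return neighbouring annotations (sorted by start time, per speaker)
    whose time regions overlap."""
    by_speaker = {}
    for ann in annotations:
        by_speaker.setdefault(ann.speaker, []).append(ann)

    pairs = []
    for items in by_speaker.values():
        ordered = sorted(items, key=lambda ann: ann.start_ms)
        for prev, cur in zip(ordered, ordered[1:]):
            if cur.start_ms < prev.end_ms:
                pairs.append((prev, cur))
    return pairs


if __name__ == "__main__":
    # Placeholder strings stand in for real transcript content.
    first = Annotation("A", 0, 1500, {"wichita": "<phrase 1>",
                                      "free_translation": "<translation 1>"})
    second = Annotation("A", 1200, 2000, {"wichita": "<phrase 2>"})
    print(missing_tiers(second))
    print(len(overlapping_pairs([first, second])))
```

Additional tiers (morphemic gloss, morpheme-by-morpheme translation, syntax, comments) would simply be further keys in the same dictionary.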
Annotation of Older Material (digitized from Cassettes and Reels)
Most (see the note below) of our older recorded Wichita stories and conversations had already been transcribed, either as older ASCII text transcripts or as hard-copy documents. Furthermore, some of the texts had already been entered into the Shoebox program. For material of this nature we mostly tried to import the existing transcripts into ELAN. One of the problems with our older transcripts is that they are either hard copy or simple computer text files; thus, they are detached from the temporal information of the corresponding recording. This makes importing into ELAN somewhat problematic, since ELAN transcripts are by their very nature time-aligned. The procedure we are currently attempting is the following.
(a) Using the digitized sound file of the older recordings we manually reconstruct the time information. Using Transcriber (or other equivalent software) we mark the regions in the sound file that correspond to lines (phrases, sentences, or intonational units) in the already existing transcripts.
(b) We enter the time code information from (a) into a file, either in Shoebox or in just a simple text file.
(c) We assemble a Shoebox-style file that contains all the various tiers in the transcripts as well as the beginning and end time codes that indicate the region for each line/unit in the transcript.
(d) We use ELAN, together with a Shoebox-type file (of the econv type specified in the ELAN instructions), to import the transcript into ELAN. The imported transcripts have to be checked for tier information and time alignment accuracy.
(e) The DMF files that we have created for each of the narratives and the ELAN file containing the tiers and annotations are all sent to DOBES for archiving and access.
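Steps (b) and (c) above amount to merging two lists: the lines of an existing transcript and the begin/end time codes reconstructed for them. The Python sketch below illustrates that merge under stated assumptions. Shoebox files mark each field with a backslash marker, but the particular marker names used here (\ref, \begin, \end, \tx, \ft) and the choice of seconds with three decimals are placeholders; the markers and time format that ELAN's import actually requires should be taken from the ELAN instructions. The function names are likewise hypothetical.

```python
def format_record(ref, start_s, end_s, fields):
    """Format one transcript line as a Shoebox-style record.

    Marker names (\\ref, \\begin, \\end) are placeholders, not
    necessarily the ones ELAN's import expects.
    """
    lines = [
        f"\\ref {ref}",
        f"\\begin {start_s:.3f}",
        f"\\end {end_s:.3f}",
    ]
    for marker, text in fields:
        lines.append(f"\\{marker} {text}")
    return "\n".join(lines)


def build_shoebox_text(transcript_lines, time_codes):
    """Pair each transcript line with its (start, end) time codes.

    transcript_lines: list of [(marker, text), ...], one list per line/unit
    time_codes:       list of (start_seconds, end_seconds), same order
    """
    if len(transcript_lines) != len(time_codes):
        raise ValueError("every transcript line needs exactly one time region")

    records = []
    pairs = zip(transcript_lines, time_codes)
    for number, (fields, (start_s, end_s)) in enumerate(pairs, start=1):
        if end_s <= start_s:
            raise ValueError(f"line {number}: end time must follow start time")
        records.append(format_record(f"{number:03d}", start_s, end_s, fields))

    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    # Placeholder strings stand in for real transcript content.
    lines = [
        [("tx", "<Wichita line 1>"), ("ft", "<English line 1>")],
        [("tx", "<Wichita line 2>"), ("ft", "<English line 2>")],
    ]
    codes = [(0.0, 2.5), (2.5, 6.125)]
    print(build_shoebox_text(lines, codes))
```

Refusing mismatched list lengths and reversed time regions at this stage catches some of the alignment problems that would otherwise only surface during the check in step (d).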
Note: One piece of our older material was transcribed following the procedure for new digital material (mini-DV/DAT) outlined above. This piece is a 28-minute conversation that Rood recorded in 1966. It had never been transcribed, and since we were starting from zero on this transcription we decided to enter it directly into ELAN.
This page is maintained and updated by Armik Mirzayan.