| Date | Presenter | Title & Abstract |
| Tue, Oct 19 | Kevin M. Roddy, Department of Linguistics |
Best Practices for the Recording, Processing, and Archiving of Endangered Languages Data
Traditional scholarly publishing has been profoundly affected by technology. Print publishing is currently augmented, and will perhaps one day be replaced, by publishing research findings and data directly to the World Wide Web. The bibliographic science of description, access, and preservation is no longer the exclusive purview of librarians and curators - technology has empowered us to select our own metadata standards and to classify our data using eXtensible Markup Language (XML), a rapidly emerging markup language that is changing how data are described and retrieved. Dozens of proprietary software programs (WordPerfect, MS-Word, PowerPoint, Excel, and Windows Media Player, to name a few), running on multiple operating systems (Linux, Macintosh, Windows, Lindows, HyperOS, and others) and producing a staggering number of different text, audio, and visual file types (.pdf, .doc, .ppt, .wpd, .xls, .gif, .tiff, .jpg, .wav, .aiff, .wma, and many others), threaten the integrity and accessibility of new and archived data for present and future researchers. Moreover, steady computer hardware and software obsolescence (wire and wax recording media, 5-1/4-inch floppy disks, and Betamax are some examples) will forever imprison archival data and render them useless unless the data are regularly migrated to currently readable formats, an expensive and time-consuming process. Field linguists working with endangered languages can now 1) record data in any format they wish; 2) assign metadata and XML tags to the data so that they can be searched in local databases or on the Web; and 3) make all research findings, including raw and processed data, available to anyone with a Web browser, bypassing traditional scholarly publishing channels. To ensure that all of us can read the linguistic data we collect, now and in the future, standard practices must be developed and followed.
Electronic Metastructure for Endangered Languages Data (EMELD, http://www.emeld.org), a five-year project funded by the National Science Foundation, brings together concerned field linguists, archivists, language engineers, and librarians who are working to develop 'best practices' for the creation, description, storage, and dissemination of text, audio, and visual linguistic data, so that the data can be easily accessed now and in the future. I attended EMELD’s “Workshop on Linguistic Databases and Best Practice” at Wayne State University in Detroit this past summer. My talk will review EMELD's goals and objectives and suggest best practices that linguists can employ in the creation, description, processing, and archiving of field data. I will use examples from Satawalese, a Trukic language spoken on Satawal Island, Yap State, Federated States of Micronesia, which I am currently documenting for my Master's thesis. |
UH Manoa Department of Linguistics Tuesday Seminar Series, Fall 2004