Digital Sanskrit philology tutorial
Wednesday, 18 December 2019
Given the enormous body of knowledge composed in the Sanskrit language over the past three millennia (the largest body of literature in the world prior to the invention of the printing press), it is imperative that the texts that convey this knowledge be brought into the digital medium. Making Sanskrit texts accessible in the digital medium will expose them to digital finding aids and to modern computational methods of knowledge discovery, thereby facilitating the use of this knowledge in current research and its application in society.
The tutorial requires only minimal familiarity with Sanskrit: some school-level instruction in India or a first-year university course abroad. It will instruct participants in how to create digital editions of Sanskrit texts that precisely document characters, prose sentences, paragraphs, verses, and larger document divisions; that precisely indicate inflectional identification and morphological analysis; that precisely indicate the source of the text in a printed edition or elsewhere; and that precisely document the language, script, and reference structure of the text. All of this is done in accordance with the most widely accepted guidelines for machine-readable access, the Text Encoding Initiative (TEI) Guidelines for XML markup.
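As a rough illustration of what such structural markup looks like, the following Python sketch builds a small TEI fragment for one verse and checks that it is well-formed XML. The element names (`div`, `lg`, `l`) and the TEI namespace come from the TEI Guidelines; the helper name `build_verse`, the attribute values, and the language tag `sa-Latn-x-SLP1` are assumptions chosen for this example, and a complete TEI document would also need a `teiHeader`. It checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize TEI elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)


def build_verse(div_n, verse_n, lines, lang="sa-Latn-x-SLP1"):
    """Return a TEI <div> holding one verse (<lg>) made of verse lines (<l>).

    Hypothetical helper for illustration; the language tag is an assumed
    private-use tag for Sanskrit text encoded in SLP1.
    """
    div = ET.Element(f"{{{TEI_NS}}}div", {"type": "chapter", "n": str(div_n)})
    lg = ET.SubElement(
        div,
        f"{{{TEI_NS}}}lg",
        {
            "type": "verse",
            "n": f"{div_n}.{verse_n}",
            f"{{{XML_NS}}}lang": lang,
        },
    )
    for text in lines:
        line = ET.SubElement(lg, f"{{{TEI_NS}}}l")
        line.text = text
    return div


# Bhagavadgita 1.1, typed in SLP1.
div = build_verse(
    1,
    1,
    [
        "Darmakzetre kurukzetre samavetA yuyutsavaH",
        "mAmakAH pARqavAScEva kimakurvata saMjaya",
    ],
)

xml_text = ET.tostring(div, encoding="unicode")
print(xml_text)

# Parsing the serialized text again raises ParseError if it is not well-formed.
reparsed = ET.fromstring(xml_text)
```

Building the tree with a library rather than by string concatenation keeps the tags balanced and escapes special characters automatically.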
In the tutorial participants will learn how to:
- encode Sanskrit texts precisely in a simple ASCII encoding scheme that covers all of Sanskrit including Vedic, the Sanskrit Library Phonetic basic encoding (SLP1).
- mark up the structure of the text and the document format in parallel in XML in accordance with the Text Encoding Initiative (TEI) Guidelines.
- validate the structure of XML and TEI documents against a document type definition (DTD) using XML validation tools.
- use regular expressions and replacement expressions to search and replace within a text document to facilitate markup.
- use meter analysis software to identify verses.
- create bibliographic information in accordance with the TEI guidelines.
- create a TEI document header that describes the editor, contents, source, structure, conventions, revision history, and other features of a TEI document.
- use the Sanskrit Library's TEITagger software, which uses the preceding technologies to create TEI documents from text files semi-automatically.
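Two of the skills listed above, SLP1 encoding and regular-expression search and replace, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how TEITagger works: the function names are hypothetical, the mapping covers only the basic (non-Vedic) SLP1 character set, and the input convention (a two-line verse whose first line ends in `|` and whose second ends in `|| chapter.verse ||`) is assumed for the example.

```python
import re

# Basic SLP1 -> IAST mapping: one ASCII character per Sanskrit sound.
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(text):
    """Transliterate SLP1 to IAST; unmapped characters pass through unchanged."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


# Assumed input convention: "line one |" then "line two || chapter.verse ||".
VERSE_PATTERN = re.compile(
    r"^(.*?)\s*\|\s*\n(.*?)\s*\|\|\s*(\d+)\.(\d+)\s*\|\|$",
    re.MULTILINE,
)

# Replacement expression: \1 and \2 are the verse lines, \3.\4 the reference.
VERSE_REPLACEMENT = r'<lg type="verse" n="\3.\4">\n  <l>\1</l>\n  <l>\2</l>\n</lg>'


def tag_verses(plain_text):
    """Wrap each two-line verse in TEI <lg>/<l> elements.

    The text is not XML-escaped here, so it must not contain '&' or '<'.
    """
    return VERSE_PATTERN.sub(VERSE_REPLACEMENT, plain_text)


source = (
    "Darmakzetre kurukzetre samavetA yuyutsavaH |\n"
    "mAmakAH pARqavAScEva kimakurvata saMjaya || 1.1 ||"
)

tagged = tag_verses(source)
print(tagged)
print(slp1_to_iast("Darmakzetre kurukzetre"))
```

Because SLP1 assigns exactly one ASCII character to each sound, transliteration is a character-by-character lookup, and regular expressions over SLP1 text never have to handle digraphs or combining diacritics.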
The sixteenth International Conference on Natural Language Processing (ICON-2019), pre-conference tutorial
Language Technologies Research Centre (LTRC), Seminar Room
International Institute of Information Technology
Professor C. R. Rao Road, Gachibowli
Hyderabad, Telangana 500032 INDIA
Download and install a good text editor:
- for MacOSX: https://www.barebones.com/products/bbedit/
- for Windows, Linux, or MacOSX: https://www.geany.org
Download and install an XML validator:
|Time|Session|
|---|---|
|10:00am|Introduction to SLP1, XML, TEI and their use in digital philology|
|11:00am|Using TEI for critical editing and morphological analysis|
|11:30am|Practicum: Encode a text|
|1:30pm|Regular expressions introduction and practicum|
|2:30pm|Practicum: Using TEITagger|
|4:00pm|Followup and Outlook: Text encoding and related computing projects|
- Sanskrit Library Phonetic ASCII encoding help page
- Linguistic Issues in Encoding Sanskrit, Appendix B
- Linguistic Issues in Encoding Sanskrit
- Peter M. Scharf, “TEITagger: Raising the standard for digital texts to facilitate interchange with linguistic software”
- Gérard Huet and Idir Lankri, “Preliminary Design of a Sanskrit Corpus Manager”
- Tanuja P. Ajotikar, Anuja P. Ajotikar, and Peter M. Scharf, “Enriching the digital edition of the Kāśikāvr̥tti by adding variants from the Nyāsa and Padamañjarī”
- Ajotikar, Tanuja P., Anuja P. Ajotikar, and Peter M. Scharf. 2018. “Enriching the digital edition of the Kāśikāvr̥tti by adding variants from the Nyāsa and Padamañjarī.” Computational Sanskrit and Digital Humanities: selected papers presented at the 17th World Sanskrit Conference, University of British Columbia, Vancouver, 9–13 July 2018, ed. by Gérard P. Huet and Amba P. Kulkarni, pp. 207–18.
- TEI Consortium, ed. 2007. TEI P5: Guidelines for electronic text encoding and interchange. Version 3.2.0. TEI Consortium. URL: http://www.tei-c.org/Guidelines/.
- Huet, Gérard P. and Amba P. Kulkarni, eds. Computational Sanskrit and Digital Humanities: selected papers presented at the 17th World Sanskrit Conference, University of British Columbia, Vancouver, 9–13 July 2018.