Digital Sanskrit philology tutorial

Wednesday, 18 December 2019


Given the enormous body of knowledge composed in the Sanskrit language over the past three millennia (the largest body of literature in the world prior to the invention of the printing press), it is imperative that the texts that convey this knowledge be brought into the digital medium. Making Sanskrit texts accessible in the digital medium will expose them to digital finding aids and to modern computational methods of knowledge discovery, thereby facilitating the use of this knowledge in current research and its application in society.

The tutorial requires only minimal familiarity with Sanskrit: some school-level instruction in India or a first-year university course abroad. It will instruct participants in how to create digital editions of Sanskrit texts that precisely document characters, prose sentences, paragraphs, verses, and larger document divisions; that indicate inflectional identification and morphological analysis; that identify the source of the text in a printed edition or elsewhere; and that document the language, script, and reference structure of the text. All of this is done in accordance with the most widely accepted guidelines for machine-readable access, the Text Encoding Initiative (TEI) Guidelines for XML markup.

Specific skills

In the tutorial participants will learn how to:

  1. encode Sanskrit texts precisely in the Sanskrit Library Phonetic basic encoding (SLP1), a simple ASCII encoding scheme that covers all of Sanskrit, including Vedic.
  2. mark up the structure of the text and the document format in parallel in XML in accordance with the Text Encoding Initiative (TEI) Guidelines.
  3. validate the structure of XML and TEI documents against a document type definition (DTD) using XML validation tools.
  4. use regular expressions and replacement expressions to search and replace within a text document to facilitate markup.
  5. use meter analysis software to identify verses.
  6. create bibliographic information in accordance with the TEI guidelines.
  7. create a TEI document header that describes the editor, contents, source, structure, conventions, revision history, and other features of a TEI document.
  8. use the Sanskrit Library's TEITAgger software, which uses the preceding technologies to create TEI documents semi-automatically from text files.
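
As an illustration of how skills 2 to 4 fit together, the following minimal Python sketch uses a regular expression with a replacement expression to wrap a verse in TEI <lg> and <l> elements and then checks that the result is well-formed XML. The input convention (a half-verse ending in " |" and a verse ending in " || n ||") and the names VERSE and tag_verses are assumptions made for this example. This is not how TEITAgger works internally, and the standard library parser used here checks well-formedness only, not validity against a DTD.

```python
import re
import xml.etree.ElementTree as ET

# Assumed input convention (hypothetical): a verse occupies two lines,
# the first ending in " |" and the second in " || <number> ||".
VERSE = re.compile(
    r"^(?P<a>.+?) \|\n(?P<b>.+?) \|\| (?P<n>[\d.]+) \|\|$",
    re.MULTILINE,
)


def tag_verses(text):
    """Wrap each two-line verse in a TEI <lg> with one <l> per line."""
    return VERSE.sub(
        r'<lg type="verse" n="\g<n>">\n  <l>\g<a></l>\n  <l>\g<b></l>\n</lg>',
        text,
    )


# Bhagavadgita 1.1 in SLP1 (plain ASCII, so no XML escaping is needed here).
sample = (
    "Darmakzetre kurukzetre samavetA yuyutsavaH |\n"
    "mAmakAH pARqavAScEva kimakurvata saMjaya || 1.1 ||"
)

tagged = tag_verses(sample)

# Well-formedness check only: fromstring raises ParseError on broken markup.
root = ET.fromstring(f"<div>{tagged}</div>")
print(tagged)
```

In practice a full TEI document would also need a header and a validating tool, as covered in the tutorial itself.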


Instructors

  1. Peter M. Scharf, Fellow, Indian Institute of Advanced Study, Shimla
  2. Tanuja P. Ajotikar, Assistant Professor, Vyākaraṇa Vibhāga, Shree Somnath University, Veraval


The sixteenth International Conference on Natural Language Processing (ICON-2019), pre-conference tutorial

Language Technologies Research Centre (LTRC), Seminar Room

International Institute of Information Technology
Professor C. R. Rao Road, Gachibowli
Hyderabad, Telangana 500032 INDIA

Pre-tutorial preparation

Download and install a good text editor:

Download and install an XML validator:


10:00am  Introduction to SLP1, XML, TEI and their use in digital philology
11:00am  Using TEI for critical editing and morphological analysis
11:30am  Practicum: Encode a text
1:30pm  Regular expressions: introduction and practicum
2:00pm  Metrical identification
2:30pm  Practicum: Using TEITAgger
3:30pm  Group presentations
4:00pm  Follow-up and outlook: Text encoding and related computing projects


Character encoding
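
Because SLP1 represents each Sanskrit sound by a single ASCII character, conversion to other schemes can be a simple character-by-character lookup. The sketch below maps the basic SLP1 letters to IAST romanization. The names SLP1_TO_IAST and slp1_to_iast are hypothetical, and Vedic accents and other special signs are deliberately left out, so unknown characters pass through unchanged.

```python
# Basic SLP1 letters mapped to IAST (accent marks and rarer signs omitted).
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(text):
    """Convert SLP1 text to IAST; characters not in the table are kept."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


print(slp1_to_iast("Darmakzetre kurukzetre"))
```

The one-character-per-sound design is what keeps the lookup free of context rules: aspirates such as K or D need no special handling because they are never spelled with two characters in SLP1.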

Higher-level encoding

Metrical analysis
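
Metrical identification starts from the pattern of light (laghu) and heavy (guru) syllables in a line. The following is a simplified sketch of that first step for SLP1 text, not the meter analysis software used in the tutorial. It assumes that a syllable is heavy if its vowel is long, if it is followed by anusvara (M) or visarga (H), or if two or more consonants come before the next vowel. Word spaces are ignored, and the treatment of line-final syllables is left out. The name scan is hypothetical.

```python
VOWELS = set("aAiIuUfFxXeEoO")
LONG_VOWELS = set("AIUFXeEoO")
CONSONANTS = set("kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh")


def scan(slp1_text):
    """Return a string of L (light) and G (heavy), one letter per syllable."""
    # Keep only sounds; spaces and punctuation do not affect syllable weight.
    sounds = [
        ch for ch in slp1_text
        if ch in VOWELS or ch in CONSONANTS or ch in "MH"
    ]
    weights = []
    for i, ch in enumerate(sounds):
        if ch not in VOWELS:
            continue
        # Collect everything between this vowel and the next one.
        following = []
        for nxt in sounds[i + 1:]:
            if nxt in VOWELS:
                break
            following.append(nxt)
        heavy = (
            ch in LONG_VOWELS
            or any(c in "MH" for c in following)
            or len(following) >= 2
        )
        weights.append("G" if heavy else "L")
    return "".join(weights)


print(scan("Darmakzetre kurukzetre"))
print(scan("samavetA yuyutsavaH"))
```

A meter identifier would then compare such patterns against a table of known meters; that lookup is outside the scope of this sketch.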


