To ''immortalise'' the Tatar language: digital capabilities for the inquisitive researcher

A column by historian and Eastern studies expert Alfrid Bustanov on digitised Tatar literature

Realnoe Vremya's columnist Alfrid Bustanov works with Arabic sources a lot. In today's column, written for our online newspaper, the researcher talks about online resources where one can find old documents and examples of Tatar literature. In his opinion, open access to old texts will help preserve the mother tongue.

Digitised Tatar literature?

For the last 15 years, almost every day of my life has involved reading old texts written in Arabic script. As a hungry student, I worked directly in archives. As time went by, I started to scan more and to work with digital files. On a computer, one can zoom in on the image of a manuscript many times over, tell the dot above the letter ''p'' from the traces of beetles and cockroaches and, in general, travel around the world with the books one needs. The digital revolution has completely changed our world. Even bookworms like me cannot stay on the sidelines of these rapid changes.

Working with handwritten texts is an exciting journey that requires attention and experience (the more texts you read, the better). Nevertheless, many things have become simpler in the digital world: databases and even full-text versions of classic literature have replaced hundreds of rare dictionaries. Now a hadith quoted in a manuscript, or the corresponding verse from Jalāl ad-Dīn Muhammad Rūmī's Masnavi, can be found on the Internet in a few clicks.

Fragment from Galimdzhan Sharaf's work History of the Golden Horde (1916)

What don't we have?

All this is inspiring and simplifies work that in the past would have taken weeks of hard searching. Nevertheless, there is one ''but''.

There is no electronic text database for working with Tatar literature. Yes, if desired, one can find scans and Word files of varying quality for texts printed in the 20th century. But this applies to texts in Cyrillic only. The major part of the literature in the Tatar language created in Arabic script between the 15th and 20th centuries simply doesn't exist on the Internet. The fundamental work of digitising manuscripts and creating algorithms that identify and classify them requires a lot of effort, money and time. In addition, we are in a situation where even basic cataloguing hasn't reached a satisfactory level (excluding the heroic work of colleagues on Persian collections). To find a piece of information in thousands of handwritten texts, one still has to look through each of them and wander through labyrinths of the unknown. In the electronic age, such a way of working is a luxury we can ill afford.

I'm sure any investment in the digitisation, e-publication and interpretation of large volumes of Tatar literature in Arabic script will pay off in spades. Besides reliably preserving a very rich written heritage, it opens up prospects for linguistic, historical and economic analysis that are simply impossible with traditional philological approaches.

Biographical information about imams of Tetyushi and Chistopol uyezds

What's already available on the Internet?

After this ritual lament over what we don't have but would like to have, I'd like to give a short review of the tools available to those who work with Turkic literature in Arabic script. I think it will be useful both for young specialists and for amateurs interested in the past of our country.

I should say that the Academy of Sciences of the Republic of Tatarstan is already doing a great deal to digitise Arabic literature, along with electronic dictionary databases and other tools. The Academy has recently made us a present by putting all its existing products online. Copies of some handwritten books are available on Manuscripta Islamica Rossica. Issues of the famous Tatar magazine Shura are available as PDFs on the website of the Russian National Library.

The Dar al-Kutub project, which hosts PDF copies of pre-revolutionary editions in the Tatar and Arabic languages, does a lot for the electronic publication of religious texts. Some manuscripts without any annotations are also available there. In addition, we have recently launched the project Islamic Literature in Russia, where we plan to post typed texts and scans of written monuments from state and private archives.

The site Shigriyat, where anyone can find examples of Tatar poetry, including from the pre-revolutionary period, seems to me an important initiative. The texts are available only in Cyrillic script, but even so, one can use them to identify a handwritten text from a fragment.

In my work I often turn to the rich Corpus of the Tatar language. It has no texts in Arabic script either, but one can see examples of word usage drawn from quite a wide selection of texts, mainly from the 20th century. In my opinion, the main problem is that the Corpus is built on journalistic articles and non-fiction, that is, on whatever was readily available for inclusion. There are few religious texts in it.

To those who study the varieties of Arabic script used for the Tatar language, I can recommend a small technical site dedicated to transliteration from the Old Orthography to Cyrillic and back (although it is no longer available online). I may have missed or forgotten something, but there are not many such resources on the Internet.

These are, of course, important individual initiatives. But on the whole they cannot compare with the analogous text databases in the Arabic and Persian languages that are available online.

''This book was copied by Mullah Gabdelmazhit in Mullah Gabdessalam's madrasah.''

Why is all this needed?

Several obvious areas of work call for it. The first is the history of concepts. For centuries, people have discussed the things that matter to them in everyday life. The discussion of key cultural categories gradually produced a set of concepts that evolved together with society. How did Mishars understand ''justice'' somewhere in Saratov Governorate? What did ''conscience'' and ''honour'' mean for Tatars in Astrakhan in the middle of the 19th century? The discussion and evolution of such concepts can be traced in the large volume of data from the different texts that have survived until now (letters, bills of sale, works on fiqh, historical chronicles).

The second is names and biographies. Tatar manuscripts contain thousands of names of people, along with their biographies, feelings and deeds. Some of these people were included in special biographical dictionaries, while others are mentioned only on folio 78b of one of the endless legal treatises. All of this can be collected, classified and analysed once the texts are digitised and identified.
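As a rough illustration of what such collecting might look like once texts have been transcribed, here is a minimal Python sketch that builds an index from personal names to the manuscripts and folios where they occur. The manuscript labels, folio numbers and the helper `build_name_index` are hypothetical and invented purely for this example; they do not describe any real catalogue.

```python
from collections import defaultdict

# Hypothetical transcribed records: (manuscript label, folio, names mentioned).
# All labels and folio numbers are invented for illustration only.
RECORDS = [
    ("MS-1", "12a", ["Gabdelmazhit", "Gabdessalam"]),
    ("MS-2", "78b", ["Gabdelmazhit"]),
    ("MS-3", "3b", ["Gabdessalam"]),
]


def build_name_index(records):
    """Map each personal name to a sorted list of (manuscript, folio) pairs."""
    index = defaultdict(list)
    for manuscript, folio, names in records:
        for name in names:
            index[name].append((manuscript, folio))
    return {name: sorted(places) for name, places in index.items()}


name_index = build_name_index(RECORDS)
for name in sorted(name_index):
    print(name, name_index[name])
```

With an index of this kind, a name buried in a single folio becomes as easy to find as one listed in a biographical dictionary. Real material would also need variant spellings of the same name to be reconciled, which this sketch does not attempt.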

Geography is another obvious area. We have at our disposal a huge number of place names from across the country mentioned in the texts. If we plot the texts we have on a map, we can ask why certain types of art developed in particular regions at one time or another.
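A first step towards such a map could be a simple count of genres by place and period. The Python sketch below assumes a catalogue in which every text already has a place of origin, a genre and a year. The entries and the function `genre_counts_by_place` are hypothetical, made up only to show the idea.

```python
from collections import Counter

# Hypothetical catalogue entries: (place of origin, genre, year).
# The entries are invented for illustration and describe no real texts.
CATALOGUE = [
    ("Astrakhan", "poetry", 1850),
    ("Astrakhan", "poetry", 1862),
    ("Chistopol", "chronicle", 1881),
    ("Tetyushi", "letter", 1879),
    ("Chistopol", "poetry", 1905),
]


def genre_counts_by_place(catalogue, start_year, end_year):
    """Count (place, genre) pairs among entries dated within the given years, inclusive."""
    return Counter(
        (place, genre)
        for place, genre, year in catalogue
        if start_year <= year <= end_year
    )


counts = genre_counts_by_place(CATALOGUE, 1850, 1899)
for (place, genre), n in sorted(counts.items()):
    print(f"{place:10} {genre:10} {n}")
```

Counts like these say nothing by themselves about why a genre flourished in one place, but they show where to start looking.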

In other words, the prospects are huge, as is the potential for the growth of knowledge, especially since we don't yet know what questions can be put to our manuscripts tomorrow, once they are available for digital analysis. The most important thing, of course, is the opportunity to open up the bottomless resources of libraries and open source archives. We have too many texts and too few specialists to work in the old-fashioned way. The best way to ''immortalise'' a text (and the language) is to make it as available as possible to modern-day readers.

By Alfrid Bustanov