AI will soon speak to grandchildren in their grandmothers' voices
The Institute of Tatar Encyclopedia discussed digital Tatarstan, the lack of funding, and the lack of “food” for artificial intelligence

The digital technologies that cultural workers use in the republic were discussed this week at the Institute of Tatar Encyclopedia and Regional Studies. The data urgently needs to be released under an open license, because its main consumer is artificial intelligence. Whether local content, rather than “hallucinations”, appears on the Internet in the future depends on it. Realnoe Vremya reports the details.
The main Tatar encyclopedia and a social network for archaeologists
The round table was titled “Digital technologies in culture: the experience of implementing projects in the field of digital humanities in the Republic of Tatarstan”. The participants opened the conversation with their own projects.
Lilia Davlet, a senior researcher at the Institute's Department of Electronic and Digital Resources, spoke about the Tatarica online encyclopedia, which has operated since the end of 2018. It is based on a multi-volume printed encyclopedia. The republic's museums and libraries provided materials for it, including hundreds of digitized photographs. Readers can also submit their own; the editors are especially interested in photos of villages, holidays, and everyday life. At the same time, Davlet noted, these pictures do not enter the public domain.

There are currently 26,000 articles and 15,000 images on the site, which draws 4,000 visitors a day. Articles are updated and supplemented with sources. Works whose copyright term has expired are uploaded, and the database is also being replenished with articles from the Soviet period. Modern materials are more problematic, Davlet noted again, because the institute does not cooperate with the Tatar Book Publishing House:
“If we are not doing something, it means we do not have the right to do it yet.”
Four people work on the encyclopedia; the rest, including IT specialists, combine this work with other duties.
Another local development is the geographic information system “Cultural Heritage of Tatarstan and the Tatar people”.
“The system is close to the Quranic description of how God sees the universe,” began Ramis Mukhametshin, the head of the Information and Editorial Department at the Institute of Archaeology of the Academy of Sciences of the Republic of Tatarstan. “This is a multi-layered model from which any information can be retrieved.”
In effect, it is a kind of closed social network for archaeologists with 150 members, and its data comes from individual projects. Once funding is secured, the authors hope to add information about language, etymology, and place names in cooperation with the Institute of History, and to make the system more accessible to the public.

Artificial intelligence lacks Tatar literature
Why is data in Tatar about Tatarstan needed? Above all, today, for artificial intelligence and machine learning.
To that end, said Airat Gatiatullin, a leading researcher at the Institute of Applied Semiotics of the Academy of Sciences of the Republic of Tatarstan, researchers are now combining data from the Tugan Tel National Tatar Language Corpus and the Turkic Morpheme language portal.

“There is such a thing as low-resource languages. Unfortunately, our language is one of them,” said Airat Gatiatullin. “To improve the quality of machine processing for Tatar and other Turkic languages, we applied the concept of language kinship to the Khakass and Altaic languages, where the resource problem is even more pronounced.”
But large language models such as GPT-3 represent our languages and culture poorly, Gatiatullin pointed out. As a result, they hallucinate, inventing answers to queries.
What can be done? One option is to enrich these models with additional local knowledge bases built on so-called knowledge graphs, that is, structured information about a topic with links to related entries and sources. The Google search engine uses the same approach.

“We can present the resources of our institutions and combine them. To do this, we need to present them in a common format, which can be implemented using knowledge graphs,” Gatiatullin explained. “Besides, our language is structurally well suited to this approach. Today, the Institute of Semiotics cooperates with the Bauman Moscow State Technical University in this regard.”
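The “common format” Gatiatullin describes can be pictured as subject-predicate-object triples, the basic unit of a knowledge graph. The sketch below is only an illustration under stated assumptions: the entities, relation names, and the two source lists are hypothetical and are not taken from Tatarica or the archaeological system. It shows how records from two institutions might be merged and queried once both are expressed the same way.

```python
from collections import defaultdict

# Hypothetical records: entity and relation names are illustrative only,
# not actual data from Tatarica or the archaeological GIS.
encyclopedia_triples = [
    ("Kazan", "is_capital_of", "Tatarstan"),
    ("Kazan", "has_article_in", "Tatarica"),
]
gis_triples = [
    ("Kazan Kremlin", "located_in", "Kazan"),
    ("Kazan", "is_capital_of", "Tatarstan"),  # duplicate of an encyclopedia fact
]


def build_graph(*sources):
    """Merge triple lists into one graph: subject -> set of (predicate, object)."""
    graph = defaultdict(set)
    for source in sources:
        for subject, predicate, obj in source:
            # A set drops facts that appear in more than one source.
            graph[subject].add((predicate, obj))
    return graph


def facts_about(graph, entity):
    """Return every triple in which the entity is the subject or the object."""
    facts = {(entity, p, o) for p, o in graph.get(entity, ())}
    for subject, edges in graph.items():
        facts.update((subject, p, o) for p, o in edges if o == entity)
    return sorted(facts)


graph = build_graph(encyclopedia_triples, gis_triples)
for fact in facts_about(graph, "Kazan"):
    print(fact)
```

Because both sources use the same triple shape, the duplicated fact collapses into one entry, and a query about a single entity returns facts contributed by either institution.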
Farhad Fatkullin, the vice president of the National League of Translators, asked whether the institute plans to switch to open source. He also noted that all the sites presented urgently need to move to an open license.
“First of all, you should bear in mind that your reader is an artificial intelligence,” Fatkullin pointed out. “If you do not give access to your rich data, search engines will not crawl it. And soon AI will speak to grandchildren in the voice of their grandmothers.”
That is why open licenses are needed. The sites tatarstan.ru and kzn.ru, the websites of the Congress of Tatars, Selet, and others have already adopted them. Tatarica also makes its way into AI.
“This allows all artificial intelligences to eat up all the content, because they were allowed to,” Fatkullin said, calling these sites “heroes of the Tatar people.” “I suggest that everything created at the expense of the budget, especially in Tatar, be published under an open license. Then it will keep working even if the institutes are closed and you stop working.”