‘Alisa can start speaking Tatar, but this is not enough’

Tatsoft machine translation service is preparing an application for iOS

Photo: courtesy of the Kamal Theatre press service

The developers of the Tatsoft machine translation service are ready to share their data with other specialists, all for the sake of developing the Tatar language. Other languages of the republic are also planned to appear in Tatsoft. The Institute of Applied Semiotics promises that at the end of May the service will be available as an application for mobile devices running iOS. Incidentally, Tatsoft uses the voices of Kamal Theatre actors Alsu Kayumova (pictured) and Almaz Garayev.

Thousands of voices and more than three hundred hours of recording

Preparations for the introduction of a new function for recognising Tatar speech have been underway for about a year. This required many different voices, so that the system could eventually understand any spoken words. All this time, the staff of the Institute of Applied Semiotics of the Academy of Sciences of the Republic of Tatarstan collected voice material. A Telegram bot was launched specially for this purpose, and people sent it voice messages recorded in a variety of conditions and on a variety of devices, including recordings with interference and background noise. As a result, the database includes more than a thousand different voices, amounting to more than 300 hours of recordings.

The same function is now available through the Tatsoft machine translation service application. It has been available for download in the online store for a month now. In a month, the application is due to become available for download on mobile devices running iOS. This was announced by Rinat Gilmullin, the director of the Institute of Applied Semiotics of the Academy of Sciences of the Republic of Tatarstan.

“Now anyone can install the app on Android. Now it's up to Apple; for our part, we've completed all the necessary steps. For the application to be downloadable on the iPhone, certain technical work has to be carried out: our account has to be confirmed and the application published,” said the head of the institute.

The development of the online service cost 7 million rubles. Funding came from the Academy of Sciences of the Republic of Tatarstan and from the Commission under the Rais of the Republic of Tatarstan on the preservation and development of the Tatar language and the native languages of the peoples living in the Republic of Tatarstan.

According to leading researcher Bulat Khakimov, this is the first service that recognises Tatar speech, and there is nothing else like it on the Internet. On the site, you can now not only type or paste text and receive a translation, but also submit voice messages and receive the translation as text.

“To make this service work, we collected a large amount of voice material. A dataset is a database of recorded speech samples. We worked in two directions: speech synthesis, which turns text into spoken speech, and speech recognition, which turns spoken speech into written text. In the first case, we used the voices of professional announcers, in particular the actors of the Galiasgar Kamal Theatre Alsu Vazieva and Almaz Garayev. For the service to work in the opposite direction, we needed the voices of ordinary people, and in large numbers,” the scientist said.

One of the official voices: Kamal Theatre actor Almaz Garayev. Photo: courtesy of the Kamal Theatre press service

“We are ready to share”

The Institute of Applied Semiotics does not plan to limit itself to its own website; it is ready to cooperate with others, in particular with Yandex. Joint activities began in 2016, Rinat Gilmullin said:

“When we started working with Yandex, the network was only just starting to use neural networks. But already at that time we were faced with the task of developing a machine translator, and since 2016 we have been transferring datasets to Yandex, thanks to which it became possible to translate into Tatar. When the technology began to develop, the situation changed: such large networks are not very interested in regional languages. But we wanted to develop further, so we couldn't wait and decided to act on our own.”

Nevertheless, according to the director of the institute, the Tatarstan side is ready to resume cooperation with Yandex.

“We are happy to exchange data in order to spread the Tatar language further. Negotiations are underway on this issue. They have their own need to improve their translator with the help of our datasets. It is important for us to put our datasets to use so that, for example, Alisa speaks Tatar. But this requires systematic work. Certainly, Alisa can start speaking Tatar, but this is not enough. It has to be connected with the entire network, with music and other content. If the system is not set up properly, there will be no point in such a smart speaker,” Gilmullin believes.

“The Tatar language should be more integrated into global networks”

According to Bulat Khakimov, it is now important to change the attitude towards the Tatar language:

“A kind of stereotypical expectation has formed regarding technologies, services and their accessibility. Together with young researchers, we studied the statistics of search queries in the Tatar language and carried out thematic modelling on them. It turns out that topics related to art and songs prevail, while all of people's utilitarian, pragmatic, everyday queries are entirely in Russian. That is, an image of the language has already formed: what can and what cannot be done in this language in the digital environment.”

To change this attitude, more working services need to be added, the scientist believes.

“People should see that they can do a lot of things online in the Tatar language. If applications can be made convenient, people will gradually learn to use them. The more the Tatar language is integrated into global networks, the more popular it will be.”

Tatsoft's online machine translation service is currently available in two languages. Others are to be added in the future:

“Proposals for adding other languages of the peoples of the Republic of Tatarstan to the service are being discussed. This work is planned to be completed by the end of this year,” the director of the institute assured.

Milyausha Kashafutdinova