Forget emoji, the real Unicode drama is over an endangered Indian script

Growing up, Vaishnavi Murthy would often visit her grandfather’s 500-year-old home in Mangaluru, a coastal city in southern India over 200 miles west of Bengaluru. Shelves carved into the house’s antique walls held dozens of palm-leaf manuscripts written in the Tulu script, also called Tigalari. Murthy’s grandfather was fluent in the ancient characters, and she vividly remembers struggling as a 5-year-old to read what he wrote in his notebooks. To the untrained eye, some of the letters resembled the spiral-shaped Indian dessert jalebi, or a pretzel knot. She was dazzled by the mysterious symbols and begged her grandfather to teach her about them.

Murthy’s grandfather passed away before he could share his knowledge, but her curiosity about the rare script endured. For almost two decades, the now 37-year-old typeface designer has spearheaded a winding and sometimes controversial effort to bring what she calls the Tulu-Tigalari script into the Unicode standard — the technical underpinning that allows any computer or smartphone to consistently display languages, from Chinese to emoji. “I didn’t know about all the complications of Unicode,” Murthy told Rest of World. “But, because it’s a script that is very close to my home, I am kind of driven to get it encoded.”

The Tulu language is spoken by relatively few people, approximately 1.8 million, and is not officially recognized by India’s government, although it has been around for roughly 2,000 years. Its speakers live in several Asian countries, but Tulu is most widely used in parts of the Indian state of Karnataka, nestled between the Western Ghats mountain range and the Arabian Sea.

In recent centuries, people have written Tulu using the script for Kannada — the region’s dominant language — rather than the complex set of characters that captured Murthy’s attention as a young girl. But generations ago, some scholars argue, the symbols her grandfather mastered were used to represent Tulu, as well as other languages like Sanskrit. The Tulu community has only recently started reviving this traditional script, which resembles a string of pearls, with many of its letters joining together like a chain.

The effort to digitize the Tulu script is a small slice of a much larger worldwide problem. Like many languages around the world, Tulu might soon disappear: UNESCO identifies it as one of 192 languages from India that are “in danger.” Globally, 40% of the more than 7,000 languages spoken by humanity are at risk. In the last century, hundreds have gone extinct, taking with them stories, cultural traditions, ethnic identities, and a bounty of other information from the past. One way to preserve a language is to ensure it’s digitized, so that its speakers can continue expressing themselves as technology evolves.

But computers and smartphones were built for the Roman alphabet, making it hard to adapt them for languages with complex writing systems. The Unicode Consortium, the nonprofit that maintains the Unicode standard, also requires that scripts be represented in a single, consistent form. Every character is assigned a specific number, or “code point,” that allows it to be recognized by computers. The requirement, while practical, forces speakers of non-Western tongues to find consensus on different grammatical forms, characters, and regional dialects, often losing some of the diversity of their language in the process. And coming to an agreement is not necessarily an easy task.
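The idea that every character is assigned a specific number can be seen directly in code. The short Python sketch below is only an illustration: the sample characters and the helper name `describe` are assumptions chosen for this example and do not come from the Unicode proposals discussed here. It looks up the number and official name that the standard assigns to a few characters that are already encoded.

```python
import unicodedata

# Sample characters picked for illustration only (an assumption of this sketch):
# a Latin letter, a Kannada letter, a Chinese character, and an emoji.
samples = ["A", "ಕ", "中", "😀"]


def describe(ch: str) -> str:
    """Return the number Unicode assigns to one character, plus its official name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in samples:
    print(ch, describe(ch))
```

A script that has not been accepted into the standard has no such numbers to look up, which is why a device cannot reliably store or display it until a proposal is approved.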

For the last decade, groups of linguists, typographers, and academics have worked on competing Unicode proposals that would finally bring Tulu’s ancient script to the digital age. The latest effort began last year, when the Karnataka Tulu Sahitya Academy, a governmental organization in Mangaluru that fights for the preservation of the language in India, submitted a new proposal that tries to capture how Tulu is spoken today. Earlier this year, Murthy and her technical partner, Vinodh Rajan, submitted a counterproposal that suggested a different path forward: they want to preserve the characters as closely as possible to the way they appeared in old manuscripts.

There is a sharp divide between the two camps about what the future of Tulu should look like — a linguistic disagreement that, in some ways, captures the tediousness of the Unicode process. “India, with its multitude of languages and scripts, reflects a microcosm of issues involved generally in encoding today,” said Deborah Anderson, the chair of Unicode’s Script Ad Hoc advisory group and a linguistics researcher at the University of California, Berkeley.

But the debate over Tulu also concerns a universal question about all languages: How much should they evolve, and how much should they reflect history?

By the late 1980s, a number of tech companies, including Apple and IBM, had already developed their own character encoding systems that weren’t compatible with one another. A few years later, a group of American software engineers started the Unicode Consortium, which sought to create a single standard that would work across every device. Anyone can submit a proposal to change Unicode; proposals are voted on by the consortium’s members, a hodgepodge of tech companies like Facebook, Apple, and Google, as well as local and national governments, including those of Oman, Bangladesh, and the Indian state of Tamil Nadu. Joining the organization costs up to $21,000 a year.

When it first started, Unicode borrowed the character encodings for some Indian languages from the Indian government’s own encoding standard, the Indian Script Code for Information Interchange (ISCII). Gujarati, Bengali, Telugu, and Tamil have all been included in Unicode through ISCII over the years, and more recently, a number of rare scripts, such as Gunjala Gondi and Dhives Akuru, have also been added.

The problem is that ISCII introduced a number of approximations and left out many older characters that aren’t used in the present day. “The Indian government only considered the modern script and not the rare manuscripts, characters, and forms,” said Rajan, who works as a software consultant in Germany.

Murthy started her work with the Tulu script around 18 years ago, when she was a graphic design student. She spent hours trying to understand the historic writing style and learning how to digitize its characters. Eventually, she sought the help of a local type foundry, which offered to get her work added to Unicode. The very first preliminary proposal for Tulu-Tigalari, which Murthy called “not comprehensive,” was submitted in 2011. Unicode solicited her feedback on it, and she found some of the characters were inaccurate, prompting her to start work on digitizing the script correctly.

Murthy spent the following five years practicing reading the Tulu-Tigalari script, finding new character representations, and figuring out how the symbols worked with one another. “The traditional Tulu-Tigalari script is very complicated — it’s about the subtle behaviour that we need to capture in the Unicode proposal and how to represent it,” said Rajan. “When two characters combine in Indian scripts, they can take multiple forms, and in case of Tulu-Tigalari, it can go in three or four different ways, making the process to encode it a challenging one.”

Murthy finally sent a new preliminary proposal to Unicode in 2016. Rajan joined the effort a year later to provide technical advice, and since then, he and Murthy have been going back and forth with representatives from the consortium, answering questions and providing clarifications. Murthy said they meet regularly. “There are people in the Unicode committee who are familiar with Indian scripts, and it has been very nice to have their inputs,” said Murthy. “But maybe there’s an easier way to go about this for researchers in the future, where there is an organized way and a dedicated committee which will help the people who want to get their scripts encoded.”

As Murthy was falling in love with the Tulu script, it also began piquing the interest of U.B. Pavanaja, a 62-year-old computing researcher in Bengaluru who has spent most of his life working to bring Indian languages online. In 2006, he paid a visit to the president of the Karnataka Tulu Sahitya Academy, hoping to introduce researchers there to the idea of including Tulu in Unicode. “But those days, I was too ahead of their time, and nobody understood what I was talking about,” recalled Pavanaja, who worked on the academy’s 2020 Unicode proposal.

Pavanaja has spent decades dabbling in different kinds of software for Indian languages and has worked with companies such as Tally and Microsoft; his professional resume is pages long. While he didn’t speak the language at home, he eventually got involved with the Tulu Wikipedia initiative, which has published Tulu articles written in the Kannada script for almost five years.

In 2014, he was in Mangaluru again, where he gave a presentation about Tulu Wikipedia and Unicode at the World Tulu conference, a regular event celebrating Tulu language and culture. “That’s when [the academy] realized that if we want to do anything futuristic, our languages have to be implemented in technology,” said Pavanaja.

Three years later, the academy formed a committee of scholars to create a Tulu Unicode proposal, which ended up taking an unconventional approach. While most proposals focus on creating character lists based on the original script for a language, Pavanaja said the team decided to also add several new characters that had never been used before. The old Tulu script found in ancient palm-leaf inscriptions, he explained, doesn’t have letters for all the sounds that are used in the spoken language today. “That is how some of the extra characters were designed and added,” said Pavanaja. “The Tulu language needed them, because while speaking, we wanted them.” Last month, the Karnataka state government approved the academy’s character list as the official one.

The academy made several other changes it believed would help transform Tulu into a written language for the modern digital age. First, it based Tulu’s grammar and structure on Kannada, which almost all Tulu speakers can already read. The committee also changed Tulu’s orthography — the norms around things like spelling and punctuation — from the Malayalam style to the Kannada style. “Teaching the Tulu people the Kannada orthography will be easier because they already know it,” Pavanaja said. Last year, he completed the proposal and sent it off to Unicode.

The academy’s proposal enraged Murthy, who believes that its approach will leave behind all the old palm-leaf manuscripts written in the Tulu-Tigalari script that her grandfather cherished. Her real desire is to get those ancient characters onto computers — not necessarily to create a practical written language for today’s Tulu speakers. In fact, she worries that the push to translate Tulu’s auditory abundance — filled with different dialects, practices, connections, and structures — into a written form will kill its essence. “Making an oral language into a written language is a big deal,” she said. “It’s very delicate the way they exist.”

Earlier this year, in April, Murthy and a group of independent academics sent a request to Unicode asking the organization to postpone encoding the academy’s proposal. They argued that it had taken too many liberties and deviated too far from the original Tulu-Tigalari manuscripts and inscriptions. Most importantly, they contended that the script was never really used to write the Tulu language in the first place. In fact, Murthy’s grandfather used the Tulu-Tigalari script to write the Sanskrit language.

“The Tulu-Tigalari script was used exclusively to record the Sanskrit language. The Academy is trying to introduce this script to write the Tulu language,” Murthy and the academics wrote in their request. Sanskrit is one of the oldest languages in the world, and several scripts have been used to write it over the years. In the end, Murthy and the scholars suggested a compromise, pointing out that the academy could have added the missing characters to the original orthography and didn’t need to modify the script as much as it did.

Karthik Malli, an independent linguistics researcher who has studied a number of Indian languages, including Tulu, said that script reform is normal, but it’s the intention and process behind it that’s important. “Are they completely reinventing it? Are they making it easier to read? Script reform in itself is not a bad thing or a wrong thing; it happens all the time,” he said. “The question is whether it is being done faithfully.”

In this case, the answer depends on your interpretation of history.

The disagreement between Murthy and Pavanaja has its roots in the 1800s, when Christian missionaries from Europe came to Mangaluru and began trying to translate the Bible and other church documents into local languages.

“The German missionaries were trying to figure out the names of several scripts,” Murthy explained, but they ended up conflating different languages with one another. In one case, they identified a script called Tigalari and erroneously thought it was only for writing Tulu. After sifting through hundreds of Tulu-Tigalari manuscripts, Murthy said she’s confident that it was mostly used to write Sanskrit.

The Karnataka Tulu Sahitya Academy, however, disagrees with this historical narrative. Its proposal argues that the Tulu script was used to write thousands of documents in Tulu until the beginning of the 19th century, including all of the literature by the famous philosopher Madhvacharya. But as printing technology took over, the Tulu script faded into oblivion. Missionaries began using the Kannada script to print the Tulu language as well.

In the end, Pavanaja’s and Murthy’s different versions of Tulu may both find their way into the Unicode standard. Anderson, from Unicode’s script advisory group, said that when there are two competing proposals for the same script, it’s almost always best for the authors to collaborate with one another. But in this case, the Tulu proposals are targeting different sets of users with different needs: the academy wants to support everyday speakers, while Murthy’s classical script will probably be used most by linguists and other academics.

As a result, the advisory group allowed Murthy’s competing proposal to proceed on its own. Both her submission and the academy’s are still being reviewed by Unicode and will eventually be voted on by the organization’s members. If they’re accepted, the process of creating fonts and keyboards for Tulu can begin. “It can then take several more years for a script to appear on mobile devices,” said Anderson.

As the brutally slow process continues unfolding, Tulu speakers remain caught in the middle. “The sad part is that we have a lot of scripts that very few people read, and those are on Unicode … But for some reason, Tulu has been frozen in this limbo for a decade,” said Malli, the linguistics researcher. “Even if not many people can read it right now, it deserves to be something you can have on your phone.”

