Student's AI Project Aims to Preserve Tulu Language in Digital Era

In a significant move to preserve linguistic heritage, Prathamesh Devadiga, an eighth-semester BTech (Computer Science and Engineering) student at PES University in Bengaluru, is spearheading a project that enables large language models (LLMs) to generate Tulu. This initiative comes at a crucial time when efforts are intensifying to secure recognition for Tulu as the second official language of Karnataka.

Addressing Digital Marginalization of Low-Resource Languages

Prathamesh explained to TOI that his research, detailed in the paper 'Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language' published in Lossfunk Letters, tackles the uneven distribution of AI benefits. "While LLMs are transforming global communication, digitally rich languages dominate, leaving low-resource languages like Tulu on the margins," he stated. Tulu, a Dravidian language spoken by nearly 2 million people, suffers from a very limited digital presence, often causing mainstream models to default to Kannada instead of generating authentic Tulu.

Innovative Structured Prompting Technique

The project uses a five-layer structured prompt of roughly 2,800 tokens, designed to guide models such as GPT, Gemini, and Llama to produce grammatically correct Tulu without any fine-tuning. Prathamesh said vocabulary contamination, the intrusion of Kannada words into the output, fell from 80% to just 5%, while grammatical accuracy rose to 85%. Cross-model analysis showed that negative constraints (explicit instructions about what the model must not do) improved performance by 12 to 18 percentage points, and grammar documentation added 8 to 22 percentage points, depending on the model architecture.
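To give a rough sense of what "structured prompting" means in practice, the Python sketch below assembles a five-layer prompt from labelled sections. Only two layers are named in the reporting (grammar documentation and negative constraints); the other layer names, the section format and all placeholder contents are assumptions for illustration, not Prathamesh's actual prompt. The `rough_token_count` helper is a crude word count, not the tokenizer that GPT, Gemini or Llama actually uses.

```python
from dataclasses import dataclass


@dataclass
class PromptLayer:
    name: str
    text: str


def build_structured_prompt(layers: list[PromptLayer], task: str) -> str:
    """Join the layers in order, each under its own labelled header, then append the task."""
    sections = [f"### {layer.name}\n{layer.text.strip()}" for layer in layers]
    sections.append(f"### Task\n{task.strip()}")
    return "\n\n".join(sections)


def rough_token_count(prompt: str) -> int:
    # Crude whitespace split; real model tokenizers count differently.
    return len(prompt.split())


# Hypothetical layers. Only "Grammar notes" and "Negative constraints" correspond
# to components mentioned in the reporting; contents are placeholders.
LAYERS = [
    PromptLayer("Role", "You are a fluent Tulu speaker. Always answer in Tulu."),
    PromptLayer("Grammar notes", "Placeholder: Tulu verb conjugation and case-marking notes."),
    PromptLayer("Vocabulary", "Placeholder: common Tulu words with English glosses."),
    PromptLayer(
        "Negative constraints",
        "Do NOT answer in Kannada. Do NOT substitute Kannada words for Tulu ones.",
    ),
    PromptLayer("Examples", "Placeholder: a few Tulu question-answer pairs."),
]

prompt = build_structured_prompt(LAYERS, "Translate into Tulu: 'Where are you going?'")
print(prompt)
print("approx. words:", rough_token_count(prompt))
```

Keeping each layer as a separate object makes it easy to drop one layer and rerun the same task, which is one plausible way per-layer gains like the percentage-point figures above could be measured.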

Overcoming Linguistic Challenges

He noted that the overlap in script and vocabulary between Tulu and Kannada, combined with Kannada's stronger online presence, often leads models to respond in Kannada even when instructed to use Tulu. "This structured approach is a practical step toward language preservation in the AI era," Prathamesh emphasized, adding that he is now collaborating with developers to advance the project further.
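Because Tulu and Kannada share a script, a model drifting into Kannada cannot be caught by checking characters alone. One simple way to quantify the drift, sketched below in Python as an assumption rather than the metric used in the paper, is to count how many words in a reply appear on a list of Kannada-only words. The word list, the romanised placeholder tokens and the 0.5 threshold are all hypothetical.

```python
import string


def contamination_rate(output: str, kannada_only: set[str]) -> float:
    """Share of words in `output` that appear in a Kannada-only word list."""
    # Normalise: lowercase and strip leading/trailing punctuation from each word.
    words = [w.strip(string.punctuation).lower() for w in output.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in kannada_only)
    return hits / len(words)


def falls_back_to_kannada(output: str, kannada_only: set[str], threshold: float = 0.5) -> bool:
    """Flag a reply as a Kannada fallback when at least `threshold` of its words are Kannada-only."""
    return contamination_rate(output, kannada_only) >= threshold


# Romanised placeholder tokens; a real list would hold words in Kannada script.
KANNADA_ONLY = {"kn_alpha", "kn_beta", "kn_gamma"}

reply = "tu_one kn_alpha tu_two kn_beta."
print(contamination_rate(reply, KANNADA_ONLY))     # 0.5
print(falls_back_to_kannada(reply, KANNADA_ONLY))  # True
```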

Beyond strengthening Tulu's digital footprint, the approach may offer a template for other low-resource languages, since it works through prompting alone rather than retraining the underlying models.