Guwahati: In a significant advancement for regional language preservation, Assamese has been incorporated into IIT Bombay’s artificial intelligence platform BharatGen through a partnership between two Guwahati-based NGOs and the premier institute, officials announced Wednesday.
The collaboration will add two million pages of Assamese content to the national AI initiative.
The agreement signed Tuesday marks a historic milestone for Assamese, long categorized as a “low-resource” language in digital ecosystems.
“For the first time, Assamese achieves this scale of AI readiness through the inclusion of two million pages into BharatGen,” stated Narayan Sharma, secretary of Assam Jatiya Bidyalay Educational and Socio-Economic Trust, one of the partnering organizations.
This integration stems from ‘Digitising Assam’, a community-driven preservation project spearheaded by the Nanda Talukdar Foundation (NTF).
Over 40 months, volunteers digitised and preserved more than two million pages of Assamese books, journals, manuscripts and ancient Sachipats (traditional manuscripts).
The initiative stands as one of India’s largest citizen-led digital preservation efforts, according to the partners’ statement.
BharatGen, the government’s flagship AI program led by IIT Bombay, aims to develop a sovereign, indigenous large language model for India’s diverse linguistic landscape.
Its mission encompasses creating AI agents fluent in all 22 scheduled Indian languages, grounded in local cultural and linguistic contexts, with open-source accessibility.
BharatGen currently supports nine Indian languages: Hindi, Marathi, Tamil, Malayalam, Bengali, Punjabi, Gujarati, Telugu and Kannada. Assamese becomes the tenth language integrated into the platform.
Launched in June 2025, BharatGen is being developed as a domestic alternative to global AI platforms like ChatGPT.
Led by IIT Bombay in collaboration with other IITs, IIITs and leading institutions, it represents the world’s first government-funded multi-modal large language model initiative.
The integration of Assamese content not only advances technological capabilities but also ensures the language’s digital preservation and accessibility for future generations in India’s evolving AI ecosystem.