Kenya AI hub to fund datasets for 50 African languages
Initiative supported by Google.org, FCDO, IDRC and Gates Foundation
#Kenya #LLMs - Kenya-based AI research organisation Masakhane African Languages Hub has launched a grant scheme to fund the development of high-quality datasets for 50 African languages. Supported by Google.org, UK Foreign, Commonwealth and Development Office (FCDO), International Development Research Centre (IDRC) and Gates Foundation, the programme invites African researchers, linguists, startups and community organisations to build inclusive local language datasets. The initiative calls for datasets across automatic speech recognition for 18 languages, AI benchmarking studies and culturally relevant multimodal datasets for 40 languages. Masakhane African Languages Hub aims to help empower one billion Africans with locally relevant AI tools by 2029.
SO WHAT? - The new grant programme addresses the severe underrepresentation of African languages in AI training data: a fundamental gap preventing African language speakers from accessing AI technologies in their own linguistic contexts. With over 2,000 languages spoken across Africa and none of the top 34 globally used internet languages being African, the initiative enables the creation of foundational datasets required for speech recognition, translation and educational tools. The initiative could help to unlock economic opportunities, whilst preserving linguistic heritage for communities currently excluded from digital technologies.
Here are some key points regarding the new grant programme:
Kilifi-based Masakhane African Languages Hub, an initiative anchored by Masakhane Research Foundation, has opened applications for a new grant programme that will fund dataset development for 50 African languages, with support from Google.org, UK Foreign, Commonwealth and Development Office (FCDO), International Development Research Centre (IDRC) and Gates Foundation.
The programme focuses on three core pillars:
Automatic Speech Recognition (ASR): Focused on large-scale, culturally grounded voice data for 18 African languages, emphasizing gender balance and contextual authenticity.
Benchmarking “In the Wild”: Supporting researchers to design studies that test how AI models actually perform in authentic, practical, and real-world African contexts.
Culturally Relevant Multimodal Datasets: Catalysing the creation of high-quality image, text, and speech datasets for 40 African languages to power the next generation of translation and education tools.
The Hub hopes the initiative will represent a movement toward equitable digital futures, centring marginalized groups including women, rural communities and elderly populations whilst embodying Ubuntu principles in AI development.
Eligible applicants include organisations based in Africa, or those with an established presence in Africa, that are legally registered as non-profit organisations, social enterprises, research institutions, or consortia composed of any of the above. A social enterprise may be registered as a non-profit, for-profit, or hybrid entity, provided its primary mission is social, cultural, or environmental impact rather than profit maximization.
Eligible organisations must submit Expressions of Interest by 25 January 2026. An applicant webinar is scheduled for 14 January, and full proposals from shortlisted applicants are due 25 February 2026.
The initiative builds on a 2025 call for proposals that received 93 applications from 22 countries, with four grant awardees currently transitioning into the contracting phase following the selection process.
Masakhane African Languages Hub develops datasets, models and community-driven use cases to catalyse innovation in healthcare, education, agriculture and other sectors whilst addressing underrepresentation and misrepresentation of African languages in AI systems.
ZOOM OUT - Anchored by Masakhane Research Foundation, Masakhane African Languages Hub ("Masakhane" means "we build together" in isiZulu) was established in July 2025 in recognition that African languages and perspectives are profoundly underrepresented in AI development. The initiative emerged from the Masakhane community's earlier work as an Indaba-focused African NLP research project, which revealed fundamental gaps in the data, tools and research shaping modern AI systems. The Hub focuses on generating high-quality, open and culturally grounded language data through participatory community-led processes.
[Written and edited with the assistance of AI]
LINK
How to apply (Masakhane African Languages Hub)


