
Saheed Azeez, a final-year Mechanical Engineering student at the University of Lagos (UNILAG), has developed YarnGPT, a text-to-speech AI model that reads text aloud in a Nigerian accent. The model addresses the scarcity of AI models tailored to Nigerian languages and accents.
Before YarnGPT, Azeez created Naijaweb, a dataset of 230 million tokens (as counted with the GPT-2 tokenizer) sourced from Nairaland, one of Nigeria’s largest online forums. The dataset serves as a foundation for training large language models (LLMs) relevant to Nigerian contexts.
Developing YarnGPT posed challenges, particularly in gathering high-quality audio data representative of Nigerian speech patterns. Azeez overcame these obstacles by combining audio from Nigerian movies with datasets from Hugging Face, an open-source machine learning platform. Despite limited resources, he successfully trained the model using cloud computing services.
Azeez’s work has drawn attention for its potential applications, including improving accessibility for non-English speakers and providing localized voice-overs for content creators. His achievements underscore the innovative potential within Nigeria’s academic community.
In a recent interview, Azeez discussed his innovation and how he managed to balance his academic responsibilities with his AI projects.