Neural machine translation (NMT) systems have achieved tremendous performance in data-intensive applications, but their performance declines when training data is limited. In this work, we benchmark NMT between English and four African Bantu low-resource languages (LRLs): Luganda, Swahili, Shona, and Tsonga (LSST).
We aimed to evaluate how well current NMT models perform on LRLs, particularly Bantu languages, which are among the most morphologically rich languages and suffer from the out-of-vocabulary (OOV) problem. We implemented an NMT model based on multi-head self-attention, combined with pre-trained BPE and multi-BPE embeddings, to develop a state-of-the-art translation system for low-resourced, morphologically rich Bantu languages for which online parallel text is scarce. We evaluated system performance using BLEU and METEOR scores.
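To make the architectural choice concrete, the following is a minimal PyTorch sketch of a single multi-head self-attention step over BPE-embedded source tokens; the dimensions, tensor shapes, and use of torch.nn.MultiheadAttention are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Hypothetical model dimensions, chosen only for illustration.
d_model, n_heads = 512, 8

# One multi-head attention layer; in an encoder this would sit inside
# a Transformer block with residual connections and layer norm.
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads,
                             batch_first=True)

# x stands in for a batch of BPE-embedded source sentences:
# shape (batch_size, seq_len, d_model).
x = torch.randn(2, 10, d_model)

# Self-attention: query, key, and value are all the same sequence.
out, attn_weights = attn(x, x, x)
print(out.shape)  # torch.Size([2, 10, 512])
```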
Our experiments produced, to our knowledge, the first reported LRL translation BLEU scores for these pairs: 62 for Eng.-Tsonga, 37 for Eng.-Swahili, 22 for Eng.-Shona, and 20 for Eng.-Luganda, with corresponding METEOR scores of 0.5, 0.3, 0.3, and 0.4, respectively.
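As a minimal sketch of how such scores can be computed (the library calls and example sentences below are assumptions, not the authors' exact evaluation pipeline), corpus-level BLEU can be obtained with sacrebleu and sentence-level METEOR with NLTK:

```python
import sacrebleu
import nltk
from nltk.translate.meteor_score import meteor_score

# METEOR in NLTK relies on WordNet data.
nltk.download("wordnet", quiet=True)

# Hypothetical system outputs and references, for illustration only.
hypotheses = ["the farmer planted maize in the field"]
references = ["the farmer planted maize in his field"]

# Corpus-level BLEU on a 0-100 scale, matching the scores reported above.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")

# Sentence-level METEOR on a 0-1 scale; NLTK expects pre-tokenized input.
meteor = meteor_score([references[0].split()], hypotheses[0].split())
print(f"METEOR: {meteor:.2f}")
```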