NATIVE ACCENT SENSITIVE VOICE CLONING USING PAIRWISE RANKING BASED DECODER MODELS, 122-129.

Chetan Madan; Harshita Diddee; Deepika Kumar; Shilpa Gupta; Shivani Jindal; Mansi Lal; Chiranjeev

doi:10.2316/J.2022.201-0224

Keywords

Audio, encoding, speech enhancement, voice analysis, accent classification

Abstract

Voice cloning has become one of the most significant applications of artificial intelligence (AI) infrastructures, owing to its common use in the education, multimedia, and security domains. Although the field has been researched extensively, most existing systems do not achieve optimal voice naturalization, and one probable reason is their inability to accurately map the user’s native dialect. To address this shortcoming, this research proposes a generative decoder model that clones the user’s voice while capturing the speaker’s native accent and linguistic features, producing a more natural synthesized voice. The proposed methodology achieves an equal error rate of 0.051 for the speaker encoder module and an accuracy of 98.12% for the accent adaptation module.

DOI: 10.2316/J.2022.201-0224
From Journal (201) Mechatronic Systems and Control - 2022
