Gender and Age Classification Based on Speaker Voice Using a Hidden Markov Model
DOI:
https://doi.org/10.34818/INDOJC.2019.4.3.375
Abstract
Age-gender classification based on voice is useful in speech recognition and in emotion recognition. Gender classification has also been applied in face recognition, video summarization, setting different permission levels for different age groups, and other tasks. Speakers are grouped by age range into child, young, middle-aged, and senior categories. This study focuses on age-gender classification from a speaker's voice using a combined Gaussian Mixture Model and Hidden Markov Model (GMM-HMM). First, feature vectors are built using Mel-Frequency Cepstrum Coefficients (MFCC). Next, training is carried out to produce acoustic models for all speakers (men and women of various ages) in the training database. Finally, the HMM is applied to detect gender and age group. The speech database in this study is taken from the Common Voice site, which contains many blog posts, old books, films, and other public speech. Experimental results show that the resulting GMM-HMM model classifies age-gender with an accuracy of up to 96.4%. The model could be improved with more precise parameter tuning and a larger dataset.
Keywords: Classification, Mel-Frequency Cepstrum Coefficient, Acoustic Models, Gaussian Mixture Model, Hidden Markov Model
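The final step described in the abstract, scoring an utterance against one acoustic model per age-gender class and picking the best match, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes MFCC frames have already been extracted, uses a single diagonal Gaussian per state (a one-component stand-in for a full GMM), and the class labels, feature values, and model parameters are hypothetical toy numbers rather than trained values.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p) that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_likelihood(frames, model):
    """Forward algorithm in the log domain: log P(frames | model)."""
    start, trans, means, variances = model
    n = len(start)
    # Initialisation with the first frame.
    alpha = [
        safe_log(start[s]) + log_gauss(frames[0], means[s], variances[s])
        for s in range(n)
    ]
    # Recursion over the remaining frames.
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
            + log_gauss(x, means[s], variances[s])
            for s in range(n)
        ]
    return logsumexp(alpha)

def classify(frames, models):
    """Return the label whose model gives the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(frames, models[label]))

# Hypothetical two-state models over 2-dimensional features:
# (start probabilities, transition matrix, state means, state variances).
models = {
    "male_young": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]],
                   [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "female_senior": ([0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]],
                      [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]),
}

# Made-up frames standing in for the MFCC vectors of one utterance.
sample = [[0.1, -0.2], [0.9, 1.1], [0.2, 0.1]]
print(classify(sample, models))
```

In a real system each class model would be trained on MFCC features from the training database, and each state would typically carry a mixture of several Gaussians rather than one.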
License
- A manuscript submitted to IndoJC must be the original work of the author(s), contain no plagiarism, and must not have been published or be under consideration for publication in other journals.
- Copyright on any article is retained by the author(s). Regarding copyright transfers please see below.
- Authors grant IndoJC a license to publish the article and identify itself as the original publisher.
- Authors grant IndoJC commercial rights to produce hardcopy volumes of the journal for sale to libraries and individuals.
- Authors grant any third party the right to use the article freely as long as its original authors and citation details are identified.
- The article and any associated published material are distributed under the Creative Commons Attribution 4.0 License