Classification Model of Consumer Question about Motorbike Problems by Using Naïve Bayes and Support Vector Machine
DOI:
https://doi.org/10.34818/INDOJC.2021.6.2.561
Keywords:
classification, naïve Bayes, SVM, n-gram, TF-IDF
Abstract
Motorbikes play an important role in supporting daily activities and are one of the most frequently used modes of transportation in Indonesia, where the number of motorbikes in use continues to increase. Hence, motorbike problems can affect community activities and disrupt economic conditions in society. Because such problems can occur at any time, an online consultation platform is needed as a preventive measure. Handling the wide range of questions about motorbike problems on such a platform requires a classification model: by classifying questions into specific classes of problems, solutions can be delivered to consumers faster. In this study, we developed prediction models to classify consumer questions. The data set was collected from consumer questions about commonly occurring motorbike problems. The models were developed using two machine learning algorithms, Naïve Bayes and Support Vector Machine (SVM). Text vectorization was performed using n-grams and the term frequency-inverse document frequency (TF-IDF) method. The results show that the SVM model with the uni-trigram model performed best, with an accuracy and F-measure of 0.910 each.
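The vectorization and classification steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes that "uni-trigram" means word n-grams of length 1 to 3, it uses one common smoothed IDF variant, and it covers only a multinomial Naïve Bayes classifier (an SVM would consume the same TF-IDF vectors). The training questions and the class labels `engine` and `brake` are invented for illustration and do not come from the study's data set.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_idf(docs_terms):
    """Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs_terms)
    df = Counter(term for terms in docs_terms for term in set(terms))
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(terms, idf):
    """Raw term frequency times IDF; terms unseen during fitting are dropped."""
    tf = Counter(t for t in terms if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}

class MultinomialNB:
    """Multinomial Naive Bayes over weighted features with additive smoothing."""

    def fit(self, vectors, labels, vocab, alpha=1.0):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels))
                          for c in self.classes}
        # Sum the TF-IDF weight of every term per class.
        totals = {c: defaultdict(float) for c in self.classes}
        for vec, label in zip(vectors, labels):
            for term, weight in vec.items():
                totals[label][term] += weight
        self.log_like = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + alpha * len(vocab)
            self.log_like[c] = {t: math.log((totals[c][t] + alpha) / denom)
                                for t in vocab}
        return self

    def predict(self, vec):
        scores = {c: self.log_prior[c]
                     + sum(w * self.log_like[c][t] for t, w in vec.items())
                  for c in self.classes}
        return max(scores, key=scores.get)

# Hypothetical toy questions and labels, invented for this sketch.
train = [
    ("engine will not start in the morning", "engine"),
    ("engine stalls when idle", "engine"),
    ("brake lever feels soft", "brake"),
    ("rear brake makes a squealing noise", "brake"),
]
train_terms = [ngrams(text) for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_terms)
model = MultinomialNB().fit([tfidf(t, idf) for t in train_terms],
                            train_labels, vocab=set(idf))

def classify(question):
    """Vectorize a question with the fitted IDF table and predict its class."""
    return model.predict(tfidf(ngrams(question), idf))

if __name__ == "__main__":
    print(classify("engine stalls in the morning"))
    print(classify("rear brake feels soft"))
```

Combining unigrams with bigrams and trigrams lets the classifier use short phrases such as "rear brake" as features in addition to single words, which is the motivation for comparing n-gram ranges in a study like this one.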
License
- Manuscript submitted to IndoJC has to be an original work of the author(s), contains no element of plagiarism, and has never been published or is not being considered for publication in other journals.
- Copyright on any article is retained by the author(s). Regarding copyright transfers please see below.
- Authors grant IndoJC a license to publish the article and identify itself as the original publisher.
- Authors grant IndoJC commercial rights to produce hardcopy volumes of the journal for sale to libraries and individuals.
- Authors grant any third party the right to use the article freely as long as its original authors and citation details are identified.
- The article and any associated published material are distributed under the Creative Commons Attribution 4.0 License