Named Entity Recognition for an Indonesian Based Language Tweet using Multinomial Naive Bayes Classifier
DOI:
https://doi.org/10.34818/INDOJC.2019.4.2.330
Abstract
In Natural Language Processing (NLP), Named Entity Recognition (NER) is a widely researched subtopic. The main task of NER is to identify and detect entity names among the words in a sentence. Our data source is real-time Indonesian-language tweets, each limited to 280 characters. Words in these tweets can refer to the name of an entity, a location, or an organization, so determining the entity type of a word requires first looking at the word patterns around it. In Indonesia, an account posts on average at least 1-3 tweets per day, mixing formal and informal content, which makes assigning the right entity label a difficult challenge. In this research, we label the entities in Indonesian-language tweets using the Multinomial Naive Bayes classifier. The system uses precision, recall, and F-measure as evaluation metrics. The resulting entity classifier reaches an F1 score of 80%.
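The approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag set (PER, LOC, ORG, O), the context-window features in `window_features`, the Laplace smoothing value `alpha`, and the toy sentences are all assumptions made for illustration. The sketch shows how a Multinomial Naive Bayes classifier could label a word from the word patterns around it, and how precision, recall, and F1 are computed for one label.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, i):
    """Features for token i: the word, its neighbours, and capitalization (assumed feature set)."""
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"
    feats = ["w=" + tokens[i].lower(), "prev=" + prev_tok, "next=" + next_tok]
    if tokens[i][:1].isupper():
        feats.append("is_capitalized")
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes over feature counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(X, y):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict_one(self, feats):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.feat_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in feats:
                if f in self.vocab:  # features never seen in training are skipped
                    score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def precision_recall_f1(gold, pred, label):
    """Precision, recall, and F1 for a single label."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy, invented training sentences (not data from the paper).
train = [
    (["Jokowi", "tiba", "di", "Bandung"], ["PER", "O", "O", "LOC"]),
    (["Susi", "pergi", "ke", "Jakarta"], ["PER", "O", "O", "LOC"]),
    (["Telkom", "buka", "kantor", "di", "Surabaya"], ["ORG", "O", "O", "O", "LOC"]),
]
X = [window_features(toks, i) for toks, _ in train for i in range(len(toks))]
y = [tag for _, tags in train for tag in tags]
model = MultinomialNB().fit(X, y)

# Label an unseen word from its context ("di" before it, end of sentence after it).
test_tokens = ["Rina", "tinggal", "di", "Medan"]
print(model.predict_one(window_features(test_tokens, 3)))
```

In this sketch the word "Medan" never appears in training, so the prediction rests entirely on its context features, which mirrors the abstract's point that the surrounding word pattern determines the entity label. A real system would need a labeled tweet corpus and normalization for informal spelling.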
License
- A manuscript submitted to IndoJC must be an original work of the author(s), contain no element of plagiarism, and must not have been published or be under consideration for publication in other journals.
- Copyright on any article is retained by the author(s). Regarding copyright transfers please see below.
- Authors grant IndoJC a license to publish the article and identify itself as the original publisher.
- Authors grant IndoJC commercial rights to produce hardcopy volumes of the journal for sale to libraries and individuals.
- Authors grant any third party the right to use the article freely as long as its original authors and citation details are identified.
- The article and any associated published material are distributed under the Creative Commons Attribution 4.0 License.