The Generating Indonesian Paraphrased Sentences with Verbal Predicate Replacement
DOI:
https://doi.org/10.34818/INDOJC.2023.8.3.709
Keywords:
Paraphrase, Predicate, Lexical Substitution, Semantic Similarity, word2vec
Abstract
Sentence paraphrasing restates a sentence using different diction without changing its meaning. Paraphrasing can be done in several ways, including synonym substitution, changing the sentence form, or replacing the predicate of the sentence. This research aims to produce a paraphrase generator whose output is semantically similar to the original sentence. The approach first identifies verb-type predicates in simple sentences using PoS tagging, then looks for words similar to the predicate using the similarity scores of a word2vec model. A list of antonyms is used to improve the lexical substitution results. Evaluation is done by human judgment, comparing each result with the original sentence. The experimental results show that, on the dataset of 600 sentences, 48.37% of the sentences are semantically similar to the original, 20.93% show semantic reduction, and 30.70% have no semantic similarity.
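The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy vectors stand in for a trained word2vec model, the input is assumed to be already PoS-tagged with a hypothetical "VB" tag for verbs, and the antonym list and all function names are invented for illustration.

```python
import math

# Hypothetical toy vectors standing in for a trained word2vec model.
TOY_VECTORS = {
    "membeli": [0.9, 0.1, 0.0],     # to buy
    "membayar": [0.8, 0.3, 0.1],    # to pay for
    "menjual": [0.85, 0.15, 0.05],  # to sell: close in vector space, opposite in meaning
    "makan": [0.0, 0.2, 0.9],       # to eat
}

# Hypothetical antonym list used to reject opposite-meaning candidates.
ANTONYMS = {"membeli": {"menjual"}, "menjual": {"membeli"}}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_words(word, top_n=3):
    """Return the top_n vocabulary words most similar to `word`."""
    if word not in TOY_VECTORS:
        return []
    target = TOY_VECTORS[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in TOY_VECTORS.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


def paraphrase(tagged_sentence, top_n=3):
    """Replace the first verb of a tagged sentence with similar, non-antonym words.

    `tagged_sentence` is a list of (token, tag) pairs; "VB" is the assumed verb tag.
    """
    verb_index = next(
        (i for i, (_, tag) in enumerate(tagged_sentence) if tag == "VB"), None
    )
    if verb_index is None:
        return []  # no verbal predicate to replace
    predicate = tagged_sentence[verb_index][0]
    blocked = ANTONYMS.get(predicate, set())
    results = []
    for candidate in similar_words(predicate, top_n):
        if candidate in blocked:
            continue  # skip candidates listed as antonyms of the predicate
        tokens = [token for token, _ in tagged_sentence]
        tokens[verb_index] = candidate
        results.append(" ".join(tokens))
    return results


tagged = [("Ibu", "NNP"), ("membeli", "VB"), ("sayur", "NN")]
print(paraphrase(tagged, top_n=2))
```

In this toy setup the nearest neighbour of "membeli" is "menjual", which the antonym list filters out, so only the "membayar" variant is produced. This suggests why similarity alone is not enough for lexical substitution: words with opposite meanings often occur in similar contexts and therefore receive similar vectors.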
License
- Manuscripts submitted to IndoJC must be the original work of the author(s), contain no element of plagiarism, and must not have been published or be under consideration for publication in other journals.
- Copyright on any article is retained by the author(s). Regarding copyright transfers please see below.
- Authors grant IndoJC a license to publish the article and identify itself as the original publisher.
- Authors grant IndoJC commercial rights to produce hardcopy volumes of the journal for sale to libraries and individuals.
- Authors grant any third party the right to use the article freely as long as its original authors and citation details are identified.
- The article and any associated published material are distributed under the Creative Commons Attribution 4.0 License.