Automatic Vocal Completion for Indonesian Language Based on Recurrent Neural Network

Authors

  • Agi Prasetiadi, Faculty of Informatics, Institut Teknologi Telkom Purwokerto
  • Asti Dwi Sripamuji, Faculty of Informatics, Institut Teknologi Telkom Purwokerto
  • Risa Riski Amalia, Faculty of Informatics, Institut Teknologi Telkom Purwokerto
  • Julian Saputra, Faculty of Informatics, Institut Teknologi Telkom Purwokerto
  • Imada Ramadhanti, Faculty of Informatics, Institut Teknologi Telkom Purwokerto

DOI:

https://doi.org/10.25299/itjrd.2024.14171

Keywords:

Slang, Recurrent Neural Network, Gated Recurrent Unit, Long Short-Term Memory, Bidirectional

Abstract

Most Indonesian social media users under the age of 25 communicate with a wide variety of nonstandard words, now commonly referred to as slang, including abbreviations. This variation also poses challenges for natural language processing of Indonesian. Previous researchers improved a Recurrent Neural Network to correct errors at the character level, reaching an accuracy of 83.76%. This study aims to normalize abbreviated Indonesian words into complete words using Recurrent Neural Networks in the form of Bidirectional Long Short-Term Memory and Gated Recurrent Unit. The dataset is built with several weight configurations from 3-Gram to 6-Gram and pairs words without vowels with their complete forms with vowels. To our knowledge, our model is the first to restore incomplete Indonesian words into fully spelled-out sentences, and it does so with an accuracy of 97.44%.
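
The dataset construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "3-Gram to 6-Gram" refers to sliding windows of 3 to 6 words, that all vowels are removed from each word, and that a word made only of vowels is left unchanged. The function names `strip_vowels` and `ngram_pairs` and the sample sentence are hypothetical.

```python
from typing import Iterator, Tuple

VOWELS = set("aeiouAEIOU")


def strip_vowels(word: str) -> str:
    """Drop every vowel from a word.

    Assumption: a word consisting only of vowels is returned unchanged,
    so the abbreviated form is never empty.
    """
    stripped = "".join(ch for ch in word if ch not in VOWELS)
    return stripped or word


def ngram_pairs(
    sentence: str, n_min: int = 3, n_max: int = 6
) -> Iterator[Tuple[str, str]]:
    """Yield (vowel-less source, complete target) pairs.

    Assumption: an n-gram is a window of n consecutive words.
    """
    words = sentence.split()
    for n in range(n_min, n_max + 1):
        for start in range(len(words) - n + 1):
            window = words[start:start + n]
            source = " ".join(strip_vowels(w) for w in window)
            target = " ".join(window)
            yield source, target


# Hypothetical sample sentence, used only for illustration.
for source, target in ngram_pairs("saya sedang makan nasi goreng"):
    print(f"{source!r} -> {target!r}")
```

Each pair gives a sequence model one abbreviated input and its fully spelled-out target. How the authors weight the different n-gram sizes is not specified in the abstract, so the sketch leaves it out.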

Published

2024-07-18

How to Cite

Prasetiadi, A., Dwi Sripamuji, A., Riski Amalia, R., Saputra, J., & Ramadhanti, I. (2024). Automatic Vocal Completion for Indonesian Language Based on Recurrent Neural Network. IT Journal Research and Development, 9(1), 14–26. https://doi.org/10.25299/itjrd.2024.14171

Section

Articles