Information Extraction of Scientific Papers with Learning Vector Quantization Algorithm
F Sasmita
Universitas Pendidikan Indonesia
Abstract
Detecting the individual components of a scientific paper is difficult when documents come in many different formats. This problem can be addressed with machine learning. The machine learning algorithm used in this study is Learning Vector Quantization (LVQ), an artificial neural network algorithm. The LVQ model is first trained on the components of scientific paper documents and is then tested on its ability to extract those components from new documents. In tests on 40 thesis documents written between 2011 and 2018, the average accuracy under the token-class evaluation was 78%, a result attributed to the use of feature weighting. Under the class-token evaluation, accuracy was only 6%. This low class-token accuracy is attributed to limitations of the LVQ algorithm, misspellings, and irregular symbols in the text.
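The abstract does not give the paper's feature set, number of prototypes, or learning-rate schedule, so the following is only a minimal sketch of the standard LVQ1 update rule, assuming each token has already been encoded as a numeric feature vector. The two features used here (relative position on the page and a normalized font size) and the labels `title` and `author` are hypothetical and chosen purely for illustration. During training, the prototype nearest to each sample is pulled toward the sample when their labels agree and pushed away when they do not. At test time, a token receives the label of its nearest prototype.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(prototypes, x):
    """Index of the prototype whose weight vector is closest to x."""
    return min(range(len(prototypes)), key=lambda i: euclidean(prototypes[i][0], x))


def train_lvq1(samples, labels, prototypes, alpha=0.1, decay=0.95, epochs=20, seed=0):
    """LVQ1 training loop.

    prototypes: list of (weight_vector, class_label) pairs; they are copied,
    so the caller's list is not modified.
    """
    protos = [(list(w), c) for w, c in prototypes]
    order = list(range(len(samples)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)  # visit training samples in a random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            j = nearest(protos, x)
            w, c = protos[j]
            # Attract the winning prototype if its class matches, repel it otherwise.
            sign = 1.0 if c == y else -1.0
            protos[j] = ([wk + sign * alpha * (xk - wk) for wk, xk in zip(w, x)], c)
        alpha *= decay  # shrink the learning rate after each epoch
    return protos


def predict(prototypes, x):
    """Assign x the class label of its nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]


# Hypothetical token features: [relative position on page, normalized font size].
samples = [[0.05, 0.90], [0.10, 0.85], [0.15, 0.40], [0.20, 0.35]]
labels = ["title", "title", "author", "author"]
# Initialize one prototype per class from a labeled sample.
initial = [([0.05, 0.90], "title"), ([0.15, 0.40], "author")]

protos = train_lvq1(samples, labels, initial)
print(predict(protos, [0.08, 0.88]))  # title
print(predict(protos, [0.18, 0.38]))  # author
```

Initializing each prototype from a labeled sample of its class is a common, simple choice; a real extraction system would need several prototypes per component and a much richer feature vector than this two-dimensional toy.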
Keywords: Information Extraction, Artificial Neural Networks, ANN, Learning Vector Quantization, LVQ, Scientific Paper Document, Thesis