Topic Classification of Publications
Identifying publication topics based on existing journals
Abstract
Accurate topic classification is crucial for helping the scientific community find relevant journals. However, the efficiency and accuracy of topic classification of publications appear to fall short of their potential, especially given the rapid growth in the number of research papers. Our research aims to address this problem by using state-of-the-art (SOTA) methods. We chose the 'April 2022 Crossref' data set because Alexandria3k, the tool we used to query the open data set, is tested on the same data. We drew a stratified sample of 50,000 records, each with a title, an abstract, and work names, which serve as roughly assigned topics. For feature extraction and classification, we chose OpenAI Embeddings and XGBoost, respectively. Our results suggest that this combination of SOTA methods has the potential to improve on the performance of current topic classification of publications.
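To make the sampling step more concrete, here is a minimal sketch of stratified sampling by topic label, using only the Python standard library. It is an illustration under stated assumptions, not the procedure used in the research: the abstract does not say how the strata were allocated, so proportional allocation is assumed, and the function name `stratified_sample`, the record field names, and the toy data are all hypothetical. The embedding and classification steps are not covered here.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_key, n_total, seed=0):
    """Draw roughly n_total records while keeping each label's share proportional.

    Assumption: proportional allocation with at least one record per label.
    """
    rng = random.Random(seed)

    # Group records by their topic label.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    total = len(records)
    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        # Proportional share of the target size, capped at the group size.
        k = max(1, round(n_total * len(group) / total))
        k = min(k, len(group))
        sample.extend(rng.sample(group, k))
    return sample


# Toy records; the field names are hypothetical stand-ins for
# title, abstract, and the roughly assigned topic.
records = (
    [{"title": f"Paper {i}", "abstract": "placeholder", "topic": "Physics"} for i in range(60)]
    + [{"title": f"Paper {i}", "abstract": "placeholder", "topic": "Biology"} for i in range(30)]
    + [{"title": f"Paper {i}", "abstract": "placeholder", "topic": "History"} for i in range(10)]
)

sample = stratified_sample(records, "topic", n_total=20)
```

With proportional allocation, rare topics keep a presence in the sample (at least one record each) while common topics do not crowd them out. Because each stratum size is rounded, the final sample size can differ slightly from `n_total` on real data.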