Scientific Articles Categorization Project
This is a beginner's project for a Data Science and ML course.
The project deals with natural language processing: assigning a category to scientific papers based on their abstracts. Of the various models tried, a Linear Support Vector Machine with hyperparameter optimization reached 89% accuracy; it was trained and exported for further use.
The dataset consists of 46,985 abstracts of published scientific papers, each labeled with one of 7 categories:
- Computer Science
- Electrical Engineering
- Mechanical Engineering
- Civil Engineering
- Medical Science
- Psychology
- Biochemistry
Dataset source: Kowsari, Kamran; Brown, Donald; Heidarysafa, Mojtaba; Jafari Meimandi, Kiana; Gerber, Matthew; Barnes, Laura (2018), “Web of Science Dataset”, Mendeley Data, v6, http://dx.doi.org/10.17632/9rw3vkcfy4.6
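The modelling approach described above, a linear SVM with hyperparameter optimization, can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual code: the TF-IDF features, the grid of `C` values, and the toy abstracts are all assumptions made for the example. The real pipeline is in the Showcase_modelling.ipynb notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data (hypothetical); the project itself uses the Web of Science abstracts.
abstracts = [
    "We propose a neural network algorithm for image classification.",
    "A compiler optimization reduces software runtime and memory usage.",
    "The database index speeds up query processing in distributed systems.",
    "A randomized clinical trial evaluates a new treatment for patients.",
    "Symptoms and diagnosis of chronic disease in elderly patients.",
    "The vaccine reduced infection rates among hospital patients.",
]
labels = ["Computer Science"] * 3 + ["Medical Science"] * 3

# Turn raw text into TF-IDF vectors, then classify with a linear SVM.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("svm", LinearSVC()),
])

# Hyperparameter optimization: cross-validated grid search over the
# regularization strength C (the grid values are an assumption).
param_grid = {"svm__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(abstracts, labels)

# The best pipeline is refit on all data and can be used for prediction.
model = search.best_estimator_
prediction = model.predict(["Patients received a new treatment in the hospital."])
print(search.best_params_, prediction)
```

Wrapping the vectorizer and the classifier in one pipeline means the grid search refits the TF-IDF vocabulary inside each cross-validation fold, so no information from the held-out fold leaks into training.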
How to see this project
The overall pipeline can be found in the Showcase_modelling.ipynb notebook in the SHOWCASE folder.
How to categorize your own dataset with abstracts of scientific papers
To predict the category of scientific articles, download the PACKAGE folder and follow the instructions in the README_package.md file. The input data must be an .xlsx file with at least one column containing abstract texts.
To view sample category predictions, see the notebook Showcase_predicting.ipynb in the SHOWCASE folder.
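As a rough idea of what the prediction step involves, the sketch below applies a classifier to a column of abstracts. It is only an illustration under stated assumptions: the function `predict_categories`, the column names `abstract` and `predicted_category`, and the `KeywordModel` stand-in are hypothetical, and the actual procedure is the one in README_package.md. With a real file, the table would typically be loaded with `pandas.read_excel` and the exported model loaded from the PACKAGE folder.

```python
import pandas as pd


class KeywordModel:
    """Hypothetical stand-in for the exported classifier; it only mimics .predict()."""

    def predict(self, texts):
        return [
            "Medical Science" if "patient" in text.lower() else "Computer Science"
            for text in texts
        ]


def predict_categories(df, model, text_column="abstract"):
    """Return the rows of df that have an abstract, with a predicted category added."""
    if text_column not in df.columns:
        raise KeyError(f"Column '{text_column}' not found in the input data")
    # Rows without an abstract cannot be classified, so they are dropped.
    result = df.dropna(subset=[text_column]).copy()
    texts = result[text_column].astype(str).tolist()
    result["predicted_category"] = list(model.predict(texts))
    return result


# In practice the table would come from an .xlsx file, e.g. pd.read_excel(<path>).
df = pd.DataFrame({
    "abstract": [
        "A clinical trial with 200 patients.",
        "A faster sorting algorithm.",
        None,
    ]
})
out = predict_categories(df, KeywordModel())
print(out)
```

Any object with a scikit-learn style `predict` method could be passed in place of the stand-in model.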