News Reader
I have scraped the headlines of technology and science news from various magazines, newspapers and websites. The goal is to collect a dataset of recent technology news from different sources, and then apply some NLP techniques to analyze the most common technology trends.
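One simple first look at "the most common trends" is to count how often each term appears across all headlines. The sketch below uses only the standard library. The small stopword list and the regex tokenizer are assumptions made for illustration; a fuller analysis would probably use the stopword lists that ship with NLTK or spaCy.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, deliberately short).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "its", "of", "on", "or", "that", "the", "to", "with",
}


def tokenize(text):
    """Lowercase a headline and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_terms(headlines, n=10):
    """Return the n most frequent non-stopword terms as (term, count) pairs."""
    counts = Counter()
    for headline in headlines:
        # Skip stopwords and single-character tokens such as "5" or "x".
        counts.update(
            tok for tok in tokenize(headline)
            if tok not in STOPWORDS and len(tok) > 1
        )
    return counts.most_common(n)
```

For example, with the hypothetical headlines `["OpenAI releases new AI model", "AI chips power the next wave of AI"]`, `top_terms(headlines, 1)` returns `[("ai", 3)]`.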
Next, we apply "Topic Modeling". Topic modeling is an unsupervised learning approach that clusters documents in order to discover topics based on their contents. It is conceptually similar to clustering algorithms such as K-Means and Expectation-Maximization.
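The notebook does not show which topic model it uses, so the following is only a dependency-free sketch of one common choice, Latent Dirichlet Allocation (LDA) fitted with collapsed Gibbs sampling. Like K-Means, it repeatedly reassigns items to clusters based on current counts; unlike K-Means, every word gets its own topic, so a document can mix several topics. The hyperparameters `alpha`, `beta`, the iteration count and the seed are illustrative assumptions. In practice a library implementation such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel` would be the usual choice.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists (e.g. tokenized headlines).
    Returns (n_kw, z): per-topic word counts and per-token topic assignments.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * num_topics for _ in docs]              # document-topic counts
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic

    # Start by giving every token a random topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Sample a new topic and add the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw, z


def top_words(n_kw, n=5):
    """List the n most frequent words of each topic (ties broken alphabetically)."""
    result = []
    for counts in n_kw:
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        result.append([w for w in ranked if counts[w] > 0][:n])
    return result
```

Each inner list returned by `top_words` can be read as a rough label for one discovered topic.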
Libraries
Data scraping
The data was collected from several websites: Inshorts, Linux Magazine, BBC Technology, Google News (AI, Canada), Yahoo Tech and Google News (Computing, Canada).
Each page is downloaded as an HTML file and parsed with BeautifulSoup; the relevant information is then extracted and stored in a CSV file.
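The parse-extract-store step can be sketched as below. To keep the sketch free of dependencies it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`. It also assumes, purely for illustration, that headlines sit inside `<h2>` tags; every site in this notebook uses its own markup, so the tag would need adjusting per site. With BeautifulSoup the extraction would roughly be `[h.get_text(" ", strip=True) for h in soup.find_all("h2")]`.

```python
import csv
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the whitespace-normalized text of every <tag>...</tag> element."""

    def __init__(self, tag="h2"):  # "h2" is an assumed headline tag
        super().__init__()
        self.tag = tag
        self._inside = False
        self._buffer = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._inside = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self._inside:
            self._inside = False
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is kept as part of the headline.
        if self._inside:
            self._buffer.append(data)


def headlines_to_csv(html, source, out):
    """Extract headlines from an HTML string and write (source, headline) rows."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    writer = csv.writer(out)
    writer.writerow(["source", "headline"])
    for headline in parser.headlines:
        writer.writerow([source, headline])
    return parser.headlines
```

Here `out` can be any text file object, for example one opened with `open("bbc.csv", "w", newline="", encoding="utf-8")`.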
Depending on how accessible each website is, the page is fetched with either urllib or requests, because some websites return a "Forbidden" error for requests made with urllib.
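A minimal sketch of this "urllib first, another client if forbidden" strategy follows. The browser-like `User-Agent` value is an assumption: a frequent reason for the refusal is urllib's default `Python-urllib/3.x` agent, but sites may block scrapers for other reasons too. The `opener` parameter exists only so the function can be tested without network access.

```python
import urllib.request
from urllib.error import HTTPError

# Hypothetical browser-like headers; urllib's default agent is often refused.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; news-reader-sketch)"}


def fetch_html(url, fallback=None, timeout=10, opener=urllib.request.urlopen):
    """Download a page with urllib; on 403 Forbidden, defer to `fallback(url)`."""
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with opener(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except HTTPError as err:
        if err.code == 403 and fallback is not None:
            return fallback(url)
        raise
```

With the requests library installed, a fallback could be `fetch_html(url, fallback=lambda u: requests.get(u, headers=HEADERS, timeout=10).text)`.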
Inshorts
Linux Magazine
Google AI news
BBC news
Yahoo tech news
You need to create a Yahoo News account to access the data.
Google Tech news
Merging all the news data
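Merging the per-site CSV files into one dataset can be sketched with the standard `csv` module. The sketch assumes each file has a `headline` column and treats headlines that differ only in case or spacing as duplicates (for example, the same story picked up by two aggregators); both are assumptions, not necessarily what the notebook does. With pandas, the equivalent would be `pd.concat(frames)` followed by `drop_duplicates(subset="headline")`.

```python
import csv


def merge_news(sources):
    """Merge per-site CSVs into one list of rows, dropping duplicate headlines.

    sources: mapping of source name -> text file object whose CSV has a
    'headline' column (an assumed column name).
    """
    merged, seen = [], set()
    for source, handle in sources.items():
        for row in csv.DictReader(handle):
            # Normalize whitespace; compare case-insensitively for duplicates.
            headline = " ".join(row["headline"].split())
            key = headline.casefold()
            if headline and key not in seen:
                seen.add(key)
                merged.append({"source": source, "headline": headline})
    return merged


def write_merged(rows, out):
    """Write merged rows to a CSV file object with a header line."""
    writer = csv.DictWriter(out, fieldnames=["source", "headline"])
    writer.writeheader()
    writer.writerows(rows)
```

The first source that reports a headline keeps it, so the order of `sources` decides which site a shared story is attributed to.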