Amazon Reviews Sentiment Analysis
Import the Libraries
requests lets you send HTTP requests to a server and returns a Response object containing all the response data (for example, the HTML).
BeautifulSoup (bs4) pulls data out of HTML files by converting the markup to a BeautifulSoup object, which represents the HTML as a nested data structure.
pandas is used for data analysis and manipulation.
urllib can be used for many purposes, including reading website content, making HTTP and HTTPS requests, sending request headers, and retrieving response headers.
The csv module implements classes to read and write tabular data in CSV format.
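As a minimal sketch of the "sending request headers" part, the snippet below prepares a request with urllib without sending it. The URL and header values are hypothetical placeholders, not the ones used in this article.

```python
from urllib.request import Request

# Hypothetical URL and header values, used only for illustration.
url = "https://www.example.com/product-reviews"
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; review-tutorial)",
    "Accept-Language": "en-US,en;q=0.9",
}

# Build the request object. Nothing goes over the network here;
# the request would only be sent by passing it to urllib.request.urlopen().
request = Request(url, headers=headers)

print(request.full_url)
print(request.get_method())
print(request.header_items())
```

Note that urllib normalizes header names when it stores them, so the header added as "User-Agent" is looked up as "User-agent".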
Review the Web Page’s HTML Structure
We need to understand the structure and contents of the HTML tags within the web page. We will use the Amazon product page for the Apple Watch Series 7 GPS + Cellular (shown below). You can find this webpage by selecting this link and scrolling down to the reviews section.
We can scrape this webpage by parsing the HTML of the page and extracting the information needed for our dataset.
We will extract the following data elements from the reviews:
Reviewer Names
Reviews
Retrieve HTML data and Extract the Data Elements
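The extraction step can be sketched roughly as follows. To keep the example runnable offline, it parses a small hypothetical HTML string instead of a downloaded page. The tag names and attributes (data-hook="review", a-profile-name, data-hook="review-body") are assumptions for illustration and may not match the live page, so check them against the HTML you reviewed in the previous step.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded reviews page.
sample_html = """
<div data-hook="review">
  <span class="a-profile-name">Alice</span>
  <span data-hook="review-body"><span>Great watch, battery lasts all day.</span></span>
</div>
<div data-hook="review">
  <span class="a-profile-name">Bob</span>
  <span data-hook="review-body"><span>Stopped charging after a week.</span></span>
</div>
"""

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(sample_html, "html.parser")

reviewer_names = []
reviews = []

# Each review is assumed to live in its own container element.
for review in soup.find_all("div", attrs={"data-hook": "review"}):
    name_tag = review.find("span", class_="a-profile-name")
    body_tag = review.find("span", attrs={"data-hook": "review-body"})
    # Store None when an element is missing so the gap stays visible.
    reviewer_names.append(name_tag.get_text(strip=True) if name_tag else None)
    reviews.append(body_tag.get_text(strip=True) if body_tag else None)

print(reviewer_names)
print(reviews)
```

With a real page, sample_html would be replaced by the text of the HTTP response.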
Create the Dictionary
We will create a dictionary that will contain the data names and values for the data elements that were extracted.
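A minimal sketch of such a dictionary, using hypothetical values in place of the scraped ones. The lists are deliberately of different lengths to mimic a page where one review body could not be extracted.

```python
# Hypothetical extracted values; the third reviewer has no matching review.
reviewer_names = ["Alice", "Bob", "Carol"]
reviews = ["Great watch.", "Stopped charging after a week."]

# Keys are the data names, values are the lists of extracted elements.
data = {
    "Reviewer Name": reviewer_names,
    "Review": reviews,
}

for name, values in data.items():
    print(name, len(values))
```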
Create the Data Frame
We need to adjust for missing values, because the extracted lists may not all be the same length and building a data frame directly from lists of unequal length raises a ValueError. Instead, we create the data frame with each dictionary key as a row. The missing values then show up as missing columns, which pandas fills in without complaint. We then transpose the data frame (flip the axes) so that the rows become columns. We also need to clean some of the data.
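One way to do this in pandas is DataFrame.from_dict with orient="index", followed by a transpose. The sketch below uses hypothetical data, and the cleaning step (stripping whitespace) is only an assumed example of the kind of cleanup that may be needed.

```python
import pandas as pd

# Hypothetical dictionary with lists of unequal length.
data = {
    "Reviewer Name": ["Alice", "Bob", "Carol"],
    "Review": ["Great watch.\n", "  Stopped charging after a week."],
}

# pd.DataFrame(data) would raise ValueError here because the lists differ
# in length. With orient="index", each key becomes a row and the shorter
# row is padded with missing values.
df = pd.DataFrame.from_dict(data, orient="index")

# Flip the axes so that each key becomes a column again.
df = df.transpose()

# Example cleanup: remove leading and trailing whitespace from the reviews.
df["Review"] = df["Review"].str.strip()

print(df)
```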
Sentiment Analysis
Sentiment analysis is a use case of Natural Language Processing (NLP) and falls under the category of text classification. Put simply, sentiment analysis classifies a text into sentiments such as positive or negative, or happy, sad, or neutral. The ultimate goal of sentiment analysis is to decipher the underlying mood, emotion, or sentiment of a text. It is also known as opinion mining.
We can see that the output has two components: polarity and subjectivity.
Polarity is a float value within the range [-1.0, 1.0], where 0 indicates neutral, +1 indicates a very positive sentiment, and -1 indicates a very negative sentiment.
Subjectivity is a float value within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. Subjective sentences express personal feelings, views, beliefs, opinions, allegations, desires, suspicions, and speculations, whereas objective sentences are factual.
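To make the polarity range concrete, here is a small sketch that turns a polarity score into a label. The function name label_polarity and the neutral band of 0.05 around zero are assumptions chosen for illustration, not part of any library. The polarity values themselves would come from whichever sentiment library you use.

```python
def label_polarity(polarity, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band is a hypothetical threshold: scores within
    +/- neutral_band of zero are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be within [-1.0, 1.0]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Hypothetical polarity scores for three reviews.
for score in (0.8, 0.0, -0.6):
    print(score, label_polarity(score))
```

A wider neutral band labels more reviews as neutral, so the threshold is worth tuning against a few hand-checked reviews.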