Libraries Used
Installing third-party libraries
openpyxl 3.1.2 (with its dependency et-xmlfile 1.1.0) was already installed in /root/venv.
tweet-preprocessor 0.6.0 was downloaded and installed.
Specific corpora downloaded from NLTK
The wordnet and omw-1.4 packages were downloaded to /root/nltk_data.
Importing Tweet File
Importing Tweets
Tweets were extracted using RapidMiner and exported as an Excel file.
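A minimal sketch of loading the export with pandas, assuming the tweets sit on the first sheet of the workbook. The helper name `load_tweets` is hypothetical. pandas hands .xlsx parsing to openpyxl, which is presumably why openpyxl appears in the install step.

```python
import pandas as pd


def load_tweets(source) -> pd.DataFrame:
    """Read the first sheet of the RapidMiner Excel export into a DataFrame.

    `source` can be a file path or a binary file-like object.
    """
    # Naming the engine makes the openpyxl dependency explicit.
    return pd.read_excel(source, sheet_name=0, engine="openpyxl")
```

Usage would be along the lines of `tweets = load_tweets("airasia_tweets.xlsx")`, where the filename is only a placeholder.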
Removing unwanted columns
Only the text and retweets columns were kept; all other columns were removed.
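Column selection can be sketched as below. The header names `Text` and `Retweets` are assumptions standing in for whatever headers the RapidMiner export actually uses, and `keep_relevant_columns` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical header names; the real export's headers may differ.
KEEP_COLUMNS = ["Text", "Retweets"]


def keep_relevant_columns(df: pd.DataFrame, columns=KEEP_COLUMNS) -> pd.DataFrame:
    """Return a new DataFrame holding only `columns`, in that order."""
    # .loc raises KeyError if a listed column is missing, so a header
    # mismatch surfaces immediately instead of silently losing data.
    return df.loc[:, list(columns)].copy()
```

Selecting the columns to keep, rather than listing the ones to drop, keeps the step stable if RapidMiner adds extra fields to a later export.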
Text Pre-processing
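Initializing the tokenizer and lemmatizer
NLTK provides a tokenizer designed specifically for tweets, TweetTokenizer, which was used in this project. Inflected word forms in the tweets were then reduced to their base (dictionary) forms with NLTK's WordNet lemmatizer.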
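A sketch of the initialization, assuming a standard NLTK install; the sample sentence is made up for illustration. `TweetTokenizer` keeps @handles, #hashtags and emoticons as single tokens, and `preserve_case=False` lowercases every token except emoticons.

```python
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import TweetTokenizer

# Tweet-aware tokenizer: handles, hashtags and emoticons stay intact.
tokenizer = TweetTokenizer(preserve_case=False)

# The lemmatizer needs the "wordnet" (and, on recent NLTK versions,
# "omw-1.4") data downloaded before lemmatize() is first called.
lemmatizer = WordNetLemmatizer()

tokens = tokenizer.tokenize("@airasia My flights were cancelled :( #airasia")
```

Note that `lemmatizer.lemmatize(word)` treats every word as a noun unless a `pos` argument is passed, so verb forms such as "cancelled" are left unchanged by default.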
Custom function for cleaning text
Removing Tweet-specific tokens such as the RT marker and @mentions using the tweet-preprocessor library
Removing punctuation and digits
Lemmatizing words
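The three steps can be combined into one function. This sketch uses only the standard library so it runs anywhere: the regular expressions in `strip_twitter_tokens` are a rough, hypothetical stand-in for tweet-preprocessor (in the notebook, the library's `preprocessor.clean` would fill that role). The tokenizer and lemmatizer are passed in as parameters so that NLTK's `TweetTokenizer().tokenize` and `WordNetLemmatizer().lemmatize` can be plugged in. All function names here are hypothetical.

```python
import re
import string
from typing import Callable, Iterable

# Rough, hypothetical stand-in for tweet-preprocessor: removes URLs,
# @mentions and the retweet marker "RT". The real library covers more.
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_MENTION_RE = re.compile(r"@\w+")
_RT_RE = re.compile(r"\bRT\b")

# Translation table that deletes ASCII punctuation and digits in one pass.
_STRIP_TABLE = str.maketrans("", "", string.punctuation + string.digits)


def strip_twitter_tokens(text: str) -> str:
    """Replace URLs, mentions and RT markers with spaces."""
    for pattern in (_URL_RE, _MENTION_RE, _RT_RE):
        text = pattern.sub(" ", text)
    return text


def _identity(word: str) -> str:
    return word


def clean_tweet(
    text: str,
    tokenize: Callable[[str], Iterable[str]] = str.split,
    lemmatize: Callable[[str], str] = _identity,
) -> str:
    """Clean one tweet and return its lemmas joined by single spaces."""
    # Twitter-specific tokens go first, while "@" and "/" still mark them.
    text = strip_twitter_tokens(text)
    # Drop punctuation and digits, then lowercase.
    text = text.translate(_STRIP_TABLE).lower()
    return " ".join(lemmatize(token) for token in tokenize(text))
```

The order matters: once `@` and `/` have been deleted as punctuation, handles and links can no longer be recognised, so they are stripped first.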
Sample of the data (first five rows):
Index | Text | Retweets
0 | SAF produced in India helps power AirAsia flight https://t.co/fUWPznKdyK | 0
1 | Hello @airasia refund tic yg beli dkt ur website from jogja to jakarta still not responding...then simply said case close..d flight @Citilink hell not available for month.. @airasia @airasia @airasia #senangbuatduit #viral #airasia #malaysiaviral | 0
2 | @airasia @foodpanda_my My 2 overseas flights in 2020 cancelled by AirAsia due to Covid. Already 3 years now and still no cash refunds despite numerous emails to AAX n MAVCOM with passengers forced to accept only useless travel vouchers without any choice or option. PAY OUR FULL CASH REFUNDS NOW! | 0
3 | @airasia Hi, i need to cancel a ticket and get a refund but i can't do it on you app or web page either, do you have a real person as a customer service instead of a chatbot please? An email address would also help. My flight will depart from Hong Kong. Thank You. | 0
4 | @GyeoulHp @TPolah05 Okay lang ang air asia for me!! (chsr singit) actually naka try ako ng airasia just this month bc my cebpac flight was cancelled for no apparent reason tapos i have to fly to mnl on that day dapat. Maganda rin siguro experience ko kasi first flight ako hehe | 0
Applying the cleaning function to the Tweet text column
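Applying a cleaning function column-wise with pandas might look like this. The column names `Text` and `Clean_Text` and the helper `add_clean_column` are assumptions; writing to a new column keeps the raw tweet available for inspection.

```python
from typing import Callable

import pandas as pd


def add_clean_column(
    df: pd.DataFrame,
    clean: Callable[[str], str],
    text_col: str = "Text",
    out_col: str = "Clean_Text",
) -> pd.DataFrame:
    """Return a copy of `df` with `clean` applied to every tweet in `text_col`."""
    out = df.copy()
    # Empty cells in the export arrive as NaN; turn them into "" so that
    # `clean` always receives a string.
    out[out_col] = out[text_col].fillna("").astype(str).map(clean)
    return out
```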
Removing duplicate Tweets
Most of the duplicates were retweeted copies of the same tweet; these were removed.
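A sketch of the de-duplication, assuming the cleaned text lives in a hypothetical `Clean_Text` column. Comparing cleaned rather than raw text means a retweet such as `RT @user: ...` can collapse onto the original tweet once the RT marker and mention have been stripped; `keep="first"` retains the earliest row of each group.

```python
import pandas as pd


def drop_duplicate_tweets(df: pd.DataFrame, text_col: str = "Clean_Text") -> pd.DataFrame:
    """Keep only the first row for each distinct cleaned tweet."""
    deduped = df.drop_duplicates(subset=[text_col], keep="first")
    # Renumber rows 0..n-1 so later positional code is not confused by gaps.
    return deduped.reset_index(drop=True)
```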