Tunisian Arabizi Tweet Sentiments
On social media, Arabic speakers tend to express themselves in their own local dialect. Tunisians do so in ‘Tunisian Arabizi’, which writes the dialect in the Latin alphabet supplemented with numbers. However, annotated datasets for Arabizi are limited; in fact, this challenge uses the only known Tunisian Arabizi dataset in existence.
Sentiment analysis relies on multiple word senses and cultural knowledge, and can be influenced by age, gender and socio-economic status. For this task, we collected and annotated sentences from different social media platforms. The objective of this challenge is to classify a given sentence as positive, negative, or neutral, as an average user would perceive it. For messages that convey both positive and negative sentiment, the stronger of the two should be chosen. This is a three-class classification task.
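Since the task has three classes, models with a softmax output usually need the labels as contiguous indices 0, 1 and 2. The sketch below assumes the raw labels are -1 (negative), 0 (neutral) and 1 (positive). That encoding is an assumption, so confirm it against the competition's training file. The helper names are hypothetical.

```python
# ASSUMPTION: raw labels are -1 = negative, 0 = neutral, 1 = positive.
LABEL_TO_INDEX = {-1: 0, 0: 1, 1: 2}
INDEX_TO_LABEL = {index: label for label, index in LABEL_TO_INDEX.items()}


def encode_labels(raw_labels):
    """Map raw sentiment labels to class indices 0..2 for a three-class model."""
    return [LABEL_TO_INDEX[label] for label in raw_labels]


def decode_labels(indices):
    """Map predicted class indices back to the raw label encoding."""
    return [INDEX_TO_LABEL[index] for index in indices]


if __name__ == "__main__":
    encoded = encode_labels([1, -1, 0, 1])
    print(encoded)                 # [2, 0, 1, 2]
    print(decode_labels(encoded))  # [1, -1, 0, 1]
```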
Such solutions could help banks, insurance companies, or social media influencers better understand a product’s audience and interpret its reactions.
This notebook and dataset are part of a Zindi competition. Part of the code was taken from the Social-Media-Sentiment-Analysis-for-Tunisian-Arabizi repo.
Loading Data and Cleaning Text
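A minimal cleaning sketch, assuming whitespace-separated sentences. It lowercases, strips URLs and @-mentions, and collapses elongated letters, but keeps digits, because in Arabizi digits such as 3, 7 and 9 stand for Arabic letters. The function name `clean_arabizi` and the exact rules are illustrative choices, not the notebook's original code. The character filter also drops accented letters, so widen it if French words in the data matter.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")
# Keep Latin letters and digits: in Arabizi, digits such as 3, 7 and 9 encode Arabic letters.
NON_ARABIZI_PATTERN = re.compile(r"[^a-z0-9\s]")
# Any character repeated 3 or more times in a row.
REPEAT_PATTERN = re.compile(r"(.)\1{2,}")
SPACE_PATTERN = re.compile(r"\s+")


def clean_arabizi(text):
    """Normalize one Arabizi sentence (hypothetical helper, illustrative rules)."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    # Drops punctuation, emoji, '#' and accented letters; digits survive.
    text = NON_ARABIZI_PATTERN.sub(" ", text)
    # Collapse elongations: 3+ identical characters become 2.
    text = REPEAT_PATTERN.sub(r"\1\1", text)
    return SPACE_PATTERN.sub(" ", text).strip()
```

For example, `clean_arabizi("Ya3tik saaaaha!!! @user http://t.co/x")` returns `"ya3tik saaha"`.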
EDA and Word Counts
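Word counts per sentiment class give a quick view of which tokens separate the classes and how long the sentences are. Here is a standard-library sketch, assuming the text has already been cleaned and is split on whitespace. The sentences in the usage line are made-up toy examples, not rows from the dataset.

```python
from collections import Counter


def word_counts_by_label(texts, labels, top_n=5):
    """Return the top_n most frequent whitespace tokens for each sentiment label."""
    counters = {}
    for text, label in zip(texts, labels):
        counters.setdefault(label, Counter()).update(text.split())
    return {label: counter.most_common(top_n) for label, counter in counters.items()}


def length_stats(texts):
    """Return (min, mean, max) token counts across sentences."""
    lengths = [len(text.split()) for text in texts]
    return min(lengths), sum(lengths) / len(lengths), max(lengths)
```

For example, `word_counts_by_label(["behi barcha", "mouch behi", "behi"], [1, -1, 1])[1]` gives `[("behi", 2), ("barcha", 1)]`.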
Train and Test Split
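A validation set that preserves the class proportions keeps all three sentiment classes represented in both parts. Below is a standard-library sketch of a stratified split. The 20% validation share and the seed are arbitrary assumptions. With scikit-learn, `train_test_split(texts, labels, test_size=0.2, stratify=labels, random_state=42)` from `sklearn.model_selection` does the same job.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_size=0.2, seed=42):
    """Split into (text, label) train/validation lists, keeping each label's share similar."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    rng = random.Random(seed)
    train, valid = [], []
    for label, group in by_label.items():
        rng.shuffle(group)  # shuffle within each class before cutting
        n_valid = round(len(group) * test_size)
        valid.extend((text, label) for text in group[:n_valid])
        train.extend((text, label) for text in group[n_valid:])
    # Mix the classes so batches are not ordered by label.
    rng.shuffle(train)
    rng.shuffle(valid)
    return train, valid
```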
Experimenting with Models
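As a lightweight reference point before heavier models, the sketch below implements a multinomial Naive Bayes classifier with Laplace smoothing. It is a swapped-in baseline for illustration, not the model this notebook used; the author's improved score came from a BERT-large model. scikit-learn's `CountVectorizer` plus `MultinomialNB` is the usual library equivalent. The toy inputs in the usage note are invented.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesBaseline:
    """Multinomial Naive Bayes over whitespace tokens with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.split())
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.total_tokens = {label: sum(c.values()) for label, c in self.token_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_scores(self, text):
        """Unnormalized log posterior (log prior + log likelihoods) for every class."""
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_label_docs in self.class_counts.items():
            score = math.log(n_label_docs / self.n_docs)
            denom = self.total_tokens[label] + self.alpha * vocab_size
            for tok in text.split():
                if tok in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, texts):
        predictions = []
        for text in texts:
            scores = self.log_scores(text)
            predictions.append(max(scores, key=scores.get))
        return predictions
```

For example, after `model = NaiveBayesBaseline().fit(["behi barcha", "behi", "mouch behi", "khayeb", "khayeb barcha"], [1, 1, -1, -1, -1])`, `model.predict(["khayeb", "behi barcha"])` returns `[-1, 1]`.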
Metrics and Evaluation Functions
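The notebook does not say which metric the leaderboard uses. Two common choices for a three-class task are accuracy and macro-averaged F1. Macro F1 weights each class equally, so it is less flattering when one class dominates. Here is a plain-Python sketch; `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score(..., average="macro")` compute the same quantities.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

For `y_true = [1, 1, 0, -1]` and `y_pred = [1, 0, 0, -1]`, accuracy is 0.75 and macro F1 is 7/9 (about 0.778).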
Experimenting with Ensembling
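One simple ensembling scheme is hard majority voting: each model casts one vote per sentence, and the most common label wins. The sketch assumes every model outputs labels in the same encoding. Sending ties to the neutral label (0) is an arbitrary assumption; falling back to the best single model's prediction is a common alternative. Soft voting (averaging predicted probabilities) is another option when the models expose probabilities.

```python
from collections import Counter


def majority_vote(prediction_lists, tie_breaker=0):
    """Combine per-model label lists by hard majority voting.

    prediction_lists: one list of labels per model, all of the same length.
    tie_breaker: label used when the top vote count is shared (ASSUMED neutral = 0).
    """
    combined = []
    for votes in zip(*prediction_lists):  # votes for one sentence across models
        ranked = Counter(votes).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            combined.append(tie_breaker)
        else:
            combined.append(ranked[0][0])
    return combined
```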
Predicting and Submitting the Test Dataset
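Here is a sketch for writing predictions to a CSV submission file. The column names `ID` and `label` are assumptions; copy them from the competition's sample submission file. If the model predicts class indices, map them back to the raw label encoding before writing.

```python
import csv


def submission_rows(ids, predictions):
    """Build the header plus one [ID, label] row per test sentence (ASSUMED column names)."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    return [["ID", "label"]] + [[i, p] for i, p in zip(ids, predictions)]


def write_submission(path, ids, predictions):
    """Write the submission rows to a UTF-8 CSV file at path."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(submission_rows(ids, predictions))
```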
This was a starter notebook to show students and competition participants how to approach a sentiment-analysis problem in a new language. You can train a model on the training dataset and use it to predict the sentiment of tweets in the test dataset.
I improved my score by training a pretrained BERT-large model on a TPU with 1 billion features.