Grouping By One or Multiple Metrics
Selecting Data That Meets One or More Conditions
+ Automated Condition Generation Based on a List!
One Condition Example:
Here we get only the Dutch tweets; 'nl' is the Dutch language code. We pass the condition df['lang'] == 'nl' to df.loc, which keeps only the rows where the condition is True.
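A minimal sketch of the single-condition filter. The small DataFrame here is a made-up stand-in for the tweet dataset (the 'text' column and its values are assumptions); only the 'lang' column and the 'nl' code come from the description above.

```python
import pandas as pd

# Hypothetical stand-in for the tweet dataset; only 'lang' matters here.
df = pd.DataFrame({
    "text": ["hallo wereld", "hello world", "goedemorgen", "bonjour"],
    "lang": ["nl", "en", "nl", "fr"],
})

# df['lang'] == 'nl' builds a True/False value per row;
# df.loc keeps only the rows where it is True.
df_dutch = df.loc[df["lang"] == "nl"]
print(df_dutch)
```

The original row index is preserved, so the result can still be matched back to the full dataset.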
Multiple Conditions Example:
In this case we want to get the uncommon languages. We define an uncommon language as anything that's not English ('en') or Dutch ('nl'), since our earlier groupings by language showed that these two were the most common languages in the dataset.
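One way this two-condition filter could be written, as a sketch on a made-up DataFrame (the sample rows are assumptions). Each condition gets its own parentheses and the two are combined with &, the element-wise "and".

```python
import pandas as pd

# Hypothetical stand-in for the tweet dataset.
df = pd.DataFrame({
    "text": ["hello", "hallo", "bonjour", "hola", "hi again"],
    "lang": ["en", "nl", "fr", "es", "en"],
})

# Wrap each condition in parentheses, then combine them with &.
# A row is kept only if it is not English AND not Dutch.
df_uncommon = df.loc[(df["lang"] != "en") & (df["lang"] != "nl")]
print(df_uncommon)
```

Use | in the same way for an element-wise "or", and ~ to negate a condition.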
Multiple Condition Tips
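Multiple conditions are tricky. Here are some things that DON'T work.
Leaving out the parentheses around each condition (& binds more tightly than !=, so Python tries to evaluate 'en' & df['lang'] first):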
df_uncommon=df.loc[df['lang']!='en' & df['lang']!='nl']
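Trying to group the conditions (Python evaluates 'en' or 'nl' to just 'en' before pandas ever sees it, so only English is removed):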
df_uncommon=df.loc[df['lang']!=('en' or 'nl')]
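Trying to put the conditions in a list (not in asks for a single True/False, which a whole column cannot give, so this raises a ValueError):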
df_uncommon=df.loc[df['lang'] not in removeLangs]
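To see what these attempts actually do, here is a small sketch that runs each one on a made-up DataFrame. The sample data and the contents of removeLangs are assumptions; the three expressions are the ones listed above.

```python
import pandas as pd

# Hypothetical stand-in for the tweet dataset.
df = pd.DataFrame({"lang": ["en", "nl", "fr", "und", "en"]})
removeLangs = ["en", "nl"]  # assumed contents

# Attempt 1: no parentheses. & is evaluated before !=, so this fails.
# The exact exception type can depend on the pandas version.
no_parens_failed = False
try:
    df.loc[df["lang"] != "en" & df["lang"] != "nl"]
except Exception as exc:
    no_parens_failed = True
    print("no parentheses:", type(exc).__name__)

# Attempt 2: ('en' or 'nl') is plain Python and evaluates to 'en',
# so no error is raised but the Dutch rows are silently kept.
wrong = df.loc[df["lang"] != ("en" or "nl")]
print("grouped:", sorted(wrong["lang"].unique()))

# Attempt 3: `not in` needs one True/False, not one value per row.
not_in_failed = False
try:
    df.loc[df["lang"] not in removeLangs]
except ValueError as exc:
    not_in_failed = True
    print("not in:", type(exc).__name__)
```

The second attempt is the most dangerous of the three because it runs without complaint and just returns the wrong rows.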
Automating Adding Conditions:
Adding conditions by hand can be a pain, especially if you have a lot of them. This is a method for adding conditions automatically from a list. In this example we remove every tweet in the dataset whose language is English ('en'), Dutch ('nl'), or undetermined ('und').
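A possible sketch of building the conditions from a list; this is one way to do it, not necessarily the method the original notebook used. The DataFrame is made up, and removeLangs is assumed to hold the three codes named above. One condition is created per list entry and they are all combined with &.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical stand-in for the tweet dataset.
df = pd.DataFrame({
    "text": ["hello", "hallo", "bonjour", "???", "hola"],
    "lang": ["en", "nl", "fr", "und", "es"],
})
removeLangs = ["en", "nl", "und"]

# Build one "not this language" condition per entry in the list...
conditions = [df["lang"] != code for code in removeLangs]
# ...then AND them all together into a single True/False column.
keep = reduce(operator.and_, conditions)
df_filtered = df.loc[keep]

# pandas also has a built-in shortcut for this exact case:
# isin() checks membership per row, and ~ negates the result.
df_filtered_isin = df.loc[~df["lang"].isin(removeLangs)]

print(df_filtered)
```

The loop version generalizes to conditions that are not simple equality checks, while isin is shorter when you only need to match against a list of values.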