Toxic behavior detection in gaming chats
This data application shows how an LLM can automatically detect and classify toxic behavior in gaming chat messages, focusing on DOTA 2 community interactions. The system analyzes chat patterns and identifies harassment and hate speech.
The analysis covers 638 DOTA 2 chat messages (dataset from Hugging Face), each classified into one of three toxicity levels (0: Mid-toxic, 1: Non-toxic, 2: Toxic). It implements both traditional NLP approaches and modern LLM-based detection using Google's Gemma model. The processing pipeline includes text preprocessing with spaCy, pattern analysis, and real-time toxicity classification.
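The label scheme and a preprocessing step could be sketched as follows. This is a minimal illustration, not the application's code: the application uses spaCy, whereas this sketch substitutes a plain regex tokenizer from the standard library so that it runs without extra dependencies. The names `LABELS`, `decode_label` and `preprocess` are hypothetical.

```python
import re

# Label ids as listed above: 0 = mid-toxic, 1 = non-toxic, 2 = toxic.
LABELS = {0: "mid-toxic", 1: "non-toxic", 2: "toxic"}

# Letters, digits and apostrophes; a stand-in for spaCy tokenization (assumption).
_TOKEN_RE = re.compile(r"[a-z0-9']+")


def decode_label(label_id: int) -> str:
    """Map a numeric label to its name, failing loudly on unknown ids."""
    if label_id not in LABELS:
        raise ValueError(f"unknown label id: {label_id}")
    return LABELS[label_id]


def preprocess(message: str) -> list:
    """Lowercase a chat message and split it into simple tokens."""
    return _TOKEN_RE.findall(message.lower())
```

Keeping the id-to-name mapping in one place avoids the easy mistake of assuming that a higher id means higher toxicity: in this scheme, 0 is mid-toxic and 1 is non-toxic.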
Distribution of toxicity labels in DOTA 2 chat
Non-toxic messages dominate the dataset with 353 of 638 instances (about 55%), while toxic content comprises 167 messages (about 26%), showing that most player interactions are not toxic.
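The shares behind this chart can be recomputed from the stated counts. In the sketch below, the mid-toxic count is not quoted in the text; it is derived as 638 - 353 - 167 = 118 on the assumption that the three labels cover every message. The helper name `label_shares` is hypothetical.

```python
# Counts stated in the text; the mid-toxic count is derived, not quoted.
TOTAL = 638
counts = {"non-toxic": 353, "toxic": 167}
# Assumes the three labels cover all messages in the dataset.
counts["mid-toxic"] = TOTAL - sum(counts.values())


def label_shares(label_counts):
    """Return each label's share of the dataset as a percentage."""
    total = sum(label_counts.values())
    return {label: round(100 * n / total, 1) for label, n in label_counts.items()}


shares = label_shares(counts)
# Print labels from most to least frequent.
for label, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:10s} {counts[label]:4d}  {pct:5.1f}%")
```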
Message length by toxicity level
Toxic messages tend to be shorter and more direct (median ~25 characters), while non-toxic messages show greater length variation, suggesting toxic players use brief, aggressive language.
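A per-label median like the one quoted above can be computed with the standard library alone. The sketch below is illustrative: the function name `median_length_by_label` is hypothetical, and the sample rows are made up for the example rather than taken from the dataset.

```python
from collections import defaultdict
from statistics import median


def median_length_by_label(rows):
    """rows: iterable of (message, label) pairs -> {label: median char length}."""
    lengths = defaultdict(list)
    for message, label in rows:
        lengths[label].append(len(message))
    return {label: median(values) for label, values in lengths.items()}


# Made-up rows for illustration only; not taken from the dataset.
sample = [
    ("gg wp", "non-toxic"),
    ("nice ward, thanks for the save", "non-toxic"),
    ("push mid after roshan", "non-toxic"),
    ("uninstall", "toxic"),
    ("you are useless", "toxic"),
]
result = median_length_by_label(sample)
```

The median is used rather than the mean because a few very long messages would otherwise pull the typical length upward.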
Toxic message pattern clustering
The scatter plot matrix shows distinct clustering patterns where toxic messages correlate strongly with imperative commands, second-person pronouns, and profanity usage.
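The three signals named above could be turned into numeric features along these lines. This is a rough sketch under stated assumptions: the word lists are tiny placeholders, the imperative check is a crude first-word heuristic (a parser such as spaCy would do this properly), and `pattern_features` is a hypothetical name.

```python
import re

# Tiny placeholder lexicons (assumptions); a real system would use curated lists.
SECOND_PERSON = {"you", "your", "you're", "ur", "u"}
IMPERATIVE_STARTS = {"go", "stop", "quit", "uninstall", "report", "leave"}
PROFANITY = {"noob", "trash", "idiot"}  # mild stand-ins for a profanity list

_TOKEN_RE = re.compile(r"[a-z']+")


def pattern_features(message: str) -> dict:
    """Count simple surface patterns in one chat message."""
    tokens = _TOKEN_RE.findall(message.lower())
    return {
        "second_person": sum(t in SECOND_PERSON for t in tokens),
        # Crude heuristic: treat a message starting with a listed verb as a command.
        "imperative_start": int(bool(tokens) and tokens[0] in IMPERATIVE_STARTS),
        "profanity": sum(t in PROFANITY for t in tokens),
        "n_tokens": len(tokens),
    }
```

Feature vectors of this kind are what a scatter plot matrix would plot pairwise, one point per message, colored by toxicity label.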
Message flow distribution
Message length follows a right-skewed distribution with most messages containing 2-8 words, indicating players prefer concise communication during gameplay.
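A words-per-message histogram and a quick skew check might look like the sketch below. The comparison of mean and median is only a rough indicator of right skew, not a formal test, and the function names and sample messages are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def word_count_histogram(messages):
    """Return a Counter mapping words-per-message to number of messages."""
    return Counter(len(m.split()) for m in messages)


def looks_right_skewed(messages) -> bool:
    """Rough indicator: a long right tail pulls the mean above the median."""
    word_counts = [len(m.split()) for m in messages]
    return mean(word_counts) > median(word_counts)


# Made-up messages for illustration only; not taken from the dataset.
sample = [
    "gg",
    "gg wp",
    "push mid",
    "need wards bot",
    "can someone please buy a gem so we can see their invisible hero",
]
hist = word_count_histogram(sample)
skewed = looks_right_skewed(sample)
```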
Word cloud analysis
While the word cloud highlights frequent terms, individual word frequency alone is insufficient for toxicity detection: accurate classification also requires analysis of context and sentence structure.
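The limitation can be seen with a naive keyword baseline. The sketch below is not the application's method; the word list and both messages are made up for illustration. The baseline flags a self-deprecating remark and a direct insult alike, because it only checks whether a listed word is present.

```python
import re

# Hypothetical keyword list for a naive baseline; not the application's method.
FLAGGED_WORDS = {"trash", "noob"}


def keyword_flag(message: str) -> bool:
    """Flag a message if it contains any listed word, ignoring context."""
    tokens = set(re.findall(r"[a-z']+", message.lower()))
    return bool(tokens & FLAGGED_WORDS)


# Made-up messages for illustration; not taken from the dataset.
self_deprecating = "my aim is trash today lol"
hostile = "you are trash, uninstall"

for text in (self_deprecating, hostile):
    print(f"{text!r}: flagged={keyword_flag(text)}")
```

Both messages are flagged even though only the second targets another player, which is the gap that context-aware classification is meant to close.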
Why individual words don't determine toxicity:
Here are three examples showing how the same words can appear across different toxicity levels:
Non-toxic example:
Mid-toxic example:
Toxic example:
Toxicity detector with Gemma 3
Model details:
How to use the toxicity detector:
The application demonstrates how LLMs can provide more nuanced toxicity detection than traditional keyword-based approaches by taking context, sarcasm, and gaming-specific language patterns into account.
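An LLM-based classification step of this kind might be wired up roughly as follows. This is a sketch under stated assumptions, not the application's implementation: the prompt wording is invented, and `generate` is a hypothetical callable that stands in for whatever interface serves the Gemma model, so no specific model API is assumed.

```python
from typing import Callable, Optional

# Label ids as used in the dataset description.
LABELS = {0: "mid-toxic", 1: "non-toxic", 2: "toxic"}


def build_prompt(message: str) -> str:
    """Build a single-digit classification prompt (wording is an assumption)."""
    options = ", ".join(f"{i} = {name}" for i, name in LABELS.items())
    return (
        "You are moderating DOTA 2 in-game chat.\n"
        f"Classify the message with exactly one digit ({options}).\n"
        f"Message: {message!r}\n"
        "Answer:"
    )


def parse_label(reply: str) -> Optional[int]:
    """Return the first valid label digit in the reply, or None if absent."""
    for ch in reply:
        if ch in "012":
            return int(ch)
    return None


def classify(message: str, generate: Callable[[str], str]) -> str:
    """Classify one message; `generate` is a hypothetical text-generation callable."""
    label = parse_label(generate(build_prompt(message)))
    return LABELS[label] if label is not None else "unknown"
```

For example, `classify("gg wp", generate=my_model_call)` returns a label name, where `my_model_call` is a placeholder for a function that sends the prompt to the model and returns its text reply. Returning "unknown" when no digit is found keeps a malformed reply from being silently counted as one of the three labels.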