Interactive tool for investigating anomalous authentication and access events across 1,260 login/access records with 86 confirmed malicious activities. Features threshold tuning with precision-recall tradeoff analysis, allowlist management, and temporal/geographic distribution views. At threshold 0.61, the system achieves 89% precision and 81% recall, generating 79 alerts. Allowlisting reduces volume by 13.9% while preserving 91.4% of true positives.
Event patterns and distribution
The dataset contains 1,260 authentication and access events spanning multiple geographies and time periods, with 86 labeled as malicious based on ground truth.
Temporal distribution shows concentration during business hours. The hourly event histogram peaks between hours 8 and 16, with notably lower activity during the early morning (hours 0-5) and late evening (hours 20-24). After-hours spikes warrant investigation as they may indicate automated attacks, compromised credentials being tested outside normal usage windows, or legitimate activity from users in different time zones that needs baseline adjustment.
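As a minimal sketch of how the hourly view could be reproduced, assuming the records are loaded into a pandas DataFrame (the file name and `timestamp` column are illustrative, not the tool's actual schema):

```python
import pandas as pd

# Hypothetical load: one row per login/access event with an ISO timestamp.
events = pd.read_csv("events.csv", parse_dates=["timestamp"])

# Bucket events by hour of day (0-23) to reproduce the hourly histogram.
hourly = events["timestamp"].dt.hour.value_counts().sort_index()
print(hourly)  # expect the peak roughly across hours 8-16

# Flag after-hours activity (before 06:00 or after 20:00) for closer review.
after_hours = events[~events["timestamp"].dt.hour.between(6, 19)]
print(f"{len(after_hours)} events outside business hours")
```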
Geographic concentration follows expected patterns with outliers. The United States dominates event volume at 250+ events, followed by Canada (~100) and Australia (~90). India, France, and Singapore show moderate activity (70-90 events each). Events from countries like Turkey, South Korea, and the UAE, while lower in absolute volume, may be worth examining if they don't align with the organization's expected user base or business footprint.
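A similar sketch for the geographic view, continuing from the DataFrame above and assuming an illustrative `country` column:

```python
# Count events per country to reproduce the geographic distribution.
by_country = events["country"].value_counts()
print(by_country.head(6))

# Hypothetical footprint check: surface countries outside the expected set.
expected = {"United States", "Canada", "Australia", "India", "France", "Singapore"}
print(by_country[~by_country.index.isin(expected)])
```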
Login and access events are roughly balanced. Both event types appear throughout the 24-hour cycle with similar distributions, suggesting the anomaly detection logic applies to both authentication attempts and subsequent resource access patterns. This dual coverage helps catch both initial compromise attempts and lateral movement or data exfiltration activities.
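One way to check that balance, continuing from the same DataFrame and assuming an illustrative `event_type` column:

```python
# Cross-tabulate event type against hour of day; similar row profiles
# indicate login and access events track the same temporal distribution.
print(pd.crosstab(events["event_type"], events["timestamp"].dt.hour))
```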
Threshold tuning and alert optimization
The tool provides threshold adjustment capabilities to balance detection sensitivity against analyst workload.
The precision-recall tradeoff is visible across thresholds. At low thresholds (~0.4), recall approaches 1.0 but precision drops below 0.6, generating ~280 alerts with a high false positive rate. As the threshold increases, precision rises while recall decreases. The F1 score peaks at thresholds around 0.6-0.65, representing the optimal balance point for many teams. The chosen threshold of 0.61 sits near this maximum.
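A sketch of how such a sweep could be computed, continuing from the DataFrame above and assuming illustrative `score` (anomaly score in [0, 1]) and `is_malicious` (ground-truth label) columns:

```python
import numpy as np

scores = events["score"].to_numpy()
labels = events["is_malicious"].to_numpy().astype(bool)

# Sweep thresholds and report precision, recall, F1, and alert volume.
for t in np.arange(0.40, 0.85, 0.05):
    alerts = scores >= t
    tp = int((alerts & labels).sum())
    n_alerts = int(alerts.sum())
    precision = tp / n_alerts if n_alerts else 0.0
    recall = tp / int(labels.sum())
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"t={t:.2f}  alerts={n_alerts:4d}  P={precision:.2f}  R={recall:.2f}  F1={f1:.2f}")
```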
At threshold 0.61, baseline metrics are strong. The system generates 79 alerts with 89% precision (70 of 79 are true positives) and 81% recall (70 of 86 malicious events caught). F1 score of 0.85 indicates solid overall performance. This configuration means the team investigates roughly 1 false positive for every 8 true incidents, while missing about 19% of malicious activity.
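As an arithmetic check on those figures:

$$
P = \frac{70}{79} \approx 0.886, \quad
R = \frac{70}{86} \approx 0.814, \quad
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.886 \times 0.814}{0.886 + 0.814} \approx 0.85
$$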
Allowlisting provides targeted volume reduction. After applying an allowlist (likely for known-good IPs, device fingerprints, or user-behavior patterns), the alert count drops from 79 to 68, a 13.9% reduction. Precision improves to 94% as false positives are filtered, though recall dips slightly to 74%. Critically, 91.4% of the original true positives remain, meaning only 6 of 70 real incidents are inadvertently suppressed by the allowlist.
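A minimal sketch of layering an allowlist on top of the thresholded alerts, assuming a hypothetical `source_ip` column and allowlist; as noted, the actual tool may key on device fingerprints or behavior patterns instead:

```python
# Hypothetical allowlist keyed on source IP.
ALLOWLIST = {"10.0.0.5", "10.0.0.12"}

alerted = events[events["score"] >= 0.61]
kept = alerted[~alerted["source_ip"].isin(ALLOWLIST)]

tp_before = int(alerted["is_malicious"].sum())
tp_after = int(kept["is_malicious"].sum())
print(f"volume reduction: {1 - len(kept) / len(alerted):.1%}")  # 13.9% in the report
print(f"true positives retained: {tp_after / tp_before:.1%}")   # 91.4% in the report
```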
Alert volume falls off steeply with threshold. The volume curve shows that small threshold adjustments at the lower end (0.4-0.5) dramatically reduce alerts from ~280 to ~150, while adjustments at the higher end (0.7-0.8) yield diminishing returns. Teams facing alert fatigue can make substantial volume cuts with modest recall sacrifice by moving from very low thresholds toward the 0.55-0.65 range.
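Continuing from the sweep above, a team could turn this into an explicit capacity constraint by picking the lowest threshold (hence highest recall) whose alert volume fits its triage capacity; `BUDGET` is a hypothetical parameter:

```python
BUDGET = 100  # hypothetical: alerts the team can triage per period

# Lowest threshold whose alert volume stays within the triage budget.
viable = [t for t in np.arange(0.40, 0.85, 0.01) if int((scores >= t).sum()) <= BUDGET]
chosen = min(viable) if viable else 0.85
print(f"threshold {chosen:.2f} -> {int((scores >= chosen).sum())} alerts")
```

This makes the workload tradeoff a stated budget rather than an eyeballed cutoff on the volume curve.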
This tuning interface allows security teams to adapt detection sensitivity to their investigation capacity, risk tolerance, and the specific threat landscape they face.