VoBERT: Unstable Log Sequence Anomaly Detection: Introducing Vocabulary-free BERT (June 20th)
Thursday, June 20th, 2024: 2:00 - 3:00 PM
Security Operations Centres (SOCs) are overwhelmed by false positives. Two causes drive this: data volumes are growing rapidly, and current analytics models cannot adapt as logs evolve over time (i.e., unstable log data). More efficient solutions are needed, so we introduce VoBERT, an innovative sequence anomaly detection method.
An improvement on BERT (Bidirectional Encoder Representations from Transformers), VoBERT adds resilience by accurately classifying unstable logs that traditional BERT-like models would treat as out-of-vocabulary.
We show that a standard BERT model cannot cope with logs that change over time. The same holds for a simple heuristic often used in industry, which defines the anomaly score of a sequence as the percentage of unseen logs it contains. Addressing this is crucial: a more stable model significantly reduces the number of false positives and improves attack detection.
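The industry heuristic described above can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' code. It assumes logs have already been parsed into discrete log keys (for example, templates). The names `build_vocabulary`, `unseen_log_score` and `is_anomalous`, and the `threshold` cut-off, are hypothetical.

```python
from typing import Iterable, Sequence, Set


def build_vocabulary(training_sequences: Iterable[Sequence[str]]) -> Set[str]:
    """Collect every log key seen in the (assumed) training sequences."""
    vocab: Set[str] = set()
    for seq in training_sequences:
        vocab.update(seq)
    return vocab


def unseen_log_score(sequence: Sequence[str], vocab: Set[str]) -> float:
    """Anomaly score: percentage of log keys in `sequence` not found in `vocab`."""
    if not sequence:
        return 0.0  # an empty sequence contains no unseen logs
    unseen = sum(1 for key in sequence if key not in vocab)
    return 100.0 * unseen / len(sequence)


def is_anomalous(sequence: Sequence[str], vocab: Set[str], threshold: float = 50.0) -> bool:
    """Flag a sequence whose score reaches `threshold` (a hypothetical cut-off)."""
    return unseen_log_score(sequence, vocab) >= threshold


if __name__ == "__main__":
    # Toy log keys for illustration only (not data from the talk).
    vocab = build_vocabulary([["open", "read", "close"], ["open", "write", "close"]])
    print(unseen_log_score(["open", "read_v2", "close", "flush"], vocab))
```

Here the sequence `["open", "read_v2", "close", "flush"]` scores 50.0, because two of its four keys never appeared in training. If `read_v2` is merely a renamed template introduced by a software update, the score still rises. This illustrates how the heuristic turns benign log changes into false positives.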
Our evaluation on the Thunderbird log dataset shows that the MCC (Matthews correlation coefficient) of both the standard BERT model and the heuristic drops significantly, from 60% with no unseen logs to 10% with 97% unseen logs. VoBERT, by contrast, showed no significant decay (-2%), maintaining comparable performance under realistic instabilities. We also tested VoBERT on real-world data from a large European bank (50,000+ employees).
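For reference, MCC is computed from confusion-matrix counts as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and ranges from -1 to 1. The percentages above presumably correspond to MCC x 100. The minimal Python sketch below shows the calculation. Returning 0.0 when the denominator is zero is a common convention, not something the abstract specifies, and the counts in the example are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any of the four marginal sums is zero, a common
    convention for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the talk).
    score = mcc(tp=90, tn=880, fp=20, fn=10)
    print(f"MCC = {score:.3f} ({score * 100:.1f}%)")
```

MCC is a common choice for anomaly detection because it stays informative under heavy class imbalance. Consider a detector that flags every sequence as anomalous: it has TN = FN = 0, so under this convention it scores 0.0, however few real anomalies the data contains.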
Hosted by Black Hat