GPT-like Pre-Training on Unlabeled System Logs for Malware Detection

Join CeADAR on 18th January for the Tech Talk “GPT-like Pre-Training on Unlabeled System Logs for Malware Detection”, given by Dmitrijs Trizna, a Senior Security Researcher at Microsoft.

In recent years, self-supervised language modeling techniques, such as those used in GPT-like language models, have shown great success in natural language processing tasks without requiring supervision from domain experts to learn language semantics. In this talk, we explore how well these techniques transfer to system logs and share a methodology for pre-training a Transformer model on unlabeled logs for malware detection.

Infrastructures generate vast amounts of system logs suitable for cybersecurity needs, but only a fraction of these logs are labeled and annotated for specific events or anomalies. Our experiments demonstrate that pre-training the model on unlabeled system logs leads to improved performance on the task of malware detection, compared to training on labeled data alone. Moreover, we show that the pre-trained model learns patterns that are similar to what a human engineer would consider relevant in detecting malware.

These findings highlight the potential of pre-training GPT-like models on system logs for cybersecurity applications, and demonstrate the benefits of self-supervised learning approaches in domains where labeled data is scarce. Overall, our work contributes to the growing body of literature on applying language modeling techniques beyond natural language processing and opens up new avenues for research in the field of cybersecurity.

Biography:

Dmitrijs is a Senior Security Researcher at Microsoft Corporation and a Doctoral Researcher at SmartLab, University of Genova. He has ten years of experience in commercial cyber-security (both blue and red teaming) and has published research at industrial security conferences like BlackHat US and DefCon (AI Village), and at scientific venues like CAMLIS and ACM CCS AISec. Dmitrijs has received security certifications like OSCP, SANS (GREM, GDAT), CCNA Security, Stanford Online, etc., and has participated in cybersecurity training organized by NATO.
