-
SoK: Explainable Machine Learning for Computer Security Applications
Azqa Nadeem, Daniël Vos, Clinton Cao, Luca Pajola, Simon Dieck, Robert Baumgartner, Sicco Verwer
IEEE European Symposium on Security and Privacy (Euro S&P), 2023.Explainable Artificial Intelligence (XAI) is a promising solution to improve the transparency of machine learning (ML) pipelines. We systematize the increasingly growing (but fragmented) microcosm of studies that develop and utilize XAI methods for defensive and offensive cybersecurity tasks. We identify 3 cybersecurity stakeholders, i.e., model users, designers, and adversaries, that utilize XAI for 5 different objectives within an ML pipeline, namely 1) XAI-enabled decision support, 2) applied XAI for security tasks, 3) model verification via XAI, 4) explanation verification & robustness, and 5) offensive use of explanations. We further classify the literature w.r.t. the targeted security domain. Our analysis...
-
Learning state machines via efficient hashing of future traces
Robert Baumgartner, Sicco Verwer
Learning and Automata workshop (LearnAut), 2022.State machines are popular models to model and visualize discrete systems such as software systems, and to represent regular grammars. Most algorithms that passively learn state machines from data assume all the data to be available from the beginning and they load this data into memory. This makes it hard to apply them to continuously streaming data and results in large memory requirements when dealing with large datasets. In this paper we propose a method to learn state machines from data streams using the count-min-sketch data structure to reduce memory requirements. We apply state merging using the well-known red-blue-framework to...
-
Learning State Machines to Monitor and Detect Anomalies on a Kubernetes Cluster
Clinton Cao, Sicco Verwer, Agathe Blaise, Filippo Rebecchi
International Workshop on Continuous Software Evaluation and Certification (IWCSEC) @ ARES, 2022.These days more companies are shifting towards using cloud environments to provide their services to their client. While it is easy to set up a cloud environment, it is equally important to monitor the system’s runtime behaviour and identify anomalous behaviours that occur during its operation. In recent years, the utilisation of Recurrent Neural Networks (RNNs) and Deep Neural Networks (DNNs) to detect anomalies that might occur during runtime has been a trending approach. However, it is unclear how to explain the decisions made by these networks and how these networks should be interpreted to understand the runtime behaviour that...
-
Robust Attack Graph Generation
Dennis Mouwen, Sicco Verwer, Azqa Nadeem
Learning and Automata workshop (LearnAut), 2022.We present a method to learn automaton models that are more robust to input modifications. It iteratively aligns sequences to a learned model, modifies the sequences to their aligned versions, and re-learns the model. Automaton learning algorithms are typically very good at modeling the frequent behavior of a software system. Our solution can be used to also learn the behavior present in infrequent sequences, as these will be aligned to the frequent ones represented by the model. We apply our method to the SAGE tool for modeling attacker behavior from intrusion alerts. In experiments, we demonstrate that our algorithm learns...
-
Adversarially Robust Decision Tree Relabeling
Daniël Vos, Sicco Verwer
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2022.Decision trees are popular models for their interpretation properties and their success in ensemble models for structured data. However, common decision tree learning algorithms produce models that suffer from adversarial examples. Recent work on robust decision tree learning mitigates this issue by taking adversarial perturbations into account during training. While these methods generate robust shallow trees, their relative quality reduces when training deeper trees due the methods being greedy. In this work we propose robust relabeling, a post-learning procedure that optimally changes the prediction labels of decision tree leaves to maximize adversarial robustness. We show this can be achieved in...
-
Robust Optimal Classification Trees against Adversarial Examples
Daniël Vos, Sicco Verwer
AAAI Conference on Artificial Intelligence, 2022.Decision trees are a popular choice of explainable model, but just like neural networks, they suffer from adversarial examples. Existing algorithms for fitting decision trees robust against adversarial examples are greedy heuristics and lack approximation guarantees. In this paper we propose ROCT, a collection of methods to train decision trees that are optimally robust against user-specified attack models. We show that the min-max optimization problem that arises in adversarial learning can be solved using a single minimization formulation for decision trees with 0-1 loss. We propose such formulations in Mixed-Integer Linear Programming and Maximum Satisfiability, which widely available solvers can...
-
SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting
Azqa Nadeem, Sicco Verwer
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2022.Sequence clustering in a streaming environment is challenging because it is computationally expensive, and the sequences may evolve over time. K-medoids or Partitioning Around Medoids (PAM) is commonly used to cluster sequences since it supports alignment-based distances, and the k-centers being actual data items helps with cluster interpretability. However, offline k-medoids has no support for concept drift, while also being prohibitively expensive for clustering data streams. We therefore propose SECLEDS, a streaming variant of the k-medoids algorithm with constant memory footprint. SECLEDS has two unique properties: i) it uses multiple medoids per cluster, producing stable highquality clusters, and ii) it...
-
Efficient training of robust decision trees against adversarial examples
Daniël Vos, Sicco Verwer
International Conference on Machine Learning (ICML), 2021.Current state-of-the-art algorithms for training robust decision trees have high runtime costs and require hours to run. We present GROOT, an efficient algorithm for training robust decision trees and random forests that runs in a matter of seconds to minutes. Where before the worst-case Gini impurity was computed iteratively, we find that we can solve this function analytically to improve time complexity from O (n) to O (1) in terms of n samples. Our results on both single trees and ensembles on 14 structured datasets as well as on MNIST and Fashion-MNIST demonstrate that GROOT runs several orders of magnitude...
-
SAGE Intrusion Alert-driven Attack Graph Extractor
Azqa Nadeem, Sicco Verwer, Shanchieh Jay Yang
Symposium on Visualization for Cyber Security (VizSec), 2021.Attack graphs (AG) are used to assess pathways availed by cyber adversaries to penetrate a network. State-of-the-art approaches for AG generation focus mostly on deriving dependencies between system vulnerabilities based on network scans and expert knowledge. In real-world operations however, it is costly and ineffective to rely on constant vulnerability scanning and expert-crafted AGs. We propose to automatically learn AGs based on actions observed through intrusion alerts, without prior expert knowledge. Specifically, we develop an unsupervised sequence learning system, SAGE, that leverages the temporal and probabilistic dependence between alerts in a suffix-based probabilistic deterministic finite automaton(S-PDFA) – a model that...
-
Enabling Visual Analytics via Alert-driven Attack Graphs
Azqa Nadeem, Sicco Verwer, Stephen Moskal, Shanchieh Jay Yang
ACM SIGSAC Conference on Computer and Communications Security (CCS), 2021.Attack graphs (AG) are a popular area of research that display all the paths an attacker can exploit to penetrate a network. Existing techniques for AG generation rely heavily on expert input regarding vulnerabilities and network topology. In this work, we advocate the use of AGs that are built directly using the actions observed through intrusion alerts, without prior expert input.We have developed an unsupervised visual analytics system, called SAGE, to learn alert-driven attack graphs.We show how these AGs (i) enable forensic analysis of prior attacks, and (ii) enable proactive defense by providing relevant threat intelligence regarding attacker strategies. We...
-
Alert-driven Attack Graph Generation using S-PDFA
Azqa Nadeem, Sicco Verwer, Stephen Moskal, Shanchieh Jay Yang
IEEE Transcations on Dependable and Secure Computing, 2021.Ideal cyber threat intelligence (CTI) includes insights into attacker strategies that are specific to a network under observation. Such CTI currently requires extensive expert input for obtaining, assessing, and correlating system vulnerabilities into a graphical representation, often referred to as an attack graph (AG). Instead of deriving AGs based on system vulnerabilities, this work advocates the direct use of intrusion alerts. We propose SAGE, an explainable sequence learning pipeline that automatically constructs AGs from intrusion alerts without a priori expert knowledge. SAGE exploits the temporal and probabilistic dependence between alerts in a suffix-based probabilistic deterministic finite automaton (S-PDFA)-a model that...
-
Black-box combinatorial optimization using models with integer-valued minima
Laurens Bliek, Sicco Verwer, Mathijs de Weerdt
Annals of Mathematics and Artificial Intelligence, 2020.When a black-box optimization objective can only be evaluated with costly or noisy measurements, most standard optimization algorithms are unsuited to find the optimal solution. Specialized algorithms that deal with exactly this situation make use of surrogate models. These models are usually continuous and smooth, which is beneficial for continuous optimization problems, but not necessarily for combinatorial problems. However, by choosing the basis functions of the surrogate model in a certain way, we show that it can be guaranteed that the optimal solution of the surrogate model is integer. This approach outperforms random search, simulated annealing and a Bayesian optimization...
-
The Robust Malware Detection Challenge and Greedy Random Accelerated Multi-Bit Search
Sicco Verwer, Azqa Nadeem, Christian Hammerschmidt, Laurens Bliek, Abdullah Al-Dujaili, Una-May O'Reilly
Artificial Intelligence and Security (AISec) @ CCS, 2020.Training classifiers that are robust against adversarially modified examples is becoming increasingly important in practice. In the field of malware detection, adversaries modify malicious binary files to seem benign while preserving their malicious behavior. We report on the results of a recently held robust malware detection challenge. There were two tracks in which teams could participate: the attack track asked for adversarially modified malware samples and the defense track asked for trained neural network classifiers that are robust to such modifications. The teams were unaware of the attacks/defenses they had to detect/evade. Although only 9 teams participated, this unique setting...
-
Beyond Labeling Using Clustering to Build Network Behavioral Profiles of Malware Families
Azqa Nadeem, Christian Hammerschmidt, Carlos H. Ganan, Sicco Verwer
Malware Analysis using Artificial Intelligence and Deep Learning, Springer, 2020.Malware family labels are known to be inconsistent. They are also black box since they do not represent the capabilities of malware. The current state-of the-art in malware capability assessment include mostly manual approaches, which are infeasible due to the ever-increasing volume of discovered malware samples. We propose a novel unsupervised machine learning-based method called MalPaCA, which automates capability assessment by clustering the temporal behavior in malware’s network traces. MalPaCA provides meaningful behavioral clusters using only 20 packet headers. Behavioral profiles are generated based on the cluster membership of malware’s network traces. A Directed Acyclic Graph shows the relationship between...
-
Hybrid Connection and Host Clustering for Community Detection in Spatial-temporal Network Data
Mark Patrick Roeling, Azqa Nadeem, Sicco Verwer
Machine Learning for Cybersecurity (MLCS) @ ECML-PKDD, 2020.Network data clustering and sequential data mining are large fields of research, but how to combine them to analyze spatial-temporal network data remains a technical challenge. This study investigates a novel combination of two sequential similarity methods (Dynamic Time Warping and N-grams with Cosine distances), with two state-of-the-art unsupervised network clustering algorithms (Hierarchical Density-based Clustering and Stochastic Block Models). A popular way to combine such methods is to first cluster the sequential network data, resulting in connection types. The hosts in the network can then be clustered conditioned on these types. In contrast, our approach clusters nodes and edges in...
-
Learning Optimal Classification Trees Using a Binary Linear Program Formulation
Sicco Verwer, Yingqian Zhang
Proceedings of the AAAI Conference on Artificial Intelligence, 2019.We provide a new formulation for the problem of learning the optimal classification tree of a given depth as a binary linear program. A limitation of previously proposed Mathematical Optimization formulations is that they create constraints and variables for every row in the training data. As a result, the running time of the existing Integer Linear programming (ILP) formulations increases dramatically with the size of data. In our new binary formulation, we aim to circumvent this problem by making the formulation size largely independent from the training data size. We show experimentally that our formulation achieves better performance than existing...
-
Solving bin-packing problems under privacy preservation: Possibilities and trade-offs
Rowan Hoogervorst, Yingqian Zhang, Gamze Tillem, Zekeriya Erkin, Sicco Verwer
Information Sciences, 2019.We investigate the trade-off between privacy and solution quality that occurs when a k-anonymized database is used as input to the bin-packing optimization problem. To investigate the impact of the chosen anonymization method on this trade-off, we consider two recoding methods for k-anonymity: full-domain generalization and partition-based single-dimensional recoding. To deal with the uncertainty created by anonymization in the bin-packing problem, we utilize stochastic programming and robust optimization methods. Our computational results show that the trade-off is strongly dependent on both the anonymization and optimization method. On the anonymization side, we see that using single dimensional recoding leads to significantly...
-
Using Datasets from Industrial Control Systems for Cyber Security Research and Education
Qin Lin, Sicco Verwer, Robert Kooij, Aditya Mathur
International Conference on Critical Information Infrastructures Security, 2019.The availability of high-quality benchmark datasets is an important prerequisite for research and education in the cyber security domain. Datasets from realistic systems offer a platform for researchers to develop and test novel models and algorithms. Such datasets also offer students opportunities for active and project-centric learning. In this paper, we describe six publicly available datasets from the domain of Industrial Control Systems (ICS). Five of these datasets are obtained through experiments conducted in the context of operational ICS while the sixth is obtained from a widely used simulation tool, namely EPANET, for large scale water distribution networks. This paper...
-
Learning fuzzy decision trees using integer programming
Jason Rhuggenaath, Yingqian Zhang, Alp Akcay, Uzay Kaymak, Sicco Verwer
IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2018.A popular method in machine learning for supervised classification is a decision tree. In this work we propose a new framework to learn fuzzy decision trees using mathematical programming. More specifically, we encode the problem of constructing fuzzy decision trees using a Mixed Integer Linear Programming (MIP) model, which can be solved by any optimization solver. We compare the performance of our method with the performance of off-the-shelf decision tree algorithm CART and Fuzzy Inference Systems (FIS) using benchmark data-sets. Our initial results are promising and show the advantages of using non-crisp boundaries for improving classification accuracy on testing data.
...