The IDS Analysis Project

This blog and the associated GitHub repository discuss data science in information security. Much of the blog is about analyzing the CICIDS2017 traffic flow dataset. However, there are also posts on related topics such as machine learning at scale and data science techniques in general.

Posts

Nov 30, 2020
Visualizing neural network metrics with TensorBoard
We finish our port of the neural network model to Keras and TensorFlow by incorporating TensorBoard into the Colab notebook.
Nov 27, 2020
Data Science Operations at Scale
This is article 5 of a 5-part series on data science operations.
Nov 27, 2020
Model Monitoring
This is article 4 of a 5-part series on data science operations.
Nov 27, 2020
Model Deployment
This is article 3 of a 5-part series on data science operations.
Nov 27, 2020
Model Development and Maintenance
This is article 2 of a 5-part series on data science operations.
Nov 27, 2020
Infrastructure for Data Science
This is article 1 of a 5-part series on data science operations. The series was originally written in September 2019 but is being posted in November 2020.
Nov 26, 2020
Reimplementing the neural network classifier with Keras
We reimplement the neural network classifier using Keras to develop a feel for the differences between Keras and PyTorch.
Nov 20, 2020
Anomaly detection with isolation forest
In this experiment, we use an isolation forest to detect Heartbleed traffic flows.
Nov 17, 2020
Retrying principal component analysis and Gaussian mixture models
We create a simpler dataset and apply principal component analysis (PCA) and Gaussian mixture models (GMMs) to it.
Nov 16, 2020
Visualizing the principal components
We use principal component analysis (PCA) to extract components and attempt to visualize the data.
Nov 13, 2020
Exploring the data using Gaussian mixture models
We use Gaussian mixture models (GMMs) to improve our understanding of the attack class data.
Nov 12, 2020
Measuring classification performance
We use two measures for classification performance in this project: accuracy and F1-score.
Nov 11, 2020
Varying K nearest neighbors hyper-parameters
We vary the K-nearest-neighbors (KNN) hyper-parameters to better understand KNN’s performance.
Nov 10, 2020
Experimenting with K nearest neighbors
We attempt classification using K-nearest-neighbors (KNN) to increase the diversity of techniques used with the dataset.
Nov 9, 2020
Using neural networks
We attempt to beat the baseline classification accuracy of logistic regression with a neural network-based classifier.
Nov 9, 2020
Developing a baseline with logistic regression
We use logistic regression as the first classification technique on the processed data to develop a baseline for classification results.
Nov 8, 2020
On processed data
We process the raw CICIDS2017 data to get it into a form that machine learning algorithms can use.
Nov 7, 2020
About the raw data
The raw CICIDS2017 data is a summary of network traffic flows from a test network. The data can be used for training and comparing machine learning models.
Nov 6, 2020
Introduction to the IDS analysis project
The IDS analysis project seeks to analyze the CICIDS2017 dataset from the University of New Brunswick (UNB). The CICIDS2017 dataset contains information on network traffic flows. The traffic flows are tagged as benign or one of several attacks. The analysis project attempts to understand the characteristics of various techniques that separate benign traffic flows from attack traffic flows.