About the raw data

The raw CICIDS2017 data is a summarization of network traffic flows from a test network. The data can be used for training and comparing machine learning models.

Test network

The testbed simulates victim and attacker networks using both publicly available toolkits and custom tools (¹). The attacker network attempts multiple attacks against a backdrop of benign everyday traffic on the victim network. All traffic is captured for future processing.

Traffic flow summarization

Another tool extracts and summarizes information about the traffic captured in the test network (²). The summarized information appears as rows in comma-separated-values (CSV) files. Each row in a CSV file corresponds to a single traffic flow. It contains “features” of the flow, such as the duration of the flow, number of packets sent by the originator of the flow, number of packets sent by the flow destination, and so on. Each row also contains the “label” of the flow (³). The label represents the type of traffic - the specific attack that the attacker was attempting or the benign everyday traffic.

Data usage

The summarized flow information can be used to train machine learning models and compare such models’ performance. Such comparison is the primary usage of the summarized data in the CICIDS analysis project.

If such data were available in a real-life production network, it could be used to train machine learning models to detect attacks soon after the fact - before major damage happens.

It may be possible to absorb some of the machine learning models developed into a real-life intrusion detection/preventions system (IDS/IPS) for live attack detection.

Test network

Traffic flow summarization

Data usage

References