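Dataset homepage: https://archive.ics.uci.edu/dataset/942/rt-iot2022
RT-IoT2022 is a proprietary dataset derived from a real-time IoT infrastructure, introduced as a comprehensive resource that integrates a diverse range of IoT devices and sophisticated network attack methodologies. The dataset covers both normal and adversarial network behaviour, giving a general representation of real-world scenarios. It combines data from IoT devices such as ThingSpeak-LED, Wipro-Bulb, and MQTT-Temp with simulated attack scenarios involving brute-force SSH attacks, DDoS attacks using Hping and Slowloris, and Nmap patterns, offering a detailed view of the complexity of network traffic. The bidirectional attributes of the network traffic were captured with the Zeek network monitoring tool and the Flowmeter plugin. Researchers can use RT-IoT2022 to improve intrusion detection systems (IDS) and to develop robust, adaptive security solutions for real-time IoT networks.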
Source paper for the dataset:
Quantized autoencoder (QAE) intrusion detection system for anomaly detection in resource-constrained IoT devices using RT-IoT2022 dataset
https://www.semanticscholar.org/paper/753f6ede01b4acaa325e302c38f1e0c1ade74f5b
Abstract: In recent years, many researchers focused on unsupervised learning for network anomaly detection in edge devices to identify attacks. The deployment of the unsupervised autoencoder model is computationally expensive in resource-constrained edge devices. This study proposes quantized autoencoder (QAE) model for intrusion detection systems to detect anomalies. QAE is an optimization model derived from autoencoders that incorporate pruning, clustering, and integer quantization techniques. Quantized autoencoder uint8 (QAE-u8) and quantized autoencoder float16 (QAE-f16) are two variants of QAE built to deploy computationally expensive AI models into Edge devices. First, we have generated a Real-Time Internet of Things 2022 dataset for normal and attack traffic. The autoencoder model operates on normal traffic during the training phase. The same model is then used to reconstruct anomaly traffic under the assumption that the reconstruction error (RE) of the anomaly will be high, which helps to identify the attacks. Furthermore, we study the performance of the autoencoders, QAE-u8, and QAE-f16 using accuracy, precision, recall, and F1 score through an extensive experimental study. We showed that QAE-u8 outperforms all other models with a reduction of 70.01% in average memory utilization, 92.23% in memory size compression, and 27.94% in peak CPU utilization. Thus, the proposed QAE-u8 model is more suitable for deployment on resource-constrained IoT edge devices.
Features and label
'id.orig_p', 'id.resp_p', 'proto', 'service', 'flow_duration','fwd_pkts_tot', 'bwd_pkts_tot', 'fwd_data_pkts_tot','bwd_data_pkts_tot', 'fwd_pkts_per_sec', 'bwd_pkts_per_sec','flow_pkts_per_sec', 'down_up_ratio', 'fwd_header_size_tot','fwd_header_size_min', 'fwd_header_size_max', 'bwd_header_size_tot','bwd_header_size_min', 'bwd_header_size_max', 'flow_FIN_flag_count','flow_SYN_flag_count', 'flow_RST_flag_count', 'fwd_PSH_flag_count','bwd_PSH_flag_count', 'flow_ACK_flag_count', 'fwd_URG_flag_count','bwd_URG_flag_count', 'flow_CWR_flag_count', 'flow_ECE_flag_count','fwd_pkts_payload.min', 'fwd_pkts_payload.max', 'fwd_pkts_payload.tot','fwd_pkts_payload.avg', 'fwd_pkts_payload.std', 'bwd_pkts_payload.min','bwd_pkts_payload.max', 'bwd_pkts_payload.tot', 'bwd_pkts_payload.avg','bwd_pkts_payload.std', 'flow_pkts_payload.min','flow_pkts_payload.max', 'flow_pkts_payload.tot','flow_pkts_payload.avg', 'flow_pkts_payload.std', 'fwd_iat.min','fwd_iat.max', 'fwd_iat.tot', 'fwd_iat.avg', 'fwd_iat.std','bwd_iat.min', 'bwd_iat.max', 'bwd_iat.tot', 'bwd_iat.avg','bwd_iat.std', 'flow_iat.min', 'flow_iat.max', 'flow_iat.tot','flow_iat.avg', 'flow_iat.std', 'payload_bytes_per_second','fwd_subflow_pkts', 'bwd_subflow_pkts', 'fwd_subflow_bytes','bwd_subflow_bytes', 'fwd_bulk_bytes', 'bwd_bulk_bytes','fwd_bulk_packets', 'bwd_bulk_packets', 'fwd_bulk_rate','bwd_bulk_rate', 'active.min', 'active.max', 'active.tot', 'active.avg','active.std', 'idle.min', 'idle.max', 'idle.tot', 'idle.avg','idle.std', 'fwd_init_window_size', 'bwd_init_window_size','fwd_last_window_size', 'Attack_type'
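Sample counts per attack type
The last few attack types have almost no samples. As a result, recognition accuracy for those classes is very low: most of their samples are classified as DOS_SYN_Hping or ARP_poisioning. For example, Metasploit_Brute_Force_SSH is recognized as DOS_SYN_Hping.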
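The imbalance described above can be checked directly from the dataset CSV. The following is a minimal sketch that counts rows per value of the Attack_type column using only the standard library. The function names and the inline sample rows are hypothetical and made up for illustration; to use it on the real data, open your local copy of the dataset CSV instead of the sample.

```python
import csv
import io
from collections import Counter


def count_attack_types(csv_file, label_column="Attack_type"):
    """Return a Counter of label values read from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


def print_distribution(counts):
    """Print each label with its row count and share of the total."""
    total = sum(counts.values())
    for label, n in counts.most_common():
        print(f"{label:35s} {n:8d} {100 * n / total:6.2f}%")


if __name__ == "__main__":
    # Made-up rows, only so the sketch runs without the real file.
    sample = io.StringIO(
        "flow_duration,Attack_type\n"
        "0.10,DOS_SYN_Hping\n"
        "0.20,DOS_SYN_Hping\n"
        "0.30,ARP_poisioning\n"
        "0.40,Metasploit_Brute_Force_SSH\n"
    )
    print_distribution(count_attack_types(sample))
```

For the real file, something like `with open(path, newline="") as f: print_distribution(count_attack_types(f))` should work, where `path` points to wherever you saved the CSV.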
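To improve accuracy, more training data is needed. The paper describes how the data was generated: PCAP files collected with Wireshark are converted and dumped to CSV files using the CICFlowmeter tool.
However, the CICFlowmeter meant here is not https://github.com/ahlashkari/CICFlowMeter
The official CICFlowMeter does not output this set of features.
It is a reimplementation for Zeek instead.
The tool the paper actually used is https://github.com/zeek-flowmeter/zeek-flowmeter/ which matches all 83 features exactly, down to the feature names.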
Installing Zeek FlowMeter
Use a Linux system. The commands below are for Ubuntu 22.04.
Install Zeek
echo 'deb http://download.opensuse.org/repositories/security:/zeek/xUbuntu_22.04/ /' | sudo tee /etc/apt/sources.list.d/security:zeek.list
curl -fsSL https://download.opensuse.org/repositories/security:zeek/xUbuntu_22.04/Release.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/security_zeek.gpg > /dev/null
sudo apt update
sudo apt install zeek-6.0
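By default Zeek is installed under /opt/zeek.
Its executables are not on PATH, so add /opt/zeek/bin to the PATH environment variable manually (for example, put export PATH=/opt/zeek/bin:$PATH in ~/.bashrc).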
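Install the Zeek FlowMeter module
First get zkg working. Running the zkg command directly reports that some Python packages are missing; install them as the message suggests.
Clone the Zeek FlowMeter repository locally and install it: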
git clone https://github.com/zeek-flowmeter/zeek-flowmeter.git
cd zeek-flowmeter
zkg install .
Add Zeek FlowMeter to the local Zeek configuration (optional)
To add FlowMeter to Zeek's standard local configuration, edit <zeekscriptdir>/site/local.zeek
and add:
@load flowmeter
Traffic capture and analysis
Capture traffic with Wireshark and export it as a pcap file, then run:
zeek flowmeter -r your.pcap
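When the command finishes, log files are generated in the current directory.
Finally, process the log files to produce data in the same format as the dataset.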
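The log-processing step could look roughly like the sketch below. It rests on several assumptions that are not confirmed by the text above: that Zeek writes its default tab-separated log format with a #fields header line, that the FlowMeter output is a file named flowmeter.log keyed by uid, and that id.orig_p, id.resp_p, proto and service have to be taken from conn.log by joining on uid. The function names are hypothetical, and the Attack_type label has to be supplied by you, since you know which traffic you captured.

```python
import csv
import io


def read_zeek_log(lines):
    """Parse a tab-separated Zeek log into a list of dicts keyed by #fields."""
    fields = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#fields"):
            # Header looks like: #fields<TAB>name1<TAB>name2...
            fields = line.split("\t")[1:]
        elif line.startswith("#"):
            continue  # other metadata lines (#separator, #types, #close, ...)
        elif fields is not None:
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def join_on_uid(flow_rows, conn_rows, conn_fields):
    """Attach the selected conn.log fields to each flow row with the same uid."""
    conn_by_uid = {row["uid"]: row for row in conn_rows}
    joined = []
    for flow in flow_rows:
        conn = conn_by_uid.get(flow.get("uid"))
        if conn is None:
            continue  # assumption: drop flows without a matching connection
        merged = {name: conn.get(name, "-") for name in conn_fields}
        merged.update(flow)
        joined.append(merged)
    return joined


def write_csv(rows, columns, out, label=None):
    """Write the chosen columns as CSV, optionally appending an Attack_type label."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(columns) + (["Attack_type"] if label is not None else []))
    for row in rows:
        values = [row.get(name, "") for name in columns]
        if label is not None:
            values.append(label)
        writer.writerow(values)


if __name__ == "__main__":
    # Made-up log lines, only so the sketch runs without real captures.
    flow = read_zeek_log(["#fields\tuid\tflow_duration\n", "C1\t0.5\n"])
    conn = read_zeek_log(["#fields\tuid\tid.orig_p\tproto\n", "C1\t1234\ttcp\n"])
    out = io.StringIO()
    rows = join_on_uid(flow, conn, ["id.orig_p", "proto"])
    write_csv(rows, ["id.orig_p", "proto", "flow_duration"], out, label="DOS_SYN_Hping")
    print(out.getvalue(), end="")
```

With real captures you would pass open file objects for the two logs to read_zeek_log and give write_csv the 83 feature names listed earlier as columns. Check the #fields line of your own logs first, because the actual file and field names may differ from what is assumed here.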