Projects
Explainability in Graph Neural Networks for Internet Traffic Classification
2023-06-01 ~ 2025-02-28
We addressed the challenges of traditional traffic classification methods, which are hindered by encrypted traffic, dynamic ports, and the increasing diversity of applications, as well as the tendency of existing deep learning models to overfit to irrelevant features. To overcome these limitations, we applied Graph Neural Networks (GNNs), which can learn the complex structural patterns of network traffic without direct payload inspection, and leveraged graph-based Explainable AI (XAI) techniques to improve accuracy and interpretability. Using diverse traffic data, we constructed BLINC-based host behavior graphs to model the communication activities of individual hosts, and Graption-based network-wide behavior graphs to capture overall communication patterns across the network. We trained and compared four representative GNN algorithms (GCN, GAT, GraphSAGE, and GIN) on these graph structures, and applied explainable GNN techniques such as GNNExplainer, PGExplainer, and GraphMaskExplainer to analyze the reasoning behind classification decisions. Our approach achieved 95–99% accuracy for host behavior graphs, demonstrating state-of-the-art performance. Explainable GNN analysis revealed distinctive structural patterns for major applications and uncovered previously uncharacterized patterns, including those in NTP and NetBIOS traffic.
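As a rough illustration of the host behavior graphs described above, the sketch below turns the flows originated by one host into a small layered graph. The column order (source IP, protocol, destination IP, source port, destination port), the `Flow` record, and the function name `build_host_graph` are assumptions made for this example and are not taken from the project's code.

```python
from collections import namedtuple

# Hypothetical flow record; field names are assumptions for this sketch.
Flow = namedtuple("Flow", "src_ip proto dst_ip src_port dst_port")

# Assumed column order of the layered graph (one node layer per column).
COLUMNS = ("src_ip", "proto", "dst_ip", "src_port", "dst_port")


def build_host_graph(flows, host):
    """Return (nodes, edges) for all flows whose source IP is `host`.

    Each node is a (column, value) pair, and each flow contributes a path
    that links its values in adjacent columns.
    """
    nodes, edges = set(), set()
    for flow in flows:
        if flow.src_ip != host:
            continue
        path = [(col, getattr(flow, col)) for col in COLUMNS]
        nodes.update(path)
        edges.update(zip(path, path[1:]))
    return nodes, edges


# Toy flows (made-up addresses): one host contacting two NTP servers.
flows = [
    Flow("10.0.0.5", "UDP", "10.0.0.1", 123, 123),
    Flow("10.0.0.5", "UDP", "10.0.0.2", 123, 123),
    Flow("10.0.0.9", "TCP", "10.0.0.1", 50000, 443),
]
nodes, edges = build_host_graph(flows, "10.0.0.5")
```

In this form, a host that talks to many destinations over one fixed port pair produces a fan-out at the destination IP layer that collapses again at the port layers, which is the kind of structural pattern a GNN could pick up without reading payloads.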
Research on Virtualization-based 5G Networks and Cyber Threats
2023-04-01 ~ 2023-10-31
We developed Cyber5Gym, a virtualization-based cyber training and testing environment for 5G mobile networks, to address various real-world cyber threats. This work extends our 2022 study, Research on Building a Virtualization-Based 5G Cyber Training Environment, by expanding the training environment, incorporating additional threat scenarios, and enhancing system scalability. We analyzed the architecture, protocols, and network functions of 5G systems, and investigated domestic and international 5G-related cyber threat trends. We identified major vulnerabilities across different segments of the 5G network and selected three representative scenarios—high-bandwidth traffic generation attacks, NAS Security Mode Command (SMC) replay attacks, and Distributed Denial of Service (DDoS) attacks—for implementation in Cyber5Gym. Cyber5Gym adopts a multi-agent architecture consisting of a master server (administrator) and multiple clients (trainees). It is deployed on a cloud platform using the open-source software Open5GS (core network) and UERANSIM (virtual UE and RAN simulator), with automated deployment and configuration to ensure reproducibility and scalability.
Deep Learning based Internet Traffic Classification - Myths, Realities, and their Explainabilities
2022-06-01 ~ 2023-05-31
We addressed the lack of explainability in deep learning-based Internet traffic classification by applying a Hierarchical Attention Network (HAN) to analyze the features driving classification decisions. HAN processes traffic data in a hierarchical structure (Flow → Packet → Header + Payload) and employs an attention mechanism to assign higher weights to more important elements at each level. This approach enables the automatic identification of key features within packet headers and payloads that most influence classification outcomes. We conducted comparative analyses using multiple types of benign and malicious traffic datasets with various deep learning algorithms to evaluate and validate this approach. Our HAN-based analysis showed that high-attention features include both biased, environment-dependent values (e.g., TTL, TCP flags) and well-known key features (e.g., port number, packet size, protocol). This approach reduces overfitting risks, guides feature selection, and improves the reliability of deep learning-based traffic classification, with potential applications in threat detection, dataset assessment, and cybersecurity training.
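The two-level weighting described above can be sketched in a few lines. This is a deliberately simplified illustration: it omits the recurrent encoders and learned projections of a full HAN, and the names `attention_pool`, `flow_attention`, `field_ctx`, and `packet_ctx`, as well as the toy vectors, are assumptions for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, context):
    """Score each vector against a context vector, then return the
    attention-weighted sum together with the attention weights."""
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def flow_attention(flow, field_ctx, packet_ctx):
    """Pool field vectors into packet vectors, then packet vectors into
    one flow vector, keeping the weights from both levels."""
    packet_vecs, field_weights = [], []
    for packet in flow:
        vec, weights = attention_pool(packet, field_ctx)
        packet_vecs.append(vec)
        field_weights.append(weights)
    flow_vec, packet_weights = attention_pool(packet_vecs, packet_ctx)
    return flow_vec, packet_weights, field_weights


# Toy flow: 2 packets, each with 3 field embeddings of dimension 2.
flow = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.0, 2.0], [1.0, 1.0], [0.2, 0.1]],
]
flow_vec, packet_weights, field_weights = flow_attention(
    flow, field_ctx=[1.0, 0.0], packet_ctx=[0.0, 1.0]
)
```

The returned `field_weights` and `packet_weights` are the quantities one would visualize to see which header or payload elements, and which packets, received the most attention.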
Research on Building a Virtualization-Based 5G Cyber Training Environment
2022-04-01 ~ 2022-10-31
We built a virtualization-based cyber training environment for 5G mobile networks to address various cyber threats. We analyzed the architecture, protocols, and network functions of 5G systems, and investigated domestic and international 5G-related cyber threat trends. We identified 14 major vulnerabilities and examined their attack procedures, impacts, and countermeasures. We implemented the training environment using the open-source software Open5GS and UERANSIM, deployed on NAVER Cloud to realize a functional 5G core network, UE, and RAN setup. We integrated an Android Emulator-based UE environment, external network connectivity, and automated deployment/configuration capabilities to create an efficient and reusable training platform. This environment reproduces known 5G vulnerabilities and enables practical testing of response strategies. It can be applied to 5G security research, security equipment validation, penetration testing, and hands-on cybersecurity training.
Multi-modal data-driven Explainable AI Systems and the Future of Digital Finance
2019-09-01 ~ 2022-02-28
To detect fraudulent or deceptive campaigns in digital finance platforms (crowdfunding, microfinance, ICO), we collected a large-scale dataset from Kickstarter, including both fraudulent and legitimate campaigns, and developed a multi-modal explainable AI (XAI) model leveraging text, video, audio, and non-content information. For text analysis, we applied a Hierarchical Attention Network (HAN) with attention weight visualization at the sentence and word levels to provide explanatory evidence. Video data was analyzed using a 3D-CNN (e.g., InceptionV3) to capture nonverbal cues such as facial expressions, gaze, and gestures, while audio data was processed using MFCC and openSMILE to extract acoustic features for pattern analysis. We also examined metadata (e.g., account creation time, social network links) and behavioral history (e.g., prior investments, past campaign records) to help identify potential fraud. Finally, all of these multi-modal data sources were integrated to build a comprehensive fraud detection model.
Towards Explainable AI in Next-Generation Intrusion Detection Systems
2019-04-01 ~ 2019-10-31
We addressed the explainability problem in deep learning-based network traffic classification and threat detection: these models perform well, but it is difficult to explain why. We evaluated the performance of various deep learning algorithms for traffic classification and, in particular, proposed applying the Hierarchical Attention Network (HAN), a natural language processing technique, to network traffic classification. Furthermore, through attention visualization analysis, an explainable AI (XAI) technique, we identified which elements the deep learning model focuses on during traffic classification. The HAN model was adapted for this purpose by leveraging the similarity between the hierarchical structure of traffic data (flow-packet-byte) and the structure of natural language (document-sentence-word).
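The flow-packet-byte analogy above suggests a simple input layout: a flow becomes a fixed-size grid of packets by bytes, much as a document becomes a grid of sentences by words. The sketch below shows one possible encoding; the function name `flow_to_tensor`, the size limits, and the choice to shift byte values by one so that 0 can serve as padding are assumptions for this example.

```python
def flow_to_tensor(packets, max_packets=4, max_bytes=8):
    """Encode a flow (list of bytes objects) as a max_packets x max_bytes grid.

    Byte values are shifted to 1..256 so that 0 is reserved for padding,
    analogous to a padding token in a word vocabulary.
    """
    rows = []
    for payload in packets[:max_packets]:
        row = [b + 1 for b in payload[:max_bytes]]
        row += [0] * (max_bytes - len(row))  # pad short packets
        rows.append(row)
    # Pad flows that have fewer than max_packets packets.
    rows += [[0] * max_bytes for _ in range(max_packets - len(rows))]
    return rows


# Toy flow with two packets (made-up contents).
packets = [b"\x16\x03\x01", b"GET / HTTP/1.1"]
tensor = flow_to_tensor(packets, max_packets=3, max_bytes=4)
# tensor == [[23, 4, 2, 0], [72, 70, 85, 33], [0, 0, 0, 0]]
```

Each row plays the role of a sentence and each entry the role of a word index, so a model built for document-sentence-word input can consume the grid unchanged.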
Statistics-based Network Behavior Modeling
2018-05-01 ~ 2018-10-31
We focused on statistics-based network behavior modeling to develop technology that efficiently classifies network traffic and detects anomalies that deviate from normal patterns. To overcome the limitations of traditional port-based and payload-based traffic classification methods, we used host behavior analysis and Latent Dirichlet Allocation (LDA) to identify traffic characteristics and patterns, which improved classification accuracy. For user convenience, we developed a GUI-based traffic classification tool and added an X.509 certificate analysis feature that extracts certificate information from SSL communication traffic. Using real-world laboratory data and public datasets, we analyzed various network behaviors, including server, client, and attack traffic. We then visually verified the anomalies using BLINC graphs and radar charts.
Traffic Measurement in Anonymity Networks
2017-04-01 ~ 2017-10-31
We studied methods for collecting network traffic from anonymity networks, specifically the Tor network. We investigated how to collect traffic, analyzed the collected data, and reviewed existing attacks on anonymity networks. The study focused on setting up a Tor exit node to collect unencrypted traffic, including the full packet payload. Data was also collected from a client's perspective, which confirmed that Tor circuits typically consist of relays from different countries and are frequently re-established to maintain user anonymity.
Characterization and Automatic Labeling of Malicious Traffic in Control System Networks
2017-04-01 ~ 2017-10-31
We proposed an automated method for classifying specialized network traffic in Industrial Control Systems (ICS), including SCADA environments. To address the limitations of existing traffic classification tools, which struggle to identify the unique traffic patterns in these critical systems, we leveraged Latent Dirichlet Allocation (LDA), a probabilistic text modeling technique. By treating a network traffic flow as a document and its payload data as words, the LDA model automatically extracts hidden "topics" (traffic signatures) to classify the flows. Applied to real-world water resource control system traffic data (approx. 44 GB), our method successfully classified 96.3% of the traffic that existing tools failed to identify, demonstrating its effectiveness and applicability in specialized SCADA environments.
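To make the flow-as-document idea above concrete, the sketch below converts flow payloads into a document-term count matrix, the usual input to an LDA implementation. Defining a "word" as a hex-encoded sliding window of n bytes is an assumption for this example, as are the names `payload_words` and `build_corpus` and the toy payloads; the project's actual tokenization is not described here.

```python
from collections import Counter


def payload_words(payload, n=2):
    """Slide an n-byte window over the payload; each window, hex-encoded,
    is treated as one word."""
    return [payload[i:i + n].hex() for i in range(len(payload) - n + 1)]


def build_corpus(flows, n=2):
    """Return (vocab, matrix), where matrix[d][w] counts how often word
    vocab[w] occurs in flow d."""
    docs = [Counter(payload_words(payload, n)) for payload in flows]
    vocab = sorted(set().union(*docs))
    matrix = [[doc[word] for word in vocab] for doc in docs]
    return vocab, matrix


# Two made-up payloads that differ only in their second byte.
flows = [
    bytes.fromhex("0001000000060103"),
    bytes.fromhex("0002000000060103"),
]
vocab, matrix = build_corpus(flows)
```

The resulting matrix could then be passed to an off-the-shelf LDA implementation (for example, scikit-learn's `LatentDirichletAllocation`), whose per-topic word distributions would correspond to candidate traffic signatures.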
Network Traffic Classification for Intrusion Detection
2015-06-01 ~ 2015-12-31
We aimed to build a system that identifies threatening traffic by classifying network traffic. We proposed an automated signature detection method based on Latent Dirichlet Allocation (LDA) to overcome the limitations of existing application traffic analysis. The system can automatically analyze traffic content without prior knowledge or signatures, allowing it to detect and classify traffic signatures from new applications.